Loss is not decreasing - GitHub Issues

risingClouds commented 9 months ago

When I try to train a ResNet-50 with DINO on an X-ray dataset of 1000 images, the loss does not drop, and sometimes it even increases. Has anyone met the same issue and solved it? The config is as follows (at first I left all the hyperparameters at their defaults, but the loss did not decrease then either):

arch: resnet50
batch_size_per_gpu: 128
clip_grad: 3.0
data_path: /home_data/home/v-luotao/projects/yolov9/datasets/images/train/
dist_url: env://
drop_path_rate: 0.1
epochs: 201
freeze_last_layer: 20
global_crops_scale: (0.4, 1.0)
gpu: 0
local_crops_number: 8
local_crops_scale: (0.05, 0.4)
local_rank: 0
lr: 0.05
min_lr: 0.0005
momentum_teacher: 0.995
norm_last_layer: True
num_workers: 10
optimizer: adamw
out_dim: 2048
output_dir: ./saving_dir
patch_size: 16
rank: 0
saveckp_freq: 20
seed: 0
teacher_temp: 0.04
use_bn_in_head: True
use_fp16: False
warmup_epochs: 10
warmup_teacher_temp: 2e-05
warmup_teacher_temp_epochs: 0
weight_decay: 0.04
weight_decay_end: 0.4
world_size: 1

risingClouds commented 9 months ago

The log is as follows: [screenshot attachment, image not preserved]

swarajnanda2021 commented 7 months ago

I have the exact same issue. I don't understand why the loss does not drop.

sunjeet95 commented 5 months ago

How many epochs did you train it for?

swarajnanda2021 commented 5 months ago

The problem is solved. It happened because I was not using copy.deepcopy when defining the student and teacher at the beginning.
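
For readers with a custom training loop, here is a minimal sketch of what this fix might look like. It assumes PyTorch. The helper names make_student_teacher and ema_update are hypothetical and not part of the DINO repository, and nn.Linear stands in for the real backbone. The idea is that the teacher must own its parameters: if both names point at the same module, the momentum update has nothing to average and the teacher simply mirrors the student.

```python
import copy

import torch
import torch.nn as nn


def make_student_teacher(backbone: nn.Module):
    """Return (student, teacher) with equal initial weights but separate storage.

    Hypothetical helper for illustration only.
    """
    student = backbone
    # deepcopy gives the teacher its own parameter tensors; a plain
    # `teacher = student` would only create a second name for one module.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad = False  # teacher receives no gradients
    return student, teacher


@torch.no_grad()
def ema_update(student: nn.Module, teacher: nn.Module, momentum: float) -> None:
    """teacher <- momentum * teacher + (1 - momentum) * student (hypothetical helper)."""
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps.detach(), alpha=1.0 - momentum)


torch.manual_seed(0)
student, teacher = make_student_teacher(nn.Linear(4, 2))  # stand-in backbone

# Same values right after construction, but distinct objects.
print(torch.equal(student.weight, teacher.weight), student is teacher)
```

In a training loop, ema_update would be called after each optimizer step on the student, with the momentum taken from a schedule such as the momentum_teacher value in the config above.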

MarsZhaoYT commented 1 month ago

Maybe you can use the SGD optimizer. I'm trying to follow this training command from the author on a single GPU:

python main_dino.py --arch resnet50 --optimizer sgd --lr 0.03 --weight_decay 1e-4 --weight_decay_end 1e-4 --global_crops_scale 0.14 1 --local_crops_scale 0.05 0.14 --data_path /path/to/dataset/train --output_dir /path/to/saving_dir
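
When comparing settings like these, it may help to look at the peak learning rate that is actually applied. The sketch below assumes the linear scaling rule with a reference batch size of 256 (lr * batch_size_per_gpu * world_size / 256), which is how main_dino.py appears to derive the rate passed to its scheduler. The function name scaled_lr is hypothetical, batch_size_per_gpu = 64 for the SGD command is an assumption (the command does not set it), and the 8-GPU row is only a hypothetical comparison.

```python
def scaled_lr(base_lr: float, batch_size_per_gpu: int, world_size: int,
              reference_batch: int = 256) -> float:
    """Peak learning rate under linear scaling (hypothetical helper)."""
    return base_lr * (batch_size_per_gpu * world_size) / reference_batch


# Config from the first comment: adamw, lr 0.05, batch 128, one GPU.
adamw_original = scaled_lr(0.05, batch_size_per_gpu=128, world_size=1)

# SGD command above on a single GPU, assuming batch_size_per_gpu = 64.
sgd_single_gpu = scaled_lr(0.03, batch_size_per_gpu=64, world_size=1)

# Same SGD command on a hypothetical 8-GPU node, for comparison.
sgd_eight_gpu = scaled_lr(0.03, batch_size_per_gpu=64, world_size=8)

for name, value in [("adamw, original config", adamw_original),
                    ("sgd, 1 GPU", sgd_single_gpu),
                    ("sgd, 8 GPUs", sgd_eight_gpu)]:
    print(f"{name}: peak lr ~ {value:.4f}")
```

Under these assumptions the same --lr flag gives an eight times smaller peak rate on one GPU than on eight, so a command copied from a multi-GPU setup may need its --lr or batch size adjusted.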

facebookresearch / dino

Loss is not decreasing #273