facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0
6.25k stars 905 forks

Loss is not decreasing #273

Open risingClouds opened 6 months ago

risingClouds commented 6 months ago

When I try to train resnet50 with DINO on an X-ray dataset of 1000 images, the loss does not drop and sometimes even increases. Has anyone met the same issue and solved it? The config is as follows (at first I left all hyperparameters at their defaults, but the loss did not decrease then either):

arch: resnet50
batch_size_per_gpu: 128
clip_grad: 3.0
data_path: /home_data/home/v-luotao/projects/yolov9/datasets/images/train/
dist_url: env://
drop_path_rate: 0.1
epochs: 201
freeze_last_layer: 20
global_crops_scale: (0.4, 1.0)
gpu: 0
local_crops_number: 8
local_crops_scale: (0.05, 0.4)
local_rank: 0
lr: 0.05
min_lr: 0.0005
momentum_teacher: 0.995
norm_last_layer: True
num_workers: 10
optimizer: adamw
out_dim: 2048
output_dir: ./saving_dir
patch_size: 16
rank: 0
saveckp_freq: 20
seed: 0
teacher_temp: 0.04
use_bn_in_head: True
use_fp16: False
warmup_epochs: 10
warmup_teacher_temp: 2e-05
warmup_teacher_temp_epochs: 0
weight_decay: 0.04
weight_decay_end: 0.4
world_size: 1
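
For reference when reading the config above: DINO's training script scales `lr` linearly with the total batch size, relative to a reference batch size of 256. The sketch below only reproduces that arithmetic for the posted values; `effective_lr` is a hypothetical helper name, and the comparison value 0.0005 is assumed to be the script's default `lr`. It does not claim that the learning rate is the cause of the problem.

```python
def effective_lr(base_lr, batch_size_per_gpu, world_size, reference_batch=256):
    """Linear scaling rule: peak LR = lr * total_batch_size / 256.

    Hypothetical helper for illustration; it is not a function in the DINO repo.
    """
    return base_lr * batch_size_per_gpu * world_size / reference_batch


# Values taken from the posted config.
posted_peak = effective_lr(base_lr=0.05, batch_size_per_gpu=128, world_size=1)

# Same batch setup, but with lr=0.0005 (assumed to be the script default).
default_peak = effective_lr(base_lr=0.0005, batch_size_per_gpu=128, world_size=1)

print(f"peak LR with posted config: {posted_peak:.5f}")
print(f"peak LR with lr=0.0005:     {default_peak:.5f}")
print(f"ratio: {posted_peak / default_peak:.0f}x")
```

With the posted `lr: 0.05`, the peak learning rate after warmup is about 0.025, roughly 100 times what `lr=0.0005` would give for the same batch size.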

risingClouds commented 6 months ago

The log is as follows: [image]

swarajnanda2021 commented 5 months ago

I have the exact same issue. I don't understand why the loss does not drop.

sunjeet95 commented 3 months ago

How many epochs did you train it for?

swarajnanda2021 commented 3 months ago

The problem was solved. It happened because I was not using copy.deepcopy to define the student and teacher at the beginning.
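
The commenter's code is not shown, so the following is only a sketch of one way a missing `copy.deepcopy` could matter, assuming the teacher was created by plain assignment from the student. It uses a hypothetical `TinyNet` class (a list of floats standing in for an `nn.Module`) so that it runs without PyTorch. `copy.deepcopy` works the same way on a real module.

```python
import copy


class TinyNet:
    """Hypothetical stand-in for a network: just a list of float weights."""

    def __init__(self, weights):
        self.weights = list(weights)


def optimizer_step(net, delta):
    # Pretend gradient step: shift every weight by delta.
    net.weights = [w + delta for w in net.weights]


def ema_update(student, teacher, momentum):
    # Momentum teacher: teacher <- m * teacher + (1 - m) * student.
    teacher.weights = [
        momentum * t + (1.0 - momentum) * s
        for s, t in zip(student.weights, teacher.weights)
    ]


# Without deepcopy: "teacher" is just another name for the student object.
student_bad = TinyNet([1.0, 2.0])
teacher_bad = student_bad
optimizer_step(student_bad, 0.5)
ema_update(student_bad, teacher_bad, momentum=0.995)
print("aliased:", teacher_bad is student_bad, teacher_bad.weights)

# With deepcopy: the teacher starts from the same weights but is independent.
student = TinyNet([1.0, 2.0])
teacher = copy.deepcopy(student)
optimizer_step(student, 0.5)
ema_update(student, teacher, momentum=0.995)
print("independent:", teacher is not student, teacher.weights)
```

In the aliased case the teacher always equals the student, so the EMA update changes nothing and the teacher cannot provide a slowly moving target. In the copied case the teacher lags behind the student as intended.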