facebookresearch / dino

PyTorch code for training Vision Transformers with the self-supervised learning method DINO
Apache License 2.0

Faster training #247

Open rbareja25 opened 1 year ago

rbareja25 commented 1 year ago

Hello,

I am training ViT-tiny and DeiT-tiny on roughly 55k images across 2 GPUs with the parameters below, and training is very slow: I could only complete about 8 epochs in 24 hours. How can I make the training faster? I also want to add more images later, since my final dataset is about 10x larger; I am starting with the 55k subset, but even this is slow. Which parameters could speed things up? (A sketch for checking whether the data loader is the bottleneck follows the config.) I tried 4 GPUs, but that configuration did not work on our cluster and the job hung after a couple of hours, so 2 GPUs is the most I can use.

```
batch_size_per_gpu: 32
clip_grad: 3.0
csv_path: ./data.csv
data_path: /path/to/imagenet/train/
dist_url: env://
drop_path_rate: 0.1
epochs: 50
freeze_last_layer: 1
global_crops_scale: (0.4, 1.0)
gpu: 0
img_size: 256
local_crops_number: 8
local_crops_scale: (0.05, 0.4)
local_rank: 0
lr: 0.0005
max_patches_total: 25
min_lr: 1e-06
momentum_teacher: 0.996
norm_last_layer: True
num_workers: 10
optimizer: adamw
out_dim: 65536
output_dir: out/
patch_data_path: _Patches256x256/
patch_size: 16
rank: 0
saveckp_freq: 2
seed: 0
teacher_temp: 0.04
use_bn_in_head: False
use_fp16: True
warmup_epochs: 10
warmup_teacher_temp: 0.04
warmup_teacher_temp_epochs: 0
weight_decay: 0.04
weight_decay_end: 0.4
world_size: 2
```
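
Here is the kind of bottleneck check I mean: time the input pipeline in isolation to see whether the GPUs are being starved by data loading. This is a minimal sketch, not code from this repo; the dataset path is a placeholder and the crop size, batch size, and worker count are copied from the config above.

```python
import time
import torch
from torchvision import datasets, transforms

# Stand-in for the training augmentations; the path is a placeholder.
transform = transforms.Compose([
    transforms.RandomResizedCrop(256),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("/path/to/imagenet/train/", transform=transform)

loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=32,            # batch_size_per_gpu from the config
    num_workers=10,           # num_workers from the config
    pin_memory=True,          # faster host-to-device copies
    persistent_workers=True,  # keep workers alive between epochs
    shuffle=True,
)

# Throughput of the input pipeline alone (rough; includes worker warm-up).
# If this is close to the images/s you see during training, the bottleneck
# is data loading, not the model.
start = time.time()
n = 0
for i, (images, _) in enumerate(loader):
    n += images.size(0)
    if i == 100:
        break
print(f"{n / (time.time() - start):.0f} images/s from the loader alone")
```

If the loader turns out to be the limit, settings like pin_memory and persistent_workers, plus faster image decoding, typically help more than any model-side change; torch.backends.cudnn.benchmark = True can also help when input shapes are fixed.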

Thanks, Rohan

rbareja25 commented 1 year ago

Any suggestions from people who have trained DINO or experimented with it would be helpful too.
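
One knob worth quantifying is local_crops_number: in DINO's multi-crop training the student does a forward pass over every crop, so per-iteration compute grows roughly with the total number of patch tokens. Below is a back-of-envelope sketch; it assumes 256x256 global crops (the img_size above), DINO's default 96x96 local crops, and patch_size 16, none of which I have verified against this fork's augmentation code.

```python
# Rough FLOP proxy: ViT cost grows with the tokens processed per iteration.
# Assumed sizes: 256x256 global crops, 96x96 local crops (DINO default),
# patch_size 16 -- assumptions, not values read from this fork's code.
patch = 16
global_tokens = (256 // patch) ** 2   # 256 tokens per global crop
local_tokens = (96 // patch) ** 2     # 36 tokens per local crop

def student_tokens(n_local: int) -> int:
    """Tokens the student processes per iteration: 2 global + n_local local crops."""
    return 2 * global_tokens + n_local * local_tokens

print(student_tokens(8))                          # 800 tokens with local_crops_number: 8
print(student_tokens(4))                          # 656 tokens with local_crops_number: 4
print(1 - student_tokens(4) / student_tokens(8))  # ~0.18 -> ~18% less student compute
```

Dropping local crops trades away some of the multi-crop signal that the DINO paper's ablations show matters, so it is a speed/quality trade-off worth validating on a short run rather than a free win.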

winston52 commented 5 months ago

Hi, have you found a way to make the training more efficient?