facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0
6.05k stars 885 forks

For large batches (256), the loss does not converge #265

Open Elijah-Yi opened 7 months ago

Elijah-Yi commented 7 months ago

For large batches (256), the loss does not converge.