facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0
6.06k stars 885 forks

Cannot reproduce KNN performance for vanilla ViT-S training #280

Open wangh09 opened 1 week ago

wangh09 commented 1 week ago

Hi! I followed the instructions in the readme.md to train a vanilla ViT-S model and got 68.4% KNN top-1 accuracy, which is 0.9 points lower than the claimed figure. Is it because the PyTorch version matters, or should I export the teacher backbone for evaluation? Thanks!

wangh09 commented 3 hours ago

I re-ran the vanilla training with PyTorch 1.7 (68.53%), PyTorch 1.13 (68.43%), and PyTorch 2.3 (68.40%) without changing a line of code. The results show some correlation with the PyTorch and CUDA versions, but the version-to-version difference is less than 0.2 points. Is there any chance that there is a typo in the docs, or that the code used for the 69.3% training differs from the current main branch? Thanks!
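
To make the comparison concrete, here is a minimal sketch of the arithmetic using only the numbers quoted in this thread. The variable names (`runs`, `claimed`, `spread`, `best_gap`, `worst_gap`) are illustrative and do not come from the repository.

```python
# Top-1 KNN accuracies (%) reported in this thread, keyed by PyTorch version.
runs = {"1.7": 68.53, "1.13": 68.43, "2.3": 68.40}

# Figure quoted in this thread as the claimed result.
claimed = 69.3

# Variation explained by the PyTorch/CUDA version alone.
spread = max(runs.values()) - min(runs.values())

# Remaining gap to the claimed figure, for the best and worst run.
best_gap = claimed - max(runs.values())
worst_gap = claimed - min(runs.values())

print(f"spread across versions: {spread:.2f} points")
print(f"gap to claimed figure: {best_gap:.2f} to {worst_gap:.2f} points")
```

The spread across versions is several times smaller than the gap to the claimed figure, which is why the version difference alone does not seem to explain the shortfall.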