facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0
6.06k stars, 885 forks

Choice of out_dim #279

Closed: JunboShen closed this issue 3 hours ago

JunboShen commented 1 week ago

The default out_dim is 65536. I trained with 65536 on a custom dataset, but the loss did not decrease, or decreased only very slowly. With a smaller out_dim, the loss seemed to converge much faster. Is it okay, or even recommended, to use a smaller out_dim, say 1024 or 2048 instead of 65536?
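One thing worth keeping in mind when comparing runs is that the raw loss value depends on out_dim. Below is a minimal NumPy sketch of a DINO-style loss, assuming the usual recipe: the teacher output is centered, sharpened with a low temperature and passed through a softmax, and the student's log-softmax is scored against it with cross-entropy. The function names and the temperatures (0.1 for the student, 0.04 for the teacher) are illustrative assumptions rather than the repository's code, and the multi-crop pairing of views is omitted. In the uniform case the cross-entropy is exactly ln(out_dim), so the natural scale of the loss is about 11.09 for 65536 but about 6.93 for 1024. Loss curves for different out_dim values are therefore not directly comparable.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def dino_style_loss(student_logits, teacher_logits, center,
                    student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between a centered, sharpened teacher and the student.

    Hypothetical helper for illustration.
    student_logits, teacher_logits: arrays of shape (batch, out_dim).
    center: array of shape (out_dim,), e.g. a running mean of teacher logits.
    """
    # Teacher: subtract the center, sharpen with a low temperature, softmax.
    teacher_probs = np.exp(log_softmax((teacher_logits - center) / teacher_temp))
    # Student: log-probabilities at a higher temperature.
    student_log_probs = log_softmax(student_logits / student_temp)
    # Per-sample cross-entropy over the out_dim prototypes, averaged over the batch.
    return float(np.mean(np.sum(-teacher_probs * student_log_probs, axis=-1)))


# Uniform case: all-zero logits give a loss of exactly ln(out_dim).
for out_dim in (1024, 2048, 65536):
    zeros = np.zeros((4, out_dim))
    loss = dino_style_loss(zeros, zeros, np.zeros(out_dim))
    print(f"out_dim={out_dim:>6}: loss={loss:.3f}, ln(out_dim)={np.log(out_dim):.3f}")
```

A faster-looking drop with a small out_dim may partly reflect this smaller starting scale, so downstream evaluation (for example k-NN or linear probing) is a fairer way to compare settings than the loss alone.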

wangh09 commented 3 hours ago

Of course. iBOT, another work that builds on DINO and that inspired DINOv2, chose an output dimension of 8192 (though with an extra masked image modeling (MIM) head).
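To get a sense of what out_dim costs, the rough count below assumes a DINO-style head on a ViT-S backbone (embedding width 384): a 3-layer MLP with hidden width 2048 and biases, ending in a 256-dimensional bottleneck, followed by a bias-free linear layer to out_dim outputs. These sizes and the helper name are assumptions for illustration, so check the head definition in the repository before relying on the exact numbers. Under these assumptions the last layer has 256 × out_dim weights: about 16.8M at 65536, about 2.1M at 8192 and about 0.26M at 1024, compared with about 5.5M for the MLP part.

```python
def dino_head_param_counts(in_dim=384, hidden_dim=2048, bottleneck_dim=256,
                           out_dim=65536, nlayers=3):
    """Return (mlp_params, last_layer_weights) for a DINO-style head.

    Hypothetical helper. Assumed layout: an MLP with biases mapping
    in_dim -> hidden_dim -> ... -> bottleneck_dim, then a bias-free linear
    layer bottleneck_dim -> out_dim. Any weight-norm magnitude parameters
    on the last layer are ignored.
    """
    dims = [in_dim] + [hidden_dim] * (nlayers - 1) + [bottleneck_dim]
    # Weights plus biases for each MLP layer.
    mlp_params = sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))
    # The last layer scales linearly with out_dim.
    last_layer_weights = bottleneck_dim * out_dim
    return mlp_params, last_layer_weights


for k in (1024, 2048, 8192, 65536):
    mlp, last = dino_head_param_counts(out_dim=k)
    print(f"out_dim={k:>6}: MLP {mlp:,} params, last layer {last:,} weights")
```

Since only the last layer depends on out_dim, shrinking it mainly reduces the number of prototypes the teacher distribution is spread over, while the rest of the head stays the same.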