facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0
6.22k stars 904 forks

ViT-Small trained on my dataset (custom dataset + ImageNet): top-1 k-NN accuracy on ImageNet is lower #190

Open guangdongliang opened 2 years ago

guangdongliang commented 2 years ago

The top-1 accuracy is normal when I train on 4×ImageNet on 64 V100 GPUs with the same hyperparameters as in the paper, but it becomes much lower (about a 20% difference) when I train on a custom dataset (1280000×3 images) + ImageNet (1280000 images). This is very strange, because the number of training images is the same. When I change the weight decay to a smaller value, the top-1 accuracy increases a little. Is my training underfitting? I think DINO is sensitive to the dataset.
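To make the size comparison concrete, here is a small sketch of the arithmetic. It assumes the baseline run uses ImageNet repeated four times and the custom dataset is three times the size of ImageNet, which is consistent with the 1/4 OD and 3/4 CD split mentioned in the replies. The helper name `mixture` is hypothetical and only for illustration.

```python
# Approximate ImageNet-1k training set size quoted in the post.
IMAGENET_TRAIN = 1_280_000


def mixture(od_images, cd_images):
    """Return the total image count and the share of each source dataset."""
    total = od_images + cd_images
    return {
        "total": total,
        "od_fraction": od_images / total,
        "cd_fraction": cd_images / total,
    }


# Assumption: the baseline sees ImageNet four times over, with no custom data.
baseline = mixture(4 * IMAGENET_TRAIN, 0)
# Assumption: the mixed run sees ImageNet once plus a custom set three times as large.
mixed = mixture(IMAGENET_TRAIN, 3 * IMAGENET_TRAIN)

print(baseline)
print(mixed)
```

Under these assumptions both runs see the same number of images, but only a quarter of the mixed run comes from the distribution that the ImageNet k-NN evaluation measures.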

woctezuma commented 2 years ago

What is strange about that? You don't mention what the custom dataset is. Nor what the evaluation dataset is.

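Let us imagine the content is much different. For instance:

  • original dataset (OD): human faces,
  • custom dataset (CD): cats and dogs,
  • evaluation dataset: human faces.

In this case, you should expect better evaluation results by training with OD, rather than by training with 1/4 OD and 3/4 CD.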

guangdongliang commented 2 years ago

What is strange about that? You don't mention what the custom dataset is. Nor what the evaluation dataset is.

Let us imagine the content is much different. For instance:

  • original dataset (OD): human faces,
  • custom dataset (CD): cats and dogs,
  • evaluation dataset: human faces.

In this case, you should expect better evaluation results by training with OD, rather than by training with 1/4 OD and 3/4 CD.

Thanks for your reply!

I train with OD (ImageNet) + CD (human and nature images) because I want to check whether the resulting model is fine by comparing my top-1 k-NN accuracy with the value in the paper.

In my experience, the value should be higher, because the model has seen more data, which makes it more robust.

woctezuma commented 2 years ago

I see. However, you are only checking the model against one evaluation task (k-NN classification on ImageNet). Maybe your model is more robust overall, but worse on tasks that target ImageNet only.

If possible, try to use a different evaluation dataset which would be more similar to your custom dataset. This way, you should see that you get better results with 1/4*OD+3/4*CD rather than with OD alone.

Or maybe another kind of evaluation, which uses a dataset different from ImageNet. For instance: https://github.com/facebookresearch/dino#evaluation-image-retrieval-on-revisited-oxford-and-paris
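To compare evaluation sets as suggested above, the same k-NN protocol can be run on frozen features from each one. Below is a minimal, dependency-free sketch of a similarity-weighted k-NN classifier. It is not the repository's evaluation script: the function name `knn_top1` is hypothetical, the defaults `k=20` and `temperature=0.07` are illustrative values, and features are assumed to be non-zero vectors that were already extracted by the backbone.

```python
import math
from collections import defaultdict


def _normalize(vec):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def knn_top1(train_feats, train_labels, test_feats, test_labels,
             k=20, temperature=0.07):
    """Top-1 accuracy of a cosine-similarity-weighted k-NN classifier."""
    train = [_normalize(v) for v in train_feats]
    correct = 0
    for feat, label in zip(test_feats, test_labels):
        query = _normalize(feat)
        # Cosine similarity between the query and every training feature.
        sims = [
            (sum(a * b for a, b in zip(query, t)), y)
            for t, y in zip(train, train_labels)
        ]
        sims.sort(key=lambda pair: pair[0], reverse=True)
        # Each of the k nearest neighbours votes with weight exp(sim / T).
        votes = defaultdict(float)
        for sim, y in sims[:k]:
            votes[y] += math.exp(sim / temperature)
        prediction = max(votes, key=votes.get)
        correct += int(prediction == label)
    return correct / len(test_labels)
```

Calling `knn_top1` once with ImageNet features and once with features from a held-out split of the custom data would give two numbers that can be compared across the OD-only and mixed checkpoints. For real feature sets a vectorised implementation would be much faster than this loop.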