dqshuai / MetaFormer

A PyTorch implementation of "MetaFormer: A Unified Meta Framework for Fine-Grained Recognition" and a reference PyTorch implementation of "CoAtNet: Marrying Convolution and Attention for All Data Sizes".
MIT License

Is it fair to use a larger pretrained model? #3

Closed wztdream closed 2 years ago

wztdream commented 2 years ago

Hi, first of all, congratulations on your great work!

I have always worried about the effect of pretraining on FGVC. There is a high risk of data overlap between the pretraining dataset and the fine-tuning dataset. Take the CUB dataset for example: it has already been found that the CUB200-2011 test set contains images that also appear in the ImageNet-1k training set (see here). So it is quite possible that there is even more overlap between CUB and ImageNet-21k or iNaturalist. This suggests two possible sources for the obvious improvement when using a model pretrained on a larger dataset:

  1. The pretrained model may have learned some generally useful structure that improves performance on the CUB task; this is good.
  2. The model pretrained on the larger dataset has simply seen more images from the CUB test set, so it performs well; this is bad (a rough way to check for this kind of overlap is sketched below).

So what is your opinion about this risk?
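For concreteness, here is a rough sketch (not part of this repo) of the kind of overlap check I have in mind: compare perceptual hashes of CUB test images against a pool of pretraining images. The directory paths, the `imagehash` dependency, and the Hamming-distance threshold are all assumptions chosen for illustration, not something measured.

```python
# Rough overlap check via perceptual hashing (illustrative only).
from pathlib import Path

import imagehash          # pip install imagehash
from PIL import Image


def hash_dir(root: str) -> dict:
    """Map perceptual hash -> file path for every JPEG under `root`."""
    return {imagehash.phash(Image.open(p)): p for p in Path(root).rglob("*.jpg")}


# Hypothetical directory layout; adjust to wherever the images actually live.
cub_test = hash_dir("CUB_200_2011/test")
pretrain = hash_dir("imagenet21k_subset")   # brute force only works on a subset

# Flag a pair as a near-duplicate if the 64-bit pHashes differ in at most a few
# bits; 6 is an arbitrary threshold that would need tuning by eye.
suspects = [
    (p_test, p_pre)
    for h_test, p_test in cub_test.items()
    for h_pre, p_pre in pretrain.items()
    if h_test - h_pre <= 6
]
print(f"{len(suspects)} suspected overlapping pairs")
```

The double loop is brute force, so for the full ImageNet-21k or iNaturalist one would need an indexed nearest-neighbour search instead, but the idea is the same.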

dqshuai commented 2 years ago

Hi, thanks for your concern. Regarding whether there is more overlap between CUB and ImageNet-21k or iNaturalist, further image similarity analysis would be required. That said, what we mainly want to provide is:

  1. A simple way to improve FGVC performance with various kinds of meta-information; we think it is interesting and hope more people will try it.
  2. Some powerful pretrained models, which let FGVC be studied from a stronger baseline and are convenient for practical applications. You can also use our results as a reference for your future research. In fact, I think most models trained on large-scale data run the risk of overlapping with the test sets of downstream tasks. :)
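Building on the hashing sketch above, a minimal way to quantify that risk would be to report accuracy on the full CUB test set and on the subset with suspected duplicates removed, then compare. The model and loader names below are placeholders, and a dataloader that yields file paths alongside images and labels is an assumption, not this repo's actual API.

```python
# Sketch: accuracy with and without suspected-duplicate test images.
import torch


@torch.no_grad()
def accuracy(model, loader, exclude_paths=frozenset()):
    """Top-1 accuracy, skipping any sample whose path is in `exclude_paths`."""
    model.eval()
    correct = total = 0
    for images, labels, paths in loader:   # assumes the loader also yields paths
        keep = torch.tensor([p not in exclude_paths for p in paths], dtype=torch.bool)
        if keep.sum() == 0:
            continue
        preds = model(images[keep]).argmax(dim=1)
        correct += (preds == labels[keep]).sum().item()
        total += keep.sum().item()
    return correct / total


# acc_full     = accuracy(model, cub_test_loader)
# acc_filtered = accuracy(model, cub_test_loader, exclude_paths=suspect_test_paths)
```

A large gap between the two numbers would point to explanation 2 (memorised test images); a small gap would support explanation 1 (generally useful features).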