OpenGVLab / efficient-video-recognition

Is it possible that the Frozen CLIP model saw the validation set of Kinetics #3

Open ShoufaChen opened 2 years ago

ShoufaChen commented 2 years ago

Hi,

Congratulations on your awesome work. It is interesting and meaningful.

I have a small concern about this work: is it possible that the dataset on which CLIP was pre-trained contains some of the Kinetics validation videos or frames?

Thanks in advance.

linziyi96 commented 2 years ago

It seems that the pretraining dataset of CLIP has not been released yet, so I'm afraid a detailed analysis of the data overlap is impossible right now for anyone outside OpenAI.

We did find some analysis of Kinetics-700 in Section 5 of the original CLIP paper: the overlap between the pretraining data and the K700 validation set is around 1% (Fig. 17), but many of the overlapping samples are 'black transition frames', so accuracy on the overlapping subset is actually much lower than on the full validation set.
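To make the roughly 1% overlap figure concrete, here is a minimal sketch of the arithmetic, not anything taken from the CLIP paper or this repository. It assumes the overall accuracy is the overlap-weighted average of the accuracy on the overlapping subset and on the remaining "clean" subset. The function names and the 0.80 overall accuracy are hypothetical, chosen only for illustration.

```python
def clean_accuracy(overall_acc: float, overlap_frac: float, overlap_acc: float) -> float:
    """Accuracy on the non-overlapping ("clean") subset.

    Assumes: overall_acc = overlap_frac * overlap_acc
                           + (1 - overlap_frac) * clean_acc
    """
    if not 0.0 <= overlap_frac < 1.0:
        raise ValueError("overlap_frac must be in [0, 1)")
    return (overall_acc - overlap_frac * overlap_acc) / (1.0 - overlap_frac)


def max_inflation(overall_acc: float, overlap_frac: float) -> float:
    """Upper bound on how much the overlap could inflate overall accuracy.

    The worst case is that every overlapping sample is classified correctly
    (overlap_acc = 1.0) purely because it was seen during pretraining.
    """
    return overall_acc - clean_accuracy(overall_acc, overlap_frac, 1.0)


if __name__ == "__main__":
    # Hypothetical overall accuracy; only the ~1% overlap comes from the thread.
    overall, overlap = 0.80, 0.01
    print(f"max inflation: {max_inflation(overall, overlap):.4f}")
    # If the overlapping subset scores *lower* than average (as with black
    # transition frames), the clean subset scores slightly higher than overall.
    print(f"clean accuracy: {clean_accuracy(overall, overlap, 0.60):.4f}")
```

Under these assumptions, a 1% overlap can shift an 80% overall accuracy by at most about 0.2 percentage points, and an overlapping subset that scores below average pulls the overall number down rather than up.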

Also, the pretraining (image, text) pairs and the downstream (video, label) pairs differ in both visual content and annotation format, which we think is another factor that may reduce the impact of the overlapping subset.

While it's unfortunate that the analysis we can do is limited, we hope the explanation above can address part of your concerns.