Open ShoufaChen opened 2 years ago
It seems that the pretraining dataset of CLIP has not been released yet, so I'm afraid a detailed analysis of data overlap is impossible right now for anyone outside OpenAI.
The original CLIP paper does include some analysis of Kinetics-700 (Section 5): the overlap between the pretraining data and the K700 validation set is around 1% (Fig. 17), but many of the overlapping samples are 'black transition frames', which makes the results on the overlapping subset actually much lower than on the full validation set.
Also, the pretraining (image, text) pairs and the downstream (video, label) pairs differ in both visual content and annotation format, which we think is another factor that may reduce the impact of the overlapping subset.
While it's unfortunate that the analysis we can do is limited, we hope the explanation above can address part of your concerns.
Hi,
Congratulations on your awesome work. It is interesting and meaningful.
I have one small concern about this work: is it possible that the dataset on which CLIP is pre-trained contains part of the Kinetics validation videos/images?
Thanks in advance.