Is it possible that the Frozen CLIP model saw the validation set of Kinetics?

The pretraining dataset of CLIP does not appear to have been released, so I'm afraid a detailed analysis of data overlap is not possible right now for anyone outside OpenAI.

We did find some analysis of Kinetics-700 in Section 5 of the original CLIP paper: the overlap between the pretraining data and the K700 validation set is around 1% (Fig. 17), but many of the overlapping samples are 'black transition frames', which makes the results on the overlapping subset actually much lower than on the full validation set.
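For anyone who does have an overlap mask for their own data, the comparison described above (accuracy on the overlapping subset versus the full set) can be sketched as below. This is only an illustration, not the procedure used in the CLIP paper; the function name `accuracy_by_overlap` and the toy inputs are hypothetical, and how the overlap mask is obtained (e.g. by near-duplicate detection) is out of scope here.

```python
from typing import Dict, Sequence


def accuracy_by_overlap(correct: Sequence[bool], overlaps: Sequence[bool]) -> Dict[str, float]:
    """Split top-1 accuracy into full / overlapping / clean subsets.

    correct[i]  -- whether sample i was classified correctly
    overlaps[i] -- whether sample i was flagged as overlapping the pretraining data
                   (hypothetical mask; producing it is not shown here)
    """
    correct = list(correct)
    overlaps = list(overlaps)
    if len(correct) != len(overlaps):
        raise ValueError("correct and overlaps must have the same length")
    if not correct:
        raise ValueError("at least one sample is required")

    def mean(flags):
        # An empty subset has no defined accuracy.
        return sum(flags) / len(flags) if flags else float("nan")

    overlap_subset = [c for c, o in zip(correct, overlaps) if o]
    clean_subset = [c for c, o in zip(correct, overlaps) if not o]
    return {
        "all": mean(correct),
        "overlap": mean(overlap_subset),
        "clean": mean(clean_subset),
        "overlap_fraction": len(overlap_subset) / len(correct),
    }


# Toy data, made up purely for illustration.
toy_correct = [True, True, False, True, False, True, True, True, False, True]
toy_overlaps = [False, False, True, False, False, False, False, False, False, True]
print(accuracy_by_overlap(toy_correct, toy_overlaps))
```

If accuracy on the overlapping subset is not higher than on the clean subset, the overlap is unlikely to be inflating the headline number.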

Also, the pretraining (image, text) pairs and the downstream (video, label) pairs differ in both visual content and annotation format, which we think is another factor that may reduce the impact of the overlapping subset.

While it's unfortunate that the analysis we can do is limited, we hope the explanation above addresses at least some of your concerns.

OpenGVLab / efficient-video-recognition
