FreddeFrallan / Multilingual-CLIP

OpenAI CLIP text encoders for multiple languages!
MIT License

Data leak #24

Open kimihailv opened 1 year ago

kimihailv commented 1 year ago

Hello! According to the XTD-10 repo, the test set contains 800 images from the MSCOCO train set. Since you also train on the MSCOCO train set, it seems you have a data leak. Or maybe I don't understand something.
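The overlap claim is easy to check mechanically: collect the image IDs used by the XTD-10 test split and intersect them with the MSCOCO train-split IDs. The sketch below illustrates the idea with made-up COCO-style file names; the actual ID lists would have to be read from the two repos' annotation files.

```python
def leaked_ids(eval_ids, train_ids):
    """Return evaluation image IDs that also appear in the training set.

    A non-empty result means the evaluation set leaks training data.
    """
    return set(eval_ids) & set(train_ids)


# Toy illustration with hypothetical COCO-style image names:
xtd_test = {"COCO_train2014_000000000009", "COCO_val2014_000000000042"}
coco_train = {"COCO_train2014_000000000009", "COCO_train2014_000000000025"}

overlap = leaked_ids(xtd_test, coco_train)
print(sorted(overlap))  # prints ['COCO_train2014_000000000009']
```

In the real check, `eval_ids` would come from the XTD-10 annotation files and `train_ids` from the MSCOCO 2014 train annotations; any intersection confirms the leak described above.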

FreddeFrallan commented 1 year ago

Hey,

Now that you mention it, it looks like XTD includes training images in their translated captions. Which, in my humble opinion, is a rather odd decision, at least when there is still unused data from the val+test splits. So yes, there does seem to be data leakage in our evaluation.

We're currently working on a better evaluation system at CLIP_BENCHMARK, including some multilingual evaluations.

The evaluations in this repo will be updated when those become available.

guillemram97 commented 1 year ago

How did you evaluate Table 1 in the original paper ('Cross-lingual and Multilingual CLIP')? Was the space of retrievable images the 1k images from the XTD-10 dataset? I ask because there is no intersection between the images in that dataset and the MSCOCO 2014 test set.