facebookresearch / MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Release of the cleaned 400M image-text pairs #19

Open devaansh100 opened 10 months ago

devaansh100 commented 10 months ago

Hello, thanks for the great work! I wanted to know whether the curated image-text pairs will be open-sourced. If they already have been, could you please point me to them? Thanks again!

howardhsu commented 10 months ago

Not sure what "post curation" means. If you mean curation after image downloading, the curation code is almost the same, except that we use 170k for 2.5B (the exact value depends on where the 6% tail falls in the downloaded/NSFW/deduped data).
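
For readers following along, here is a rough sketch of how a count threshold such as 170k could enter the balancing step. It is not the repository's code. The function names (`entry_keep_probs`, `tail_fraction`, `balance`) and the input layout are hypothetical, and reading "the 6% tail" as the share of all matches that belong to entries below the threshold is an assumption.

```python
import random


def entry_keep_probs(entry_counts, t):
    """Keep probability per metadata entry: 1.0 at or below t, t / count above it."""
    return [t / max(count, t) for count in entry_counts]


def tail_fraction(entry_counts, t):
    """Share of all matches that come from entries with fewer than t matches.

    Assumption: this is one possible reading of "the 6% tail".
    """
    total = sum(entry_counts)
    tail = sum(count for count in entry_counts if count < t)
    return tail / total if total else 0.0


def balance(pairs, entry_counts, t, seed=0):
    """Keep a pair if any of its matched entries passes a random draw.

    pairs: list of (text, matched_entry_ids) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    probs = entry_keep_probs(entry_counts, t)
    kept = []
    for text, matched_ids in pairs:
        if any(rng.random() < probs[i] for i in matched_ids):
            kept.append(text)
    return kept
```

Under that reading, one would raise or lower `t` until `tail_fraction` on the downloaded, filtered, and deduplicated pool comes out near 0.06, which would explain why the value differs between data pools.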