facebookresearch / MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Full per-sample metadata for the 400m and CC2.5B training sets #10

Open vishaal27 opened 1 year ago

vishaal27 commented 1 year ago

Hi, thanks for your great work and for releasing both the metadata entries and the trained CLIP model weights. Would it be possible to release the per-sample metadata (URL, text caption, etc.) for both datasets you released models for (400m and CC2.5B), similar to how the laion-2b-en and datacomp1b splits are released? If this is in the pipeline, please let me know; if it is already released, please point me to it. Thanks!

howardhsu commented 1 year ago

Thanks for your interest. We are working on that.