facebookresearch / MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Do not know what to do with datacard #56

lilygeorgescu closed this issue 6 months ago

lilygeorgescu commented 6 months ago

Hi,

Thank you for this nice work and for making it public!

I do not know how to obtain the curated dataset, i.e., how to use the datacard to get the training data needed to start training.

If anyone has any idea, please let me know.

Thanks in advance.

howardhsu commented 6 months ago

"Datacard" is a new term we coined for the training data distribution, so it is not the concrete dataset itself. You probably need to follow the curation code in MetaCLIP and run it on CommonCrawl to obtain the full dataset.
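For anyone landing here, a toy sketch may help show what "doing curation" means. It follows the two steps described in the MetaCLIP paper: substring-match each caption against the metadata entries, then balance by keeping captions of over-represented ("head") entries with probability t / count. This is not the repo's code. The function names `substr_match` and `curate`, the three-entry metadata list, the captions, and the tiny threshold `t=2` are all hypothetical and chosen for illustration. The real pipeline runs over CommonCrawl with a much larger metadata list and a far larger t. The returned per-entry counts only illustrate the kind of distribution statistics involved; the actual datacard file format is not assumed here.

```python
import random
from collections import Counter


def substr_match(text, metadata):
    """Hypothetical helper: indices of metadata entries that appear as
    substrings of `text` (naive, case-insensitive matching)."""
    lowered = text.lower()
    return [i for i, entry in enumerate(metadata) if entry.lower() in lowered]


def curate(texts, metadata, t, seed=0):
    """Hypothetical sketch of MetaCLIP-style balancing, not the repo's API."""
    rng = random.Random(seed)
    matches = [substr_match(text, metadata) for text in texts]

    # Step 1: count how many captions match each metadata entry.
    entry_count = Counter(i for ids in matches for i in ids)

    # Step 2: tail entries (count <= t) are kept with probability 1;
    # head entries are down-sampled with probability t / count.
    entry_prob = {i: min(1.0, t / c) for i, c in entry_count.items()}

    curated = []
    for text, ids in zip(texts, matches):
        # Keep the caption if any of its matched entries "selects" it.
        # Captions matching no entry are dropped (any() of nothing is False).
        if any(rng.random() < entry_prob[i] for i in ids):
            curated.append(text)
    return curated, entry_count


# Toy usage with made-up metadata and captions.
metadata = ["dog", "cat", "photo"]
texts = ["a photo of a dog"] * 5 + ["a cat sleeping", "abstract art"]
curated, counts = curate(texts, metadata, t=2)
```

Here "dog" and "photo" each match 5 captions, so those captions are down-sampled, while the single "cat" caption is always kept and "abstract art" (no match) is always dropped. Capping head entries this way is what flattens the distribution toward the tail.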