ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
A datacard is a new term we coined for the training data distribution; it is not the concrete dataset itself. You will probably need to follow the code in metaclip and run the curation on CommonCrawl yourself to get the full dataset.
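For orientation only, here is a minimal sketch of what such a curation pass could look like. It assumes the balancing idea described in the MetaCLIP paper: match each caption against metadata entries by substring, count matches per entry, and keep a caption if any matched entry passes a sampling test with probability `t / max(count, t)`. This is not the repository's actual code; `match_entries`, `curate`, the toy metadata, and the threshold values are all hypothetical names and values chosen for illustration.

```python
import random
from collections import Counter


def match_entries(text, metadata):
    """Return indices of metadata entries that occur as substrings of text."""
    lowered = text.lower()
    return [i for i, entry in enumerate(metadata) if entry in lowered]


def curate(texts, metadata, t, seed=0):
    """Keep texts via per-entry balanced sampling (illustrative sketch only).

    Hypothetical helper: entries matched by at most t texts are always
    kept, while entries matched more often are down-sampled.
    """
    rng = random.Random(seed)
    matches = [match_entries(text, metadata) for text in texts]
    # Number of texts matched by each metadata entry.
    counts = Counter(i for matched in matches for i in matched)
    kept = []
    for text, matched in zip(texts, matches):
        # A text survives if any of its matched entries passes the test.
        # Texts matching no entry are dropped.
        if any(rng.random() < t / max(counts[i], t) for i in matched):
            kept.append(text)
    return kept


if __name__ == "__main__":
    metadata = ["cat", "dog"]  # toy metadata, not the real entry list
    texts = ["a cat", "a dog", "nothing here", "cat and dog"]
    print(curate(texts, metadata, t=10))
```

On real CommonCrawl data the matching step would need an efficient multi-pattern matcher rather than this linear scan, and the actual threshold and metadata come from the metaclip code and paper, not from this sketch.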
Hi,
Thank you for this nice work and for making it public!
I do not know how to obtain the curated dataset, that is, how to use the datacard to get the training data needed to start training.
If anyone has any idea, please let me know.
Thanks in advance.