facebookresearch / SLIP

Code release for SLIP: Self-supervision meets Language-Image Pre-training
MIT License

Can the author provide the YFCC-100M data downloader? #13

Open linhuixiao opened 2 years ago

linhuixiao commented 2 years ago

Can the author or someone provide the YFCC-100M data downloader?

The README says the YFCC-100M data must follow this format: ''' Download the YFCC100M dataset. Our dataloader expects the following dataset directory structure with 100 folders containing 1000 zip archives of 1000 images each. The concatenation of the folder, archive, and file names is the index of the image (i.e. image 12345678 is stored as 678.jpg within 12/345.zip): '''
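For anyone repacking the images themselves, the layout in that quote can be sketched roughly as below. This is only an illustration of the index-to-path mapping, not code from the SLIP repository: the function names `yfcc_index_to_location` and `read_image_bytes` are hypothetical, and zero-padding the index to 8 digits is an assumption (the README only gives the 12345678 example).

```python
import os
import zipfile
from typing import Tuple


def yfcc_index_to_location(index: int) -> Tuple[str, str, str]:
    """Split an image index into (folder, zip archive name, file name).

    Assumption: indices are zero-padded to 8 digits, so that
    100 folders x 1000 archives x 1000 images covers 0..99,999,999.
    """
    if not 0 <= index < 100_000_000:
        raise ValueError(f"index out of range: {index}")
    digits = f"{index:08d}"
    folder, archive, name = digits[:2], digits[2:5], digits[5:]
    return folder, f"{archive}.zip", f"{name}.jpg"


def read_image_bytes(root: str, index: int) -> bytes:
    """Read the raw JPEG bytes for one image from its zip archive.

    Hypothetical helper: `root` is the directory holding the 100 folders.
    """
    folder, archive, name = yfcc_index_to_location(index)
    with zipfile.ZipFile(os.path.join(root, folder, archive)) as zf:
        return zf.read(name)


# The example from the README quote: image 12345678 -> 12/345.zip, 678.jpg
print(yfcc_index_to_location(12345678))
```

With this mapping, the README's example index 12345678 resolves to folder `12`, archive `345.zip`, and file `678.jpg`. Whether the archive stores the file name with or without leading zeros for small indices is not stated in the quote, so check that against your own data.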

This does not seem to be the format in which the original data is distributed.

Thank you.

shugerdou commented 2 years ago

May I know where we can download 'yfcc100m_dataset.txt'?

normster commented 2 years ago

Sorry for the late reply. I did not download the data myself, so I won't be able to provide a download script. I'll look into whether it's possible for me to share the yfcc100m_dataset.txt metadata file and get back to you two.

linhuixiao commented 2 years ago

Could you provide 'yfcc100m_dataset.txt' yet? Thanks.

linhuixiao commented 2 years ago

Could you provide 'yfcc100m_dataset.txt' yet? If it's convenient, please send it to me by email: linhui.xiao@foxmail.com. It will only be used for academic research. Thanks!

Soonhwan-Kwon commented 2 years ago

I also failed to reproduce the YFCC15M preprocessing because yfcc100m_dataset.txt was missing.