A subset of YFCC100M. We provide tools, checking scripts, and web drive links for downloading the dataset.
We followed the dataset preparation process of DeCLIP here.
First, download DeCLIP's YFCC15M label file 'yfcc15m_clean_open_data.json' from Google Drive.
Extract the URLs from the JSON file and split them into several URL list files for download using split_download_task.py.
Crawl the images directly from the URLs using auto_download.bat. The script uses Wget, which you may need to install. The .bat file is for Windows; on Linux, you will need to write an equivalent shell script. Alternatively, simply download from the links below.
Check the downloaded images using check_images.py.
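The steps above can be roughly sketched in Python as follows. This is an illustration only, not the code in split_download_task.py, auto_download.bat, or check_images.py. It assumes the URLs have already been read from the JSON file into a plain list (the JSON schema is not described here, so that step is left out), and that the images are JPEG files. The function names `split_url_list`, `wget_command`, and `looks_like_complete_jpeg`, the file prefix, and the chosen Wget options are all hypothetical.

```python
import os


def split_url_list(urls, num_parts, out_dir, prefix="urls_part"):
    """Write `urls` into up to `num_parts` text files, one URL per line."""
    os.makedirs(out_dir, exist_ok=True)
    chunk = -(-len(urls) // num_parts)  # ceiling division
    paths = []
    for i in range(num_parts):
        part = urls[i * chunk:(i + 1) * chunk]
        if not part:
            break  # fewer URLs than parts: stop early
        path = os.path.join(out_dir, f"{prefix}_{i:03d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(part) + "\n")
        paths.append(path)
    return paths


def wget_command(list_file, image_dir):
    """Build a Wget command line that fetches every URL listed in `list_file`.

    -i: read URLs from a file, -P: output directory, -nc: skip existing files,
    -t / -T: retry count and timeout in seconds (values here are arbitrary).
    """
    return ["wget", "-i", list_file, "-P", image_dir, "-nc", "-t", "2", "-T", "10"]


def looks_like_complete_jpeg(path):
    """Heuristic check: file starts with the JPEG SOI marker and ends with EOI.

    Truncated downloads usually lack the end marker. This does not decode
    the image, and a few valid JPEGs with trailing bytes would be rejected.
    """
    if os.path.getsize(path) < 4:
        return False
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)
    return head == b"\xff\xd8" and tail == b"\xff\xd9"
```

On Linux, each list returned by `wget_command` could be passed to `subprocess.run` in place of the Windows .bat file, one process per URL list file, provided Wget is installed.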
Dataset info:
Web Drive links:
If a link fails, please let us know by opening an issue.