facebookresearch / CiT

Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data".

downloading YFCC100M #5

Open ojmichel opened 1 year ago

ojmichel commented 1 year ago

To download the data, the instructions tell us to follow the directions at this link: https://github.com/facebookresearch/SLIP.

The SLIP repository links to this download page on the Multimedia Commons website. Currently, it does not seem possible to get the dataset this way, since Yahoo Webscope is no longer hosting it. Moreover, even if we had the data, it seems that the SLIP code processes it in a customized way, and the script for doing so is not available to us (see this issue).

My main questions are:

  1. Is there currently any way that you are aware of to download the dataset?
  2. Once the data is downloaded, could you share the code used by SLIP to process it?

Any help would be greatly appreciated. Thank you!

howardhsu commented 1 year ago

Thanks for pointing out this issue. We didn't realize this before releasing. We will spend some time on a solution, but we are prioritizing completing a full release of this repo first.

ojmichel commented 1 year ago

Thanks very much! I have found a temporary solution for now.

TIEHua commented 1 year ago

Hi, sorry to bother you. I'm having the same problem. How did you solve the problem of preparing the training data?