LAION-AI / dataset-tasks

Datasets that should be downloaded & converted to our standard training format.

visual7w #25

Open christophschuhmann opened 2 years ago

christophschuhmann commented 2 years ago

https://paperswithcode.com/dataset/visual7w

marianna13 commented 2 years ago

Hi! What do we need to do with this dataset? Create a loader or something else? Thanks!

christophschuhmann commented 2 years ago

We would need the dataset in this format: https://github.com/LAION-AI/dataset-spec

christophschuhmann commented 2 years ago

as webdataset tar files
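
For illustration, a minimal sketch of what writing one such tar shard could look like, using only the Python standard library. WebDataset groups files that share a basename into one sample, so each image is stored next to a JSON file with the same key. The helper names `write_shard` and `list_shard`, the `.jpg`/`.json` extensions, and the `question`/`answer` metadata fields are assumptions made for this example and are not taken from the dataset-spec repository, which should be checked for the actual required fields.

```python
import io
import json
import tarfile


def write_shard(path, samples):
    """Write (key, image_bytes, metadata_dict) samples into one tar shard.

    Files that share the same key (basename) form one WebDataset sample.
    """
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, meta in samples:
            members = (
                (f"{key}.jpg", image_bytes),
                (f"{key}.json", json.dumps(meta).encode("utf-8")),
            )
            for name, payload in members:
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)  # size must be set before addfile
                tar.addfile(info, io.BytesIO(payload))


def list_shard(path):
    """Return the member names of a shard, in the order they were written."""
    with tarfile.open(path, "r") as tar:
        return tar.getnames()
```

Calling `write_shard("00000.tar", samples)` with an iterable of `(key, image_bytes, metadata)` tuples produces one shard; keeping the files of each sample adjacent in the archive is what lets a streaming reader group them.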

marianna13 commented 2 years ago

Do I need to upload this dataset somewhere, or what should I do with it afterwards?

christophschuhmann commented 2 years ago

I will PM you on Discord with access details for a server where you can upload it. I will copy it to our S3 buckets later. :)

rom1504 commented 2 years ago

If the data is public, I think it would be good to put the processed version in a public place as well, and not only in the private S3. For example, Hugging Face Datasets could be such a public place.

Is the data public / redistributable, @marianna13?

marianna13 commented 2 years ago

Yes, it's licensed under the MIT license.