abhimanyudubey / GeoYFCC

Dataset accompanying the paper "Adaptive Methods for Real-World Domain Generalization"
Creative Commons Zero v1.0 Universal
14 stars 0 forks

URLs for many images in the metadata file are broken #2

Closed by virajprabhu 3 years ago

virajprabhu commented 3 years ago

Thanks for releasing this dataset! Using the URLs provided in the metadata file, I was able to download ~730k of the 1.1 million YFCC images. The URLs for the rest unfortunately appear to be broken.

Do you have any recommendations for a source that has all the images? I was thinking of trying https://pypi.org/project/yfcc100m/, which you recommend in your README – have you had any luck using it to download all the images?

Thanks!

abhimanyudubey commented 3 years ago

Hey, many of the YFCC image URLs are indeed broken, so downloading from them isn't the best route, unfortunately. I haven't tried the PyPI package myself, but since it hosts the images separately on AWS, I believe it should work. Once I have some free time in the coming weeks, I can try writing a script that downloads only the relevant files from the corresponding shards, to save time and space.

virajprabhu commented 3 years ago

I see, okay. A script to download the relevant files would be very helpful! I'll try the PyPI package in the meantime. Thanks!