deepghs / cheesechaser

Swiftly get tons of images from indexed tars on Huggingface
https://cheesechaser.deepghs.org/
Apache License 2.0

How would I download the whole dataset? #4

Closed zhizdev closed 3 months ago

zhizdev commented 3 months ago

Thanks for the awesome repo!

Curious about how one would actually download this dataset.

Would one do something like resource_ids=range(0, 7359990), when downloading?

And how would one download the metadata associated with it?

Thanks!

narugo1992 commented 3 months ago

The cheesechaser library is designed for batch retrieval of images; it does not handle metadata.

And if you need the entire dataset, my suggestion is to download the full archives (and the metadata file) directly from the Hugging Face repository. That should be much faster than going through cheesechaser.
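For readers who want to try the full-archive route, here is a minimal sketch using `snapshot_download` from the `huggingface_hub` package. The thread does not name the dataset repository or its file layout, so the repo ID and the file patterns below are placeholders and assumptions, and the helper names are hypothetical. Check the repository's file listing before relying on them.

```python
def build_download_kwargs(repo_id, local_dir,
                          patterns=("*.tar", "*.json", "*.parquet")):
    """Assemble keyword arguments for huggingface_hub.snapshot_download.

    The default patterns are an assumption (tar archives, JSON index
    files, a parquet metadata file); adjust them to the actual repo.
    """
    return {
        "repo_id": repo_id,
        "repo_type": "dataset",      # the archives live in a dataset repo
        "local_dir": str(local_dir),
        "allow_patterns": list(patterns),
    }


def download_full_dataset(repo_id, local_dir):
    """Download every matching file from the dataset repo into local_dir."""
    # Imported here so the helper above can be used without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(repo_id, local_dir))


if __name__ == "__main__":
    # "some-org/some-dataset" is a placeholder, not a real repository ID.
    kwargs = build_download_kwargs("some-org/some-dataset", "./dataset")
    print(kwargs)
    # Uncomment once the placeholder is replaced with the real repo ID:
    # download_full_dataset("some-org/some-dataset", "./dataset")
```

Restricting the download with `allow_patterns` skips files you do not need. Drop that argument to mirror the whole repository.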

zhizdev commented 3 months ago

Thanks so much!