allenai / satlas

Apache License 2.0

Missing files in Hugging Face #46

Open ADHuan opened 5 months ago

ADHuan commented 5 months ago

Hi,

I recently downloaded the full Satlas pretrain dataset from Hugging Face. However, upon reviewing the file lists, I noticed that several tar files for the NAIP dataset are missing for the years 2013, 2016, 2017, and 2018. The missing files are illustrated in the attached screenshots below.

[Screenshots: 2013-1, 2013-2, 2016, 2017, 2018]

Additionally, I have roughly calculated the total data size of the available tar files for 2013, 2016, 2017, and 2018. There seems to be a mismatch between this total and the data size listed in your AWS S3 bucket.

Could you please verify the completeness of the dataset and ensure that all files are available for download on Hugging Face?
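
For anyone who wants to repeat this kind of completeness check, here is a minimal sketch of comparing two file listings. It assumes each listing has already been collected into a mapping from file name to size in bytes. The file names and sizes below are hypothetical placeholders, not actual Satlas files.

```python
from typing import Dict, List, Tuple


def compare_listings(
    reference: Dict[str, int], mirror: Dict[str, int]
) -> Tuple[List[str], int]:
    """Return the files present in `reference` but absent from `mirror`,
    together with the total size (in bytes) of those missing files."""
    missing = sorted(name for name in reference if name not in mirror)
    missing_bytes = sum(reference[name] for name in missing)
    return missing, missing_bytes


# Hypothetical listings: name -> size in bytes (not real Satlas data).
s3_listing = {"naip_2018_a.tar": 100, "naip_2018_b.tar": 250, "naip_2018_c.tar": 75}
hf_listing = {"naip_2018_a.tar": 100}

missing, missing_bytes = compare_listings(s3_listing, hf_listing)
print(missing, missing_bytes)
```

Comparing by file name as well as by total size helps separate files that are truly absent from differences that come only from rounding the sizes.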

favyen2 commented 5 months ago

Yes, it is complete, and I checked that the total data size matches for 2018 (1.7 TB). Your screenshots show that files from every year are available. NAIP images are captured once every 2-3 years at a given location.

favyen2 commented 5 months ago

OK I see now what you mean about missing files. I will check it.

favyen2 commented 4 months ago

In the meantime please download from S3: https://github.com/allenai/satlas/blob/main/satlaspretrain_urls.txt
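
A rough sketch of how the S3 fallback could be scripted, assuming the URL list is a plain text file with one URL per line (the format has not been verified here). The keyword filter, the `example.com` URLs, and the destination directory are hypothetical and only illustrate the idea.

```python
import os
import urllib.request
from typing import Iterable, List


def select_urls(lines: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep non-empty, non-comment lines that contain every keyword."""
    wanted = list(keywords)
    urls = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        if all(keyword in url for keyword in wanted):
            urls.append(url)
    return urls


def download_all(urls: Iterable[str], dest_dir: str) -> None:
    """Download each URL into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if os.path.exists(target):
            continue  # note: a partially downloaded file would also be skipped
        urllib.request.urlretrieve(url, target)


# Hypothetical example lines; the real list may use different names.
example_lines = [
    "https://example.com/naip_2018_a.tar\n",
    "\n",
    "https://example.com/sentinel2_a.tar\n",
]
selected = select_urls(example_lines, ["naip", "2018"])
print(selected)
```

To use it on the real list, read the lines of a locally saved copy of `satlaspretrain_urls.txt` and pass the selected URLs to `download_all`. The archives are large (the thread mentions 1.7 TB for 2018 alone), so check available disk space first.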