allenai / satlas

Apache License 2.0

Consider hosting dataset on Huggingface & source.coop #15

Closed: robmarkcole closed this issue 5 months ago

robmarkcole commented 11 months ago

I'm noting very slow download times for the dataset (my connection is fast):

satlas-dataset-v1-sentinel2-small.tar                  39%[++] 183.55G  3.87MB/s

I've had very fast downloads from Huggingface and suggest it as an additional location for hosting and distributing the dataset.

Additionally, https://beta.source.coop/ would be a relevant portal.
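To put the reported throughput in perspective, the sketch below estimates the full archive size and the remaining download time from the three numbers in the wget progress line above (percent done, amount downloaded, current rate). It assumes wget's `G` and `MB/s` are binary units (GiB and MiB/s) and that the rate stays constant; because wget rounds the percentage, the result is only a rough figure. `estimate_download` is a hypothetical helper for illustration, not part of any Satlas tooling.

```python
from __future__ import annotations


def estimate_download(done_gib: float, percent_done: float, rate_mib_s: float) -> tuple[float, float]:
    """Return (estimated total size in GiB, estimated hours remaining).

    Assumes binary units (GiB, MiB/s) and a constant download rate.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    if rate_mib_s <= 0:
        raise ValueError("rate_mib_s must be positive")
    # Extrapolate the full archive size from the fraction already downloaded.
    total_gib = done_gib / (percent_done / 100)
    # Convert what is left to MiB so it matches the rate's units.
    remaining_mib = (total_gib - done_gib) * 1024
    hours_left = remaining_mib / rate_mib_s / 3600
    return total_gib, hours_left


if __name__ == "__main__":
    # Figures from the progress line: 39% done, 183.55G downloaded, 3.87MB/s.
    total, hours = estimate_download(183.55, 39, 3.87)
    print(f"~{total:.0f} GiB total, ~{hours:.0f} h remaining at the current rate")
```

With the figures from this thread, that works out to an archive of roughly 471 GiB with about 21 more hours to go at 3.87 MiB/s.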

favyen2 commented 11 months ago

I will try to add it at https://huggingface.co/allenai/satlas-pretrain, but it may take some time due to the large size of the dataset.

robmarkcole commented 10 months ago

Hi @favyen2, I see you got a couple of files up, which is great. Could you prioritise the following data? I've been attempting to download it since the start of the week and it's still going:

satlas_explorer_datasets_2023-07-24.tar                45%          ] 430.34G  3.34MB/s    eta 32h 8m

robmarkcole commented 10 months ago

For the Explorer dataset, it took most of the week to download the tar and most of the weekend to untar it. On reviewing the labelled datasets:

The 3 small datasets could be uploaded as individual datasets: HF has a 40GB limit (TBC) per zip/tar, so these should be fine. This would be a much faster experience for people who only care about one of them.

favyen2 commented 5 months ago

The dataset is now available on Hugging Face. The hand-labeled datasets for individual tasks are updated regularly and we are still deciding how to release those on an ongoing basis.