Cannot unzip the train_dataset

allenai / satlas-super-resolution

Apache License 2.0


Cannot unzip the train_dataset #12

Closed yunseok624 closed 8 months ago

yunseok624 commented 8 months ago

Hi,

I've successfully downloaded the validation and test data. I've also downloaded the first 5 parts of the train dataset from huggingface, but I cannot open the files, even with 7zip. Can someone tell me in detail how I should access the dataset?

Thank you in advance,

piperwolters commented 8 months ago

Hi, thank you for your interest in this project.

For the files from huggingface, you could first cat all of the .7z.001, .7z.002, ... files into a single .7z file:

    cat train_urban_set.7z.001 train_urban_set.7z.002 ... > train_urban_set.7z

Then you can use 7zip to unzip:

    7z x train_urban_set.7z

Alternatively, run the following and 7zip will automatically detect all .7z.002, .7z.003, ... files:

    7z x train_urban_set.7z.001
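For anyone unsure how to run the cat step, here is a minimal sketch. It assumes every part sits in the current directory and carries the zero-padded .7z.001, .7z.002, ... suffixes, so the shell's sorted glob expansion lists them in the right order. The function name join_parts is made up for illustration and is not part of the repository.

```shell
# join_parts PREFIX: concatenate PREFIX.7z.001, PREFIX.7z.002, ... into PREFIX.7z.
# Hypothetical helper, not part of the repository.
join_parts() {
    prefix=$1
    # Glob expansion is sorted and the suffixes are zero-padded,
    # so the parts come out in numeric order.
    set -- "$prefix".7z.[0-9][0-9][0-9]
    # With no match the pattern stays literal, so the first "file" does not exist.
    if [ ! -e "$1" ]; then
        echo "no parts found for $prefix" >&2
        return 1
    fi
    cat "$@" > "$prefix.7z"
}
```

A typical call would be join_parts train_urban_set followed by 7z x train_urban_set.7z. Because cat copies every byte, the joined file needs as much free disk space as all the parts combined, and the copy takes time proportional to their size.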

yunseok624 commented 8 months ago

> Hi, thank you for your interest in this project.
>
> For the files from huggingface, you have to first cat all of the .7z.001, .7z.002, ... files into a single .7z file. cat train_urban_set.7z.001 train_urban_set.7z.002 ... > train_urban_set.7z
>
> Then you can use 7zip to unzip. 7z x train_urban_set.7z

Do I have to download all of the .7z files? Is it okay to download only the first 5 (I don't have enough space to keep them all)? Also, how do I cat those files?

yunseok624 commented 8 months ago

Does it take a long time to cat the files? I'm concatenating only the first two files, but it is taking a long time.

piperwolters commented 8 months ago

Ah, I was mistaken - you do need all of the .7z files to be able to unzip. The ideal workflow would be to download all .7z files, and then run

    7z x archive.7z.001 -odestination_directory

which will automatically detect and unzip the other parts. Note that 7z expects the output directory directly after -o, with no space in between.
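Since extraction fails when a part is missing, it can help to check the download before running 7z. The sketch below only detects gaps in the numbering. It cannot tell whether the final parts are missing, because the total number of parts is not stated in this thread. Both function names are hypothetical.

```shell
# count_contiguous_parts PREFIX: print how many parts PREFIX.7z.001, .002, ...
# exist without a gap in the numbering. Hypothetical helper for illustration.
count_contiguous_parts() {
    prefix=$1
    n=0
    while [ -e "$(printf '%s.7z.%03d' "$prefix" $((n + 1)))" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

# has_gap PREFIX: succeed if some part exists beyond the contiguous run,
# which means at least one part in between is missing.
has_gap() {
    prefix=$1
    contiguous=$(count_contiguous_parts "$prefix")
    set -- "$prefix".7z.[0-9][0-9][0-9]
    # An unmatched pattern stays literal; treat that as zero parts.
    [ -e "$1" ] || set --
    [ "$#" -gt "$contiguous" ]
}
```

If has_gap train_urban_set succeeds, a part in the middle of the sequence is missing and should be downloaded again before extracting.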

If you do not have enough room to keep all of them, this will be difficult. I can work on uploading a much smaller dataset, but cannot guarantee that a dataset that small will lead to successful super-resolution.

yunseok624 commented 8 months ago

I understand. Before I close the thread, can you tell me the total size of the dataset in GB? I'll try to find a server at my university where it's possible to keep the whole dataset.

piperwolters commented 8 months ago

Of course, the whole urban training set is 939 GB.
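Before starting a download of this size, a quick free-space check on the target machine may save time. The sketch below assumes GB means 10^9 bytes and that df supports the POSIX -P and -k options. The function name has_room is hypothetical. The 939 GB figure is the size quoted above. The extracted files need additional room, and that amount is not stated in this thread.

```shell
# has_room DIR NEEDED_GB: succeed if the filesystem holding DIR has at least
# NEEDED_GB gigabytes (10^9 bytes each, an assumption) available.
# Hypothetical helper for illustration.
has_room() {
    dir=$1
    needed_gb=$2
    # df -Pk prints one line per filesystem; column 4 is the available
    # space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    needed_kb=$((needed_gb * 1000000000 / 1024))
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example, has_room /path/to/data 939 && echo "enough space for the archive parts" reports whether the parts alone would fit, where /path/to/data is a placeholder for the download directory.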