NDAR / nda-tools

Python package for interacting with NDA web services. Used to validate, submit, and download data to and from NDA.
MIT License

How can I unzip downloaded data on S3? #98

Open AliAzG opened 7 months ago

AliAzG commented 7 months ago

Hello everyone,

I'm downloading a package to an S3 bucket using the following command:

downloadcmd -dp 1227376 -s3 s3://bucket

The problem is that the downloaded file is a large zip file (~36 GB). I've tried many different ways to extract its data (such as writing Lambda functions), but it seems almost impossible to unzip such a huge file with the methods I've tried.

Is there any way to unzip the data, or to download it unzipped directly, using nda-tools?

Thanks in advance.

jrussell9000 commented 1 month ago

This is a longstanding limitation of S3: there is no built-in way to unpack compressed files in a bucket. Lambda functions are usually recommended, but their local storage is limited (512 MB of /tmp by default, configurable up to 10,240 MB), which is far too small for a ~36 GB archive. The best approach (to the best of my knowledge) is to download the file to an EC2 instance with enough disk space, decompress it there, and upload the contents back to S3.
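
For illustration, here is a minimal sketch of that download, decompress, and re-upload workflow as it might be run on an EC2 instance. It is not part of nda-tools. It assumes boto3 is installed and AWS credentials are configured. The function name `unzip_to_s3` and its parameters are hypothetical. Each extracted file is deleted right after upload, so the disk only has to hold the archive plus the largest single member at any one time.

```python
import os
import tempfile
import zipfile


def unzip_to_s3(bucket, zip_key, dest_prefix, s3_client=None, workdir=None):
    """Download a zip from S3, extract it locally, and upload each file back.

    Returns the number of files uploaded. `workdir` should point at a volume
    with enough free space for the archive (hypothetical helper, not nda-tools).
    """
    if s3_client is None:
        # Assumption: boto3 is installed and AWS credentials are configured.
        import boto3
        s3_client = boto3.client("s3")

    prefix = dest_prefix.rstrip("/")
    uploaded = 0
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        # 1. Pull the archive down to local disk.
        local_zip = os.path.join(tmp, "package.zip")
        s3_client.download_file(bucket, zip_key, local_zip)

        # 2. Extract one member at a time and push it straight back to S3.
        extract_dir = os.path.join(tmp, "extracted")
        with zipfile.ZipFile(local_zip) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue  # S3 has no real directories; skip folder entries
                path = archive.extract(info, path=extract_dir)
                key = f"{prefix}/{info.filename}" if prefix else info.filename
                s3_client.upload_file(path, bucket, key)
                os.remove(path)  # free disk space before the next member
                uploaded += 1
    return uploaded
```

A call might look like `unzip_to_s3("bucket", "<key of the downloaded zip>", "unzipped", workdir="/mnt/data")`, where the key and the mount point are placeholders to replace with your own values.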