fatiando / pooch

A friend to fetch your data files
https://www.fatiando.org/pooch

Add support for downloading from AWS S3 buckets #363

Open WesleyTheGeolien opened 1 year ago

WesleyTheGeolien commented 1 year ago

Edit by @leouieda on 2024-02-19

Add an AWSDownloader that can fetch data from AWS S3 storage. It should support an authentication token, ideally with the option to read it from an environment variable. See the instructions for adding such a downloader in https://github.com/fatiando/pooch/issues/382#issuecomment-1952942987.
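A minimal sketch of what such a downloader might look like, assuming the `boto3` package and `s3://bucket/key` style URLs. The class body, the `split_s3_url` helper, and the `client` parameter are illustrative assumptions, not existing pooch API; the only pooch convention relied on is that a downloader is a callable taking `url`, `output_file`, and `pooch`. Credentials are left to boto3 here, which by default reads `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` from the environment.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/path/to/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Expected a URL like 's3://bucket/key', got '{url}'")
    return parsed.netloc, key


class AWSDownloader:
    """Sketch of a pooch-style downloader for S3 objects (not part of pooch)."""

    def __init__(self, client=None):
        # Optional pre-configured boto3 S3 client, e.g. one built with
        # explicit credentials or a custom endpoint (assumed parameter).
        self.client = client

    def __call__(self, url, output_file, pooch):
        client = self.client
        if client is None:
            # Imported lazily so boto3 is only needed when actually downloading.
            import boto3

            # With no arguments, boto3 looks up credentials on its own
            # (environment variables, shared config files, and so on).
            client = boto3.client("s3")
        bucket, key = split_s3_url(url)
        if hasattr(output_file, "write"):
            # An open file-like object was passed in.
            client.download_fileobj(bucket, key, output_file)
        else:
            # A path was passed in.
            client.download_file(bucket, key, str(output_file))
```

An instance could then presumably be passed as `downloader=AWSDownloader()` to `pooch.retrieve` or `Pooch.fetch`.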


Original issue 👇🏾

Description of the desired feature: Data can be stored in cloud-hosted buckets (S3, Google Cloud Storage, Azure, ...).

These can provide either URLs (I believe pre-signing is possible) or a bucket location plus authentication; for example, see the boto3 S3 Python SDK.
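For the pre-signed URL route, a sketch assuming `boto3` is available. The `presigned_url` helper and its parameters are hypothetical names for illustration; `generate_presigned_url` is the boto3 S3 client method it wraps.

```python
def presigned_url(bucket, key, expires=3600, client=None):
    """Return a time-limited HTTPS URL for one S3 object. Hypothetical helper."""
    if client is None:
        # Imported lazily so boto3 is only required when no client is supplied.
        import boto3

        client = boto3.client("s3")
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,  # validity period in seconds
    )
```

Because the result is an ordinary HTTPS URL, it could likely be given to `pooch.retrieve` without a custom downloader. Passing `fname` explicitly is probably wise, since the URL carries a long signed query string.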

I am not sure about the data size, but here is an example of downloading public data from S3: https://github.com/planet-os/notebooks/blob/master/aws/era5-s3-via-boto.ipynb

MinIO can also be run as a Docker image to provide S3 locally for testing, if that is preferable.
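A sketch of pointing boto3 at such a local server, assuming MinIO is listening on `localhost:9000` with its default `minioadmin` credentials. Both function names are hypothetical, and the endpoint and credentials would need to match the actual test setup.

```python
def local_s3_kwargs(
    endpoint_url="http://localhost:9000",  # assumed local MinIO address
    access_key="minioadmin",  # assumed MinIO default credentials
    secret_key="minioadmin",
):
    """Build keyword arguments for boto3.client() targeting a local S3 server."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_local_client(**overrides):
    """Create a boto3 S3 client for the local server (requires boto3)."""
    import boto3

    return boto3.client(**local_s3_kwargs(**overrides))
```

Tests could then upload a small fixture file with this client and download it again through the downloader under test.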

Are you willing to help implement and maintain this feature? I'm not sure I know enough about pooch (this is my first time using and contributing to it) to do anything of use, but I could possibly help out with guidance or provide further info.

remrama commented 9 months ago

Based on a similar need, I created a custom GSDownloader that downloads files from Google Cloud Storage. It's focused on files that require authentication and uses the google-cloud-storage API for the download. I'm not sure if this request was for a more generalizable BucketDownloader or something specific to AWS, like an S3Downloader, but I wanted to link it here given the high overlap.