aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

Support S3 Access Grants #2885

Open issue, opened by jornfranke 4 months ago

jornfranke commented 4 months ago

Is your idea related to a problem? Please describe.

S3 Access Grants are the recommended way for end users (the target group of AWS SDK for pandas) to access files in S3. AWS SDK for pandas should support them.

Describe the solution you'd like

When I write code in a notebook with AWS SDK for pandas, I would like every S3 access to automatically request credentials from a given S3 Access Grants instance, based on the S3 URL I provide. For example, when executing the following code:

import awswrangler as wr
wr.s3.read_parquet(f"s3://{bucket}/parquet/")

I want AWS SDK for pandas to contact a preconfigured S3 Access Grants instance in the background and obtain a token for accessing that S3 location.
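A minimal sketch of what that background step might do, assuming it maps onto the S3 Control `GetDataAccess` API (exposed in boto3 as `s3control.get_data_access`) and passes the vended credentials to AWS SDK for pandas through its existing `boto3_session` argument. The helper names `get_data_access_kwargs` and `session_from_access_grant` are hypothetical, as is the convention of requesting a prefix with a trailing `*` wildcard; `account_id` stands for the account that owns the Access Grants instance.

```python
from urllib.parse import urlparse


def get_data_access_kwargs(s3_url: str, account_id: str,
                           permission: str = "READ",
                           duration_seconds: int = 3600) -> dict:
    """Build keyword arguments for boto3's s3control.get_data_access (hypothetical helper)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {s3_url!r}")
    if permission not in ("READ", "WRITE", "READWRITE"):
        raise ValueError(f"unsupported permission: {permission!r}")
    kwargs = {
        "AccountId": account_id,  # account that owns the Access Grants instance
        "Permission": permission,
        "DurationSeconds": duration_seconds,
    }
    key = parsed.path.lstrip("/")
    if key == "" or key.endswith("/"):
        # Bucket or prefix (e.g. a Parquet dataset directory). Assumption:
        # request it with a trailing wildcard so every object below is covered.
        kwargs["Target"] = f"s3://{parsed.netloc}/{key}*"
    else:
        # A single object path.
        kwargs["Target"] = f"s3://{parsed.netloc}/{key}"
        kwargs["TargetType"] = "Object"
    return kwargs


def session_from_access_grant(s3_url: str, account_id: str,
                              permission: str = "READ", region_name=None):
    """Return a boto3 Session using credentials vended for s3_url (hypothetical helper)."""
    import boto3  # imported lazily so the helper above works without boto3

    # Assumption: the client should target the region of the Access Grants instance.
    s3control = boto3.client("s3control", region_name=region_name)
    response = s3control.get_data_access(
        **get_data_access_kwargs(s3_url, account_id, permission)
    )
    creds = response["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
        region_name=region_name,
    )
```

Done by hand, this would look like `wr.s3.read_parquet(path, boto3_session=session_from_access_grant(path, account_id))`; the feature requested here would make that step implicit for every call.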

It could also automatically request, cache, and refresh temporary credentials for all S3 requests that I run in my notebook.
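The request, cache, and refresh behavior could look roughly like the following sketch. `AccessGrantCredentialCache` is a hypothetical class, not part of AWS SDK for pandas; its `fetch` callable is assumed to return the `Credentials` dict of a `GetDataAccess` response, whose `Expiration` field boto3 returns as a timezone-aware datetime. Credentials are reused per (target, permission) pair until they come within a configurable margin of expiring.

```python
import threading
from datetime import datetime, timedelta, timezone


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


class AccessGrantCredentialCache:
    """Hypothetical per-target cache for S3 Access Grants credentials.

    fetch(target, permission) is assumed to return a dict shaped like the
    "Credentials" field of a GetDataAccess response, with a timezone-aware
    "Expiration" datetime.
    """

    def __init__(self, fetch, refresh_margin=timedelta(minutes=5), clock=_utc_now):
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock  # injectable so the expiry logic can be tested
        self._lock = threading.Lock()
        self._entries = {}

    def get(self, target: str, permission: str = "READ") -> dict:
        key = (target, permission)
        with self._lock:
            creds = self._entries.get(key)
            # Request new credentials if none are cached or the cached ones
            # expire within the refresh margin.
            if creds is None or creds["Expiration"] - self._clock() <= self._margin:
                creds = self._fetch(target, permission)
                self._entries[key] = creds
            return creds
```

Holding the lock during `fetch` keeps concurrent notebook cells from requesting the same grant twice, at the cost of serializing fetches. Keying on the exact target is a simplification: a grant on a broader prefix could in principle serve several paths.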

AWS recently provided some libraries and support that enable this in general:

kukushking commented 4 months ago

Hi @jornfranke, thank you for opening this. Added to the roadmap and up for prioritisation with the team.