duckdb / duckdb_aws

Credentials do not renew automatically #26

Open TheEdgeOfRage opened 5 months ago

TheEdgeOfRage commented 5 months ago

We use k8s ServiceAccounts that assume IAM roles using OIDC and run DuckDB inside a k8s pod with the proper service account set. On startup, the following commands are sent to DuckDB:

INSTALL 'httpfs';
LOAD 'httpfs';
INSTALL 'aws';
LOAD 'aws';
CALL load_aws_credentials();

Our pod then serves requests and fetches data from S3 on demand, so a request for S3 data may arrive long after pod startup. The problem is that the STS token has expired by then, so DuckDB fails to get the data from S3. What's the best approach to renew these credentials, or ideally, could this plugin be updated to renew them automatically once they expire?

samansmink commented 4 months ago

Thanks for reporting. Token refresh is not yet supported, but it is definitely something to consider.
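
Until refresh is supported, one possible workaround is to re-run the startup call from the application before the token expires. The following is only a sketch under assumptions: `CredentialRefresher`, `max_age_seconds`, and the 900-second default are hypothetical names and values, not part of the extension, and `execute` stands for whatever sends SQL to DuckDB (for example `con.execute` on a Python `duckdb` connection). It also assumes that calling load_aws_credentials() again picks up a fresh token, which should be verified in your environment.

```python
import time
from typing import Callable, Optional

REFRESH_SQL = "CALL load_aws_credentials()"  # the same call that is sent at startup


class CredentialRefresher:
    """Re-runs the credential load once the last load is older than max_age_seconds.

    Hypothetical helper: `execute` is any callable that sends one SQL string to DuckDB.
    """

    def __init__(
        self,
        execute: Callable[[str], object],
        max_age_seconds: float = 900.0,  # illustrative; keep it below the token lifetime
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._execute = execute
        self._max_age = max_age_seconds
        self._clock = clock
        self._loaded_at: Optional[float] = None  # None: nothing loaded through this helper yet

    def ensure_fresh(self) -> bool:
        """Reload credentials if they are missing or too old; return True if a reload ran."""
        now = self._clock()
        if self._loaded_at is not None and now - self._loaded_at < self._max_age:
            return False
        self._execute(REFRESH_SQL)
        self._loaded_at = now
        return True
```

Calling `ensure_fresh()` right before each S3 query keeps the reload on the request path, so no background thread is needed. The helper is not thread-safe as written; a lock around `ensure_fresh()` would be needed if several threads share it.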

CrashLaker commented 3 months ago

Hi @samansmink, could you please explain in a bit more detail how DuckDB's threads use the credentials when running on a single machine? This relates to the question I asked here: https://github.com/duckdb/duckdb/discussions/10996

There I have one machine spawning many threads to read many S3 objects. My problem is that after some time this breaks with duckdb.duckdb.HTTPException: HTTP Error: HTTP GET error on 'https://bucket/prefix/filename.json' (HTTP 500).

My question is: when each thread starts, does it fetch the credentials only once to then load the files? Or could I, for example, run another process in the background that updates the credentials in ~/.aws/config?

regards,c.

samansmink commented 3 months ago

Hi @CrashLaker sure!

does it fetch the credentials only once to

yes, it does this once, on secret creation.

I'm not sure what the 500 error is; the AWS docs are not very clear: https://repost.aws/knowledge-center/http-5xx-errors-s3. Could it maybe be a throttling thing?

You could try messing with the http_retries settings here https://duckdb.org/docs/configuration/overview#configuration-reference
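
As a rough illustration of what adjusting those settings could look like from a Python client: `http_retries`, `http_retry_wait_ms`, and `http_retry_backoff` are the retry-related names listed in the configuration reference, while the helper `retry_settings_sql` and the chosen values are hypothetical and only meant as a starting point.

```python
from typing import List


def retry_settings_sql(retries: int = 5, wait_ms: int = 200,
                       backoff: float = 2.0) -> List[str]:
    """Build SET statements for the HTTP retry settings (values are illustrative)."""
    if retries < 0 or wait_ms < 0 or backoff <= 0:
        raise ValueError("retries and wait_ms must be >= 0, backoff must be > 0")
    return [
        f"SET http_retries = {retries}",
        f"SET http_retry_wait_ms = {wait_ms}",
        f"SET http_retry_backoff = {backoff}",
    ]
```

Each statement would then be sent on the connection after httpfs is loaded, for example with `for stmt in retry_settings_sql(): con.execute(stmt)`, where `con` is assumed to be an open `duckdb` connection.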

CrashLaker commented 3 months ago

Hi @samansmink,

Thank you for your reply. Actually, after I sent you the message I realized that the 500 status code wasn't due to expiration (that would be 403), and I was then trying to find ways to tweak the exponential backoff setting.

You've pointed me to the right place in the docs. Thank you so much! I'll try.

regards,c.
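
The distinction drawn in this thread (403 for expired credentials, 5xx for something retryable such as throttling) could be applied in client code roughly as follows. This is a sketch that assumes the error text ends in a `(HTTP NNN)` marker like the message quoted above; `http_status` and `suggested_action` are hypothetical helpers, and matching on message text is brittle. Note that a 403 can also mean a plain permissions problem, not only an expired token.

```python
import re
from typing import Optional

# Matches the "(HTTP 500)" style marker seen in the error message quoted in this thread.
_STATUS_RE = re.compile(r"\(HTTP (\d{3})\)")


def http_status(message: str) -> Optional[int]:
    """Extract the HTTP status code from an error message, or None if absent."""
    match = _STATUS_RE.search(message)
    return int(match.group(1)) if match else None


def suggested_action(message: str) -> str:
    """Map an error message to a coarse next step."""
    status = http_status(message)
    if status == 403:
        return "refresh_credentials"  # possibly an expired token
    if status is not None and 500 <= status <= 599:
        return "retry"  # server-side error, possibly throttling
    return "raise"  # anything else: surface the error unchanged
```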