dbt-labs / dbt-athena

The Athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0

AWS STS authentication errors when multithreading #259

Closed: brabster closed this issue 1 year ago

brabster commented 1 year ago

I am experiencing intermittent authentication issues when running dbt-athena 1.4.3 in GitLab CI with multiple threads in an SSO-based setup.

The issue comes from boto: "An error occurred (InvalidIdentityToken) when calling the AssumeRoleWithWebIdentity operation: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements"

These errors are infrequent, but when they do occur we can't work around them with retries: even setting retries as high as 8 doesn't help, and the errors break the build. My teammate found this GitLab doc that relates to the issue; it seems to be a problem with concurrent access to STS.

I don't know if you'll be able to reproduce this issue. After looking through the source code here and in pyathena, I wasn't sure how long the maximum retry interval is under exponential backoff; it looks like it's 100 of some unit. If that's 100 ms, then being able to change the maximum timeout in the dbt-athena config, giving STS a bit more breathing room, might just get us around the problem. I don't think I can already do that? Or do you have any other ideas we can try?

brabster commented 1 year ago

I think the time unit in Tenacity is seconds (here), so a 100 s maximum seems like plenty. I'm now wondering whether the problem is the backoff strategy: multiple calls could be made at the same time and then back off in lockstep, failing each time. I'll try to confirm whether a jittered backoff strategy helps.
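To illustrate the lockstep concern: with plain capped exponential backoff, threads that fail together compute identical delays and therefore retry together, whereas "full jitter" draws each delay uniformly between 0 and the capped exponential value. The sketch below is a minimal, stdlib-only illustration; the function names, the 1 s base and the 100 s cap are assumptions (the cap only echoes the figure above), not dbt-athena's or pyathena's actual retry settings. Tenacity itself offers a full-jitter wait, wait_random_exponential.

```python
from __future__ import annotations

import random


def exponential_delays(attempts: int, base: float = 1.0, cap: float = 100.0) -> list[float]:
    """Capped exponential backoff: every caller computes exactly the same delays."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def full_jitter_delays(
    attempts: int,
    base: float = 1.0,
    cap: float = 100.0,
    rng: random.Random | None = None,
) -> list[float]:
    """Full-jitter backoff: sleep a random time between 0 and the capped exponential delay."""
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


if __name__ == "__main__":
    # Eight threads that fail together also retry together under plain backoff...
    print([exponential_delays(4) for _ in range(8)])
    # ...but spread out once each one draws its own jittered delay.
    print([[round(d, 2) for d in full_jitter_delays(4)] for _ in range(8)])
```

Passing a seeded random.Random makes full_jitter_delays reproducible for testing; in practice the default unseeded generator is what spreads the threads apart.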

jessedobbelaere commented 1 year ago

Hi @brabster, we also use GitLab with STS (assume-role), with many dbt repositories using threads: 8 and the default value for num_retries. We use our own runners. I haven't experienced any errors like that yet.

It seems you use aws sts assume-role-with-web-identity instead of assume-role? And GitLab is set up with SSO to the AWS environment? To be clear, then, this looks like an issue that only happens with aws sts assume-role-with-web-identity, not with assume-role (which is what I use) 🤔

brabster commented 1 year ago

Yes @jessedobbelaere, I'm pretty sure that's our situation; we're doing that AssumeRoleWithWebIdentity operation somewhere. I'm digging into it to understand exactly what's going on...

nicor88 commented 1 year ago

@brabster to clarify, are you using OpenID (see here) or AWS SSO? It looks like the former to me, since aws sts assume-role-with-web-identity is generally used with OpenID.

brabster commented 1 year ago

@nicor88 it's OpenID. We're doing something slightly different: instead of assuming the role up front, we let the AWS CLI take care of the temporary credentials, as described at the end of these docs: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-role.html

I've just about managed to put out the fire here and we're mostly up and running. It'll probably be next week before I get a chance to dig further, but I'll share our findings.

brabster commented 1 year ago

I don't think this is an issue for dbt-athena. We were able to work around it by explicitly using the AWS CLI to make the AssumeRoleWithWebIdentity call at the start of the build task, requesting a reasonable session duration.
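The workaround, fetching credentials once before dbt starts its threads, might look like the following sketch. The thread used the AWS CLI for this; the sketch swaps in boto3's STS client (assume_role_with_web_identity) instead. The role ARN, token file path, session name and one-hour duration are placeholders, and run_dbt is a hypothetical wrapper that hands the temporary credentials to dbt through the standard AWS_* environment variables.

```python
from __future__ import annotations

import os
import subprocess


def credentials_to_env(credentials: dict) -> dict[str, str]:
    """Map the Credentials block of an STS response to the standard AWS env vars."""
    return {
        "AWS_ACCESS_KEY_ID": credentials["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": credentials["SecretAccessKey"],
        "AWS_SESSION_TOKEN": credentials["SessionToken"],
    }


def assume_role_once(role_arn: str, token_file: str, duration_seconds: int = 3600) -> dict[str, str]:
    """Exchange the CI job's OIDC token for credentials once, before any threads start."""
    import boto3  # imported lazily so credentials_to_env works without boto3 installed

    with open(token_file) as f:
        token = f.read().strip()
    response = boto3.client("sts").assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName="dbt-ci",  # hypothetical session name
        WebIdentityToken=token,
        DurationSeconds=duration_seconds,  # long enough to cover the whole dbt run
    )
    return credentials_to_env(response["Credentials"])


def run_dbt(env_overrides: dict[str, str], args: list[str]) -> None:
    """Hypothetical wrapper: run dbt with the pre-fetched credentials in its environment."""
    subprocess.run(["dbt", *args], env={**os.environ, **env_overrides}, check=True)
```

A CI job might call run_dbt(assume_role_once(role_arn, token_file), ["build"]). Note that DurationSeconds cannot exceed the maximum session duration configured on the IAM role.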

It seems that if boto is left to handle it (by specifying the token in the config), the credentials are refreshed very aggressively or have a very short duration. That causes a stampede of token-refresh requests, as all the threads try to refresh at the same time.
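A stampede like this is the classic case for a "single-flight" refresh: cached credentials are shared, and when they near expiry only one thread fetches new ones while the others wait and reuse the result. The sketch below illustrates that general pattern with the standard library only; SharedCredentials, fetch and refresh_margin are hypothetical names, and the sketch does not describe how boto or dbt-athena actually manage credentials.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


class SharedCredentials:
    """Cache credentials and let only one thread refresh them at a time (hypothetical helper)."""

    def __init__(self, fetch: Callable[[], tuple[object, float]], refresh_margin: float = 300.0):
        self._fetch = fetch  # returns (credentials, expiry as a time.time() timestamp)
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._lock = threading.Lock()
        self._creds: object | None = None
        self._expiry = 0.0

    def _fresh(self) -> bool:
        return self._creds is not None and time.time() < self._expiry - self._margin

    def get(self) -> object:
        if self._fresh():  # fast path: no locking while cached credentials are valid
            return self._creds
        with self._lock:
            # Re-check: another thread may have refreshed while this one waited for the lock.
            if not self._fresh():
                self._creds, self._expiry = self._fetch()
            return self._creds
```

With eight threads calling get() at once on an empty cache, fetch runs once; the other seven block on the lock, see fresh credentials on the re-check, and return them without calling STS.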

Happy to close if that makes sense @nicor88

nicor88 commented 1 year ago

Is there something worth adding to our freshly launched https://dbt-athena.github.io/? Maybe in known issues?

brabster commented 1 year ago

Certainly. Can you point me to where I need to make changes? I can't see anything obvious in the repo or the contributing docs.

nicor88 commented 1 year ago

Here's the page: https://dbt-athena.github.io/docs/known-issues. Simply click "Edit this page", which will fork the repo and let you modify the page. A short section about what you found could help others.

brabster commented 1 year ago

In case you hadn't realised, it's more involved than that (I have to fork, etc.). Anyway, I've added https://github.com/dbt-athena/dbt-athena.github.io/pull/7