aws / aws-sdk-go-v2

AWS SDK for the Go programming language.
https://aws.github.io/aws-sdk-go-v2/docs/
Apache License 2.0
2.64k stars 638 forks source link

feature/rds/auth: Truncate RDS Auth Token Expiry Based on Credential Expiration and Document Limitations #2724

Open oakad opened 3 months ago

oakad commented 3 months ago

Acknowledgements

Describe the bug

BuildAuthToken signs the auth token using the credentials obtained from the supplied credential provider and then slaps a hard coded 900s TTL on it which leads users to assume that it's going to be valid for the next 15 minutes. However, in most practical circumstances (e.g. application running in an EKS pod with pod role), the credentials returned by the credential provider will have an expiration time of their own. Moreover, the way credentials cache works it is not uncommon for it to return credentials which are almost expired; by the time the signed token reaches the database it may be already sour or become sour soon thereafter (if application uses some sort of database password caching, again, a common implementation trope).

Expected Behavior

BuildAuthToken must return an auth token which is valid for the advertised life time. It should take steps to ensure that credentials obtained from the provider are not going to expire within the advertised life time - either by refreshing the credentials using whatever credential cache magic (preferred outcome) or by truncating the advertised life time of the token. There must be a general enough way to ensure that application is able to connect to the database at any given time without interruptions resulting from the "just expired" credentials.

Current Behavior

Applications operating in environments such as EKS pods, which rely on periodic refresh of credentials, get regular "access denied" errors due to IAM password being signed by almost expired credentials. If applications do any sort of database password caching, as may be necessary due to whatever approaches taken in the implementation of the connection pools, the "access denied" errors will come in huge bursts.

Reproduction Steps

  1. Run a database enabled application in an EKS pod with IAM role attached.
  2. Ensure that application keeps creating new database connections on a continuous basis.
  3. Observe the application getting "access denied" errors every hour as current pod credentials expire and get refreshed.

Possible Solution

See above.

Additional Information/Context

No response

AWS Go SDK V2 Module Versions Used

github.com/aws/aws-sdk-go-v2/feature/rds/auth v1.4.7

The issue is not specific to a particular version.

Compiler and Version used

Not a compiler related issue

Operating System and version

Not an OS related issue

RanVaknin commented 3 months ago

Hi @oakad ,

Thanks for reaching out.

The behavior that you outline in your issue description is actually the expected and documented behavior:

"If you created a presigned URL using a temporary credential, the URL expires when the credential expires. In general, a presigned URL expires when the credential you used to create it is revoked, deleted, or deactivated. This is true even if the URL was created with a later expiration time."

Since you mentioned you work with EKS, I'll use IRSA credentials as an example: The temporary credentials assumed (implicitly) through AssumeRoleWithWebIdentity in your case are valid for 60min. If you set a cache ExpiryWindow that is equal or larger to the expiration of your URL, you can that you never run into a situation where your credentials would become stale before the presigned URL does. You can do that by overriding the default config:

    cfg, err := config.LoadDefaultConfig(context.TODO(),
        config.WithCredentialsCacheOptions(func(o *aws.CredentialsCacheOptions) {
            o.ExpiryWindow = 15 * time.Minute // 900 seconds
        }),
    )

BuildAuthToken must return an auth token which is valid for the advertised life time. It should take steps to ensure that credentials obtained from the provider are not going to expire within the advertised life time - either by refreshing the credentials using whatever credential cache magic (preferred outcome)

This is not an option. The credential refresh logic of the cache and URL presigning (which does not make an API call) must not be coupled. This behavior where a locally calculated string impacts the credential rotation can potentially be a big and invisible breaking change for many customers who rely on predictable cred rotation patterns.

by truncating the advertised life time of the token. There must be a general enough way to ensure that application is able to connect to the database at any given time without interruptions resulting from the "just expired" credentials.

This suggestion is feasible, and I have added it to our feature request backlog for further evaluation. The implementation of new features is driven by community feedback and demand. If this issue gains more interest or if more users encounter similar challenges, we will consider re-prioritizing its development accordingly.

Please try the suggested workaround and let us know if it resolves your issue.

Thanks again. Ran~

oakad commented 3 months ago

The behavior is documented elsewhere, but not mentioned in RDS docs; in fact, even support people are not aware of it. Support was under the impression that the RDS auth token should always be valid for the next 15 minutes after issue.

In general, we, the SDK users, expect the SDK to hide the weird details from us. Indeed, most services work just fine in all possible environments with the simplest flow:

awsConfig, err := config.LoadDefaultConfig(ctx)

// stuff
token, err := auth.BuildAuthToken(
    ctx, endpoint, region, user, awsConfig.Credentials,
)

It is most reasonable for the token to have an expiration time as long as the said time is clearly and truthfully communicated to the application. So, at the very least, the expiration time encoded in the token should not exceed the time left on the credentials, and it will be even better if the expiration time can be returned from the BuildAuthToken as a separate value for application perusal.

I will try your suggestion of explicitly reducing the credentials cache retention period.

oakad commented 3 months ago

Re priory: I think it's a sufficiently important issue to rectify. If it will help, we can ping our TAM.

RanVaknin commented 3 months ago

Hi @oakad,

We don't have visibility to the knowledge of all support people, but expiration of presigned URL being tied to the expiration of the credentials with which it is signed with is a basic AWS concept as shown in the documentation I shared.

In addition to truncating the advertised life time of the token, we can add a documentation blurb about it in our docs so it will be more discoverable.

Issues are prioritized based on community engagement (upvotes , comments, and cross SDK duplicates), and this can be worked around using configuration, at this point it remains a lower priority.

Thanks, Ran~

rittneje commented 3 months ago

This is not an option. The credential refresh logic of the cache and URL presigning (which does not make an API call) must not be coupled. This behavior where a locally calculated string impacts the credential rotation can potentially be a big and invisible breaking change for many customers who rely on predictable cred rotation patterns.

@RanVaknin Sorry, but this needs to be an option. First of all, it's what the users of the SDK actually want and expect. But also, the assertion that the presigned URL doesn't make an API call is not entirely true, as it MUST at least refresh the credentials if they are already expired. (If it doesn't that is a clear bug.)

As I mentioned in the other issue you linked, truncating the expiration is not at all desirable or useful.