aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
http://aws.amazon.com/sdkfornet/
Apache License 2.0

ECSTaskCredentials refreshes too late #2498

Closed cfbao closed 1 year ago

cfbao commented 1 year ago

Describe the bug

ECSTaskCredentials by default has PreemptExpiryTime set to zero (as defined in RefreshingAWSCredentials). This causes errors when one uses RDS/Aurora IAM authentication with RDSAuthTokenGenerator in an ECS task:

Expected Behavior

Current Behavior

This has caused intermittent connection errors in our application.

Reproduction Steps

Set up an RDS/Aurora PostgreSQL DB with IAM authentication, then run the following code in an ECS Fargate task. You should see an authentication error in about 6 hours (the lifetime of IAM creds in Fargate).

using Amazon.RDS.Util; // RDSAuthTokenGenerator
using Npgsql;

var dataSourceBuilder = new NpgsqlDataSourceBuilder(
    // purposefully disabling pooling, because it can hide the issue sometimes
    "Host=<rds_host_name>;Port=5432;Database=<db_name>;Username=<user>;SSL Mode=require;Trust Server Certificate=true;Pooling=false"
);
dataSourceBuilder.UsePeriodicPasswordProvider(
    passwordProvider: (_, _) => ValueTask.FromResult(
        RDSAuthTokenGenerator.GenerateAuthToken("<rds_host_name>", 5432, "<user>")
    ),
    // a ~10-minute refresh interval should theoretically be safe, because generated tokens have nominal expiry of 15 minutes
    // using 9.6 here to avoid the interval coinciding with ECSTaskCredentials refreshes
    successRefreshInterval: TimeSpan.FromMinutes(9.6),
    failureRefreshInterval: TimeSpan.FromSeconds(5)
);

await using var dataSource = dataSourceBuilder.Build();

while (true)
{
    try
    {
        await using var command = dataSource.CreateCommand("SELECT 1");
        await command.ExecuteNonQueryAsync();
    }
    catch (Exception ex)
    {
        // this will happen in ~6 hours, but it shouldn't
        Console.WriteLine($"Error talking to the DB: {ex}");
    }

    // wait outside the try block so a failure doesn't turn into a tight retry loop
    await Task.Delay(TimeSpan.FromMinutes(1));
}

Possible Solution

Set a non-zero PreemptExpiryTime for ECSTaskCredentials.

ECS Fargate seems to refresh the creds available at http://169.254.170.2${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI} as early as 3 hours before the old one expires, so 3 hours may work? But for my purpose, I'd be happy with 1 hour or even just 15 minutes too.
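
Until a larger default ships, one possible workaround is to construct the credentials explicitly and set the window yourself. This is only a sketch. It assumes that PreemptExpiryTime is publicly settable on ECSTaskCredentials (inherited from RefreshingAWSCredentials) and that RDSAuthTokenGenerator has a GenerateAuthToken overload accepting explicit credentials in the SDK version you use. The 15-minute value and the GenerateToken helper name are illustrative choices, not something the SDK prescribes.

```csharp
using System;
using Amazon.RDS.Util;
using Amazon.Runtime;

// Only meaningful inside an ECS task, where AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set.
// Keep a single long-lived instance so the SDK can cache and refresh the credentials.
var credentials = new ECSTaskCredentials
{
    // Assumption: refreshing 15 minutes before expiry covers the nominal token lifetime.
    PreemptExpiryTime = TimeSpan.FromMinutes(15)
};

// Hypothetical helper: generates a token signed with the credentials configured above.
// The region is resolved by the SDK's default region lookup in this overload.
string GenerateToken() =>
    RDSAuthTokenGenerator.GenerateAuthToken(credentials, "<rds_host_name>", 5432, "<user>");

Console.WriteLine($"Generated a token of length {GenerateToken().Length}");
```

In the reproduction code above, GenerateToken() would replace the direct RDSAuthTokenGenerator.GenerateAuthToken call inside passwordProvider.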

Additional Information/Context

No response

AWS .NET SDK and/or Package version used

AWSSDK.RDS 3.7.105.5

Targeted .NET Platform

.NET 6

Operating System and version

Debian

ashishdhingra commented 1 year ago

This appears to be a valid concern, but I'm not sure how we would decide on the refresh interval. Needs discussion with the team.

MariusVladu commented 1 year ago

Would it help to manually call FallbackCredentialsFactory.Reset(); before generating the new auth token?

I think I'm facing the same issue: ECS Fargate Spot task, RDS MySQL db.t4g.small, .NET 7 with Entity Framework. I'm caching the entire connection string for 10 minutes and always setting it in a ConnectionOpen interceptor. The DB instance has a max of 19 connections at a time, with an average of 5-9. I'm using connection pooling (default settings).

Still, I'm getting rare database authentication failures, some after ~6 hours and some at slightly different times.

No idea how else to troubleshoot this. Finding this open issue gave me some hope though.

peterrsongg commented 1 year ago

@MariusVladu @cfbao We intend to release the fix for this tomorrow. Will comment on here when it is officially released. Thank you

peterrsongg commented 1 year ago

@cfbao The fix was released in version 3.7.506.0. Thank you for bringing this to our attention. If the issue persists, feel free to re-open this.

Runaground commented 1 year ago

@peterrsongg

I'm experiencing the same problem. We are using NpgsqlDataSourceBuilder with password rotation every 10 minutes, and we verified that a new password is indeed requested every 10 minutes via RDSAuthTokenGenerator.GenerateAuthToken. Our AWSSDK.Core version is 3.7.107.108 (which is more recent than the 3.7.506.0 release). I initially opened an issue against Npgsql (https://github.com/npgsql/npgsql/issues/5163), but I now think AWSSDK is the culprit.

cfbao commented 1 year ago

@peterrsongg

I don't think this issue is actually fixed. PreemptExpiryTime is now set to 5 minutes: https://github.com/aws/aws-sdk-net/blob/5a4abd66b9a8638ecba7a0705d7db82d1bd866ac/sdk/src/Core/Amazon.Runtime/Credentials/ECSTaskCredentials.cs#L76

which still isn't enough to cover the lifetime of an RDS auth token which is 15 minutes.
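
To make the arithmetic concrete, here is a small sketch of the worst case. It rests on an assumption drawn from this thread rather than from AWS documentation: a token stops working once the credentials that signed it expire, even if its nominal 15 minutes are not up. TokenTiming and its members are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the AWS SDK.
public static class TokenTiming
{
    // Nominal lifetime of an RDS IAM auth token, as stated in this thread.
    public static readonly TimeSpan TokenLifetime = TimeSpan.FromMinutes(15);

    // Assumption: the token is usable until either its nominal lifetime ends
    // or the signing credentials expire, whichever comes first.
    public static TimeSpan EffectiveLifetime(DateTime generatedAt, DateTime credentialsExpireAt)
    {
        TimeSpan untilCredsExpire = credentialsExpireAt - generatedAt;
        if (untilCredsExpire < TimeSpan.Zero)
        {
            return TimeSpan.Zero;
        }
        return untilCredsExpire < TokenLifetime ? untilCredsExpire : TokenLifetime;
    }

    // Worst case: the token is generated just before the SDK would refresh,
    // i.e. exactly PreemptExpiryTime before the old credentials expire.
    public static TimeSpan WorstCaseLifetime(TimeSpan preemptExpiryTime)
    {
        // Arbitrary fixed expiry instant; only the differences matter.
        DateTime expiry = new DateTime(2024, 1, 1, 6, 0, 0, DateTimeKind.Utc);
        return EffectiveLifetime(expiry - preemptExpiryTime, expiry);
    }
}
```

Under that assumption, a PreemptExpiryTime of zero gives a worst-case token lifetime of zero, 5 minutes gives 5 minutes, and only 15 minutes or more gives the full 15. A 5-minute worst case is shorter than the ~10-minute password refresh interval used in the reproduction, which would explain why failures can still occur.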

peterrsongg commented 1 year ago

@Runaground @cfbao My understanding was that the credentials were being refreshed at the moment they expired, which is what caused this error, but it seems that 5 minutes is not enough for either of your cases. I'll look into increasing this to 20 minutes.

peterrsongg commented 1 year ago

We decided to increase the PreemptExpiryTime of all of our credential providers to 15 minutes. This will go out in our next manual release. I'll ping here when that happens. Appreciate your patience.

peterrsongg commented 1 year ago

@cfbao @Runaground The fix has been released in Core version 3.7.202.