aws / aws-msk-iam-sasl-signer-net

Apache License 2.0

Role credentials rotation on an ECS task is not handled properly #16

Open gabrielmoral opened 4 months ago

gabrielmoral commented 4 months ago

Describe the bug

I'm using mskAuthTokenGenerator.GenerateAuthTokenAsync(region) to get the token from the ambient credentials of an ECS (Fargate) task. It works fine when I deploy the app (new task created, new creds assigned), but after roughly 6 hours Kafka errors start to appear. I'm guessing that is because the app creds are rotated around that time.

Expected Behavior

I would expect the creds rotation to be handled gracefully.

Current Behavior

This is the error that appears in the logs: [thrd:sasl_ssl://xyz]: sasl_ssl://xyz:9098/1: SASL authentication error: [85082b3a-a3be-460e-bbbd-15781f994aeb]: Session too short (after 567ms in state AUTH_REQ)

Reproduction Steps

var mskAuthTokenGenerator = new AWSMSKAuthTokenGenerator();
producerBuilder.SetOAuthBearerTokenRefreshHandler((x, _) => OauthCallback(x, mskAuthTokenGenerator, log));

private static void OauthCallback(IClient client, AWSMSKAuthTokenGenerator mskAuthTokenGenerator,
    ILog log)
{
    try
    {
        // The refresh handler is synchronous, so the async call is blocked on here.
        var (token, expiryMs) = mskAuthTokenGenerator.GenerateAuthTokenAsync(
            Amazon.RegionEndpoint.EUWest1).Result;

        if (string.IsNullOrEmpty(token))
        {
            throw new InvalidOperationException("MSK token is empty");
        }

        // expiryMs is an absolute timestamp (ms since epoch), not a duration.
        log.Info($"OauthCallback token received using {KafkaConnectionConfiguration.ConnectionString.RoleArn}, " +
                 $"it expires at {expiryMs} ms since epoch, the docs say tokens last 900 seconds");
        client.OAuthBearerSetToken(token, expiryMs, "KafkaPrincipalName");
    }
    catch (Exception e)
    {
        // Report the failure to the client so it can schedule another refresh attempt.
        log.Error("Error asking for a MSK token", e);
        client.OAuthBearerSetTokenFailure(e.ToString());
    }
}

Possible Solution

No response

Additional Information/Context

No response

Version used

1.0.0

Operating System and version

Docker container Alpine 3.19

sankalpbhatia commented 3 months ago

Thanks for raising this, and apologies for not getting back to you earlier. Can you share debug-level client-side logs? I am specifically interested in knowing which credentials provider implementation is being used to fetch credentials.

FilimonovEugene commented 3 months ago

I faced a similar issue in EKS with assumed IAM role credentials and found a workaround. I implemented my own AWS credentials rotation manager, which periodically retrieves AWS credentials by assuming an IAM role. These credentials are then passed to the AWSMSKAuthTokenGenerator.GenerateAuthTokenFromCredentialsProviderAsync method, instead of using GenerateAuthTokenAsync.
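
For anyone wanting to try the same workaround, here is a minimal sketch of the rotation part only. It is not the manager described above and not part of the library: RotatingCache, its constructor parameters and the refresh interval are hypothetical names chosen for illustration. The fetch delegate stands in for whatever assumes the IAM role, and the cached value is what would then be handed to GenerateAuthTokenFromCredentialsProviderAsync (check the library for the exact signature of that method, it is not shown in this thread).

```csharp
using System;

// Hypothetical helper: caches a value (e.g. assumed-role credentials) and
// re-fetches it once it is older than maxAge. Thread-safe via a simple lock.
public sealed class RotatingCache<T>
{
    private readonly Func<T> _fetch;
    private readonly TimeSpan _maxAge;
    private readonly Func<DateTimeOffset> _clock;
    private readonly object _gate = new object();
    private T _value;
    private bool _hasValue;
    private DateTimeOffset _fetchedAt;

    public RotatingCache(Func<T> fetch, TimeSpan maxAge)
        : this(fetch, maxAge, () => DateTimeOffset.UtcNow)
    {
    }

    // The clock is injectable so that rotation can be exercised without waiting.
    public RotatingCache(Func<T> fetch, TimeSpan maxAge, Func<DateTimeOffset> clock)
    {
        _fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
        _clock = clock ?? throw new ArgumentNullException(nameof(clock));
        _maxAge = maxAge;
    }

    public T Get()
    {
        lock (_gate)
        {
            var now = _clock();
            if (!_hasValue || now - _fetchedAt >= _maxAge)
            {
                // If the fetch throws, the previous value is left untouched
                // and the next call retries.
                _value = _fetch();
                _fetchedAt = now;
                _hasValue = true;
            }
            return _value;
        }
    }
}
```

In the OAuth refresh callback one would call Get() on a shared instance instead of relying on the ambient credentials, with maxAge set comfortably below the lifetime of the assumed-role session. Refreshing lazily inside Get() avoids a background timer; a periodic refresh as described in the comment above would work just as well.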

gabrielmoral commented 3 months ago

@sankalpbhatia I'm sorry, but I'm not in that space anymore; we decided to stick with SASL for a while. Thanks for jumping in anyway.