jamielwhite closed this issue 6 months ago
I enabled DEBUG logging, and I see the application re-authenticates successfully but with a decreasing session timeout each time, until it's too short:
2023-11-28T23:33:16.653079751Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 898349 ms and session re-authentication on or after 836777 ms
2023-11-28T23:33:16.733016243Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 898268 ms and session re-authentication on or after 796444 ms
2023-11-28T23:33:19.903645191Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 895098 ms and session re-authentication on or after 763044 ms
2023-11-28T23:46:03.117576552Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 131883 ms and session re-authentication on or after 113503 ms
2023-11-28T23:46:33.305042537Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 101696 ms and session re-authentication on or after 89661 ms
2023-11-28T23:47:56.915128578Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 18086 ms and session re-authentication on or after 16443 ms
2023-11-28T23:48:03.451621377Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 11549 ms and session re-authentication on or after 9974 ms
2023-11-28T23:48:13.552575825Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 1449 ms and session re-authentication on or after 1329 ms
2023-11-28T23:48:13.591081907Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished re-authentication with session expiration in 1410 ms and session re-authentication on or after 1302 ms
...
2023-11-28T23:48:15.382982876Z [Thread-1] ERROR org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Connection to node 2147483645 (<cluster endpoint>) failed authentication due to: [18de9fc3-f444-4302-8487-1c38b629702c]: Session too short
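One way to read the log lines above is to add each "session expiration in N ms" value to the timestamp of its own line, which gives the absolute time at which the session expires. The sketch below does that for three of the lines (the class and method names are made up for illustration, and the logger name and client id are trimmed from the sample lines for brevity). If every line resolves to roughly the same instant, the shrinking timeout is just the remaining lifetime of one unchanged set of credentials.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka or aws-msk-iam-auth.
class SessionExpiryCheck {
    // Captures the leading ISO-8601 timestamp and the "session expiration in N ms" value.
    private static final Pattern LINE =
            Pattern.compile("^(\\S+) .*session expiration in (\\d+) ms");

    /** Returns the log timestamp plus the remaining session lifetime. */
    static Instant absoluteExpiry(String logLine) {
        Matcher m = LINE.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("Not an authentication log line: " + logLine);
        }
        Instant loggedAt = Instant.parse(m.group(1));
        return loggedAt.plus(Duration.ofMillis(Long.parseLong(m.group(2))));
    }

    public static void main(String[] args) {
        // Timestamps and millisecond values copied from the log above; prefixes trimmed.
        String[] lines = {
            "2023-11-28T23:33:16.653079751Z [Thread-1] DEBUG Finished authentication with session expiration in 898349 ms and session re-authentication on or after 836777 ms",
            "2023-11-28T23:46:03.117576552Z [Thread-1] DEBUG Finished re-authentication with session expiration in 131883 ms and session re-authentication on or after 113503 ms",
            "2023-11-28T23:48:13.552575825Z [Thread-1] DEBUG Finished re-authentication with session expiration in 1449 ms and session re-authentication on or after 1329 ms"
        };
        for (String line : lines) {
            System.out.println(absoluteExpiry(line));
        }
    }
}
```

For these three lines the computed expiry falls within a few milliseconds of 23:48:15Z, which matches the time of the "Session too short" error.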
Thank you for raising this. Great details!
We will dig into this as soon as possible and get back to you.
I updated our application to leave off awsRoleArn, and it shows a much longer expiration time (1 hour vs 15 minutes). So it's possible I didn't wait long enough to see if the issue occurs when awsRoleArn is not included in sasl.jaas.config. I'll keep it running with this configuration to see if there are any reductions in the session expiration.
2023-11-29T00:53:09.024953992Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 3598977 ms and session re-authentication on or after 3228285 ms
2023-11-29T00:53:09.113288090Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 3598888 ms and session re-authentication on or after 3194801 ms
2023-11-29T00:53:12.238079044Z [Thread-1] DEBUG org.apache.kafka.common.security.authenticator.SaslClientAuthenticator - [Consumer clientId=consumer-test-group-2-1, groupId=test-group-2] Finished authentication with session expiration in 3595763 ms and session re-authentication on or after 3284138 ms
I was able to repro the issue locally, and I suspect it is isolated to cases where you provide an awsRoleArn in the JAAS config.
My hypothesis is that, since the STSAssumeRoleSessionCredentialsProvider used in https://github.com/aws/aws-msk-iam-auth/blob/main/src/main/java/software/amazon/msk/auth/iam/internals/MSKCredentialProvider.java holds onto AWS credentials until some time before expiry, it returns the same credentials when the ExpiringCredentialRefreshingLogin class looks for new credentials after 10 minutes.
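The hypothesis can be illustrated with a small simulation. This is only a sketch under stated assumptions: CachingProvider is a hypothetical stand-in rather than the AWS SDK class, and the 15-minute credential lifetime and 1-minute refresh window are illustrative values, not the SDK's actual settings. Time is passed in as plain milliseconds so the behavior is deterministic.

```java
// Hypothetical simulation; none of these types exist in the AWS SDK or Kafka.
class CachedCredentialsSketch {
    /** Stand-in for temporary credentials; only the expiry matters here. */
    static final class Credentials {
        final int generation;
        final long expiresAtMs;

        Credentials(int generation, long expiresAtMs) {
            this.generation = generation;
            this.expiresAtMs = expiresAtMs;
        }
    }

    /** Reuses cached credentials until they are within refreshWindowMs of expiry. */
    static final class CachingProvider {
        private final long lifetimeMs;
        private final long refreshWindowMs;
        private Credentials cached;
        private int generation = 0;

        CachingProvider(long lifetimeMs, long refreshWindowMs) {
            this.lifetimeMs = lifetimeMs;
            this.refreshWindowMs = refreshWindowMs;
        }

        Credentials get(long nowMs) {
            if (cached == null || cached.expiresAtMs - nowMs <= refreshWindowMs) {
                cached = new Credentials(++generation, nowMs + lifetimeMs);
            }
            return cached;
        }
    }

    /** Session lifetime granted for a token derived from these credentials. */
    static long sessionLifetimeMs(Credentials c, long nowMs) {
        return c.expiresAtMs - nowMs;
    }

    public static void main(String[] args) {
        CachingProvider provider = new CachingProvider(900_000L, 60_000L); // assumed values
        Credentials atLogin = provider.get(0L);
        Credentials atRefresh = provider.get(600_000L); // login refresh at the 10-minute mark
        System.out.println("same credentials reused: " + (atLogin == atRefresh));
        System.out.println("session lifetime after refresh (ms): "
                + sessionLifetimeMs(atRefresh, 600_000L));
    }
}
```

In this model the refresh at 10 minutes gets back the credentials issued at login, so the new token still expires at the 15-minute mark and each later re-authentication is granted a shorter session.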
@jamielwhite were you able to get results for your last run (without passing awsRoleArn)? If not, I will try to run a similar consumer app and observe results tomorrow.
Yes, the same issue eventually happened without passing awsRoleArn.
@jamielwhite we are trying a couple of things for this. Will share an update soon.
@jamielwhite can you share the setup which led to this issue without using awsRoleArn? It would be helpful to know the client properties you used in that case, and what credentials you ended up using at that point. Also, can you share the session expiration in those cases?
The setup was the same with awsRoleArn excluded, and the credentials we used for that case had full kafka-cluster:* permissions. The session expiration was 1 hour (we're running Kubernetes pods and getting our credentials through a service account).
security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;
sasl.login.callback.handler.class=software.amazon.msk.auth.iam.IAMOAuthBearerLoginCallbackHandler
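For a Java client, the same settings can be built programmatically. The sketch below uses only java.util.Properties; the class name and the roleArnOrNull parameter are hypothetical. The awsRoleArn="..." option form is assumed from standard JAAS option syntax, so check it against the aws-msk-iam-auth README before relying on it.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of aws-msk-iam-auth.
class MskIamClientProps {
    /** Builds the client properties above; pass null to omit awsRoleArn. */
    static Properties build(String roleArnOrNull) {
        Properties props = new Properties();
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "OAUTHBEARER");

        String jaas = "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required";
        if (roleArnOrNull != null) {
            // Assumed option syntax: key="value" before the terminating semicolon.
            jaas += " awsRoleArn=\"" + roleArnOrNull + "\"";
        }
        props.setProperty("sasl.jaas.config", jaas + ";");

        props.setProperty("sasl.login.callback.handler.class",
                "software.amazon.msk.auth.iam.IAMOAuthBearerLoginCallbackHandler");
        return props;
    }

    public static void main(String[] args) {
        // Without a role ARN this reproduces the four properties listed above.
        build(null).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Passing the resulting Properties to a KafkaConsumer additionally requires the usual bootstrap.servers, deserializer, and group.id settings, which are not shown in the thread.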
Hi @jamielwhite, I just pushed a fix for this issue. Would you be able to build a jar locally and verify if this works for you? We will have a release for this in the next few days.
Thanks @sankalpbhatia! We've made it a few minutes past the 15 minute mark where it initially failed, so it looks like this is working. I also have a consumer running without awsRoleArn, so I'll let you know if we run into any issues there after an hour.
This is unrelated to the issue, but does AWS have any plans to release a SASL signer library in Ruby now that it's supported in other languages like Python and Go?
There are no plans to release a signer library in Ruby right now.
@sankalpbhatia someone wrote a Ruby signer library: https://github.com/bruce-szalwinski-he/aws-msk-iam-sasl-signer-ruby. Please consider moving it to the aws org.
More details are in https://github.com/aws/aws-sdk-ruby/discussions/2985
I have a Kafka consumer which is failing to re-authenticate. The consumer works for the first 15 minutes, but it fails once the credentials expire despite the logs indicating it has refreshed the credentials. I've replicated this issue with a Java application as well as the kafka-console-consumer.
Here are the logs indicating the login is refreshed at the 10 minute mark, but the consumer fails to re-authenticate at 15 minutes:
This issue does not occur if I remove awsRoleArn from sasl.jaas.config, but the re-authentication fails if I include it. Here's what the properties file looks like:
Kafka version: 3.6.0
aws-msk-iam-auth version: 2.0.0