aws / aws-msk-iam-auth

Enables developers to use AWS Identity and Access Management (IAM) to connect to their Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters.
Apache License 2.0
142 stars 65 forks source link

Adding a proxy to delegate to appropriate IAMSaslClientProvider based on ClassLoader #82

Closed dannycranmer closed 1 year ago

dannycranmer commented 1 year ago

Issue #, if available:

https://github.com/aws/aws-msk-iam-auth/issues/36

Description of changes:

This change fixes the classloader issue experienced in this issue. I have introduced a new SaslClientFactory called ClassLoaderAwareIAMSaslClientFactory registered under the AWS_MSK_IAM mechanism. The existing IAMSaslClientFactory has been moved to a new mechanism name AWS_MSK_IAM.<classloader>. Each IAMLoginModule will register both mechanisms, we will end up with one ClassLoaderAwareIAMSaslClientFactory per JVM and n copies of IAMSaslClientFactory, n equals the number of ClassLoaders used.

ClassLoaderAwareIAMSaslClientFactory picks the correct IAMSaslClientFactory based on the classloader of the CallbackHandler

For single classloader environments, there will be 1 ClassLoaderAwareIAMSaslClientFactory and 1 IAMSaslClientFactory. We will incur a negligible performance hit delegating from one to the other.

Testing

I have tested against MSK using a Flink cluster running on EC2 box. I verified this issue resolves issue #36. I am waiting for additional test feedback from other stakeholders, and am hoping that this can be verified against MSK e2e tests.

Concerns:

Since we register n copies of IAMSaslClientFactory there is an opportunity for this to infinitely grow and leak, for example if a Flink job is continuously failing over. However, in this situation the job is already broken and this tradeoff might be acceptable against the current state, where multi classloader environments are not supported.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

hlteoh37 commented 1 year ago

Have we tested that this works to resolve the specific failure mode specified here? https://github.com/aws/aws-msk-iam-auth/issues/36#issuecomment-1070563135

If so, we might want to mention it in the overview

dannycranmer commented 1 year ago

Updated description wrt testing