Status: Open. maxkochubey opened this issue 2 years ago.
@maxkochubey Thanks for the detailed report. I will look into it.
In the meantime, note that for awsDebugCreds=true to print debug output, the client-side log level must also be set to DEBUG. We did this to make sure that credential debugging, which can be sensitive, is not turned on by mistake.
Would it be possible for you to attach DEBUG-level logs from the client? They will make the issue far easier to debug.
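A minimal sketch of the gating rule described above, useful for checking a config by hand: credential debug output appears only when the JAAS options contain awsDebugCreds=true and the client log level is DEBUG (or finer). The class DebugCredsGate and its methods are hypothetical; this models the rule as stated in the thread, not the library's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the rule stated in this thread (NOT library code):
// credential debug output needs awsDebugCreds=true AND a DEBUG-or-finer log level.
class DebugCredsGate {
    // Matches awsDebugCreds=true / awsDebugCreds="false" inside a JAAS config line.
    private static final Pattern DEBUG_CREDS =
            Pattern.compile("\\bawsDebugCreds\\s*=\\s*\"?(true|false)\\b");

    static boolean optionEnabled(String jaasConfig) {
        Matcher m = DEBUG_CREDS.matcher(jaasConfig);
        return m.find() && m.group(1).equals("true");
    }

    // logLevel is the effective client-side level, e.g. "DEBUG" or "INFO".
    static boolean credsDebugPrinted(String jaasConfig, String logLevel) {
        boolean debugOrFiner = logLevel.equalsIgnoreCase("DEBUG")
                || logLevel.equalsIgnoreCase("TRACE");
        return optionEnabled(jaasConfig) && debugOrFiner;
    }

    public static void main(String[] args) {
        String jaas = "software.amazon.msk.auth.iam.IAMLoginModule required awsDebugCreds=true;";
        System.out.println(credsDebugPrinted(jaas, "INFO"));  // false: option alone is not enough
        System.out.println(credsDebugPrinted(jaas, "DEBUG")); // true
    }
}
```

Under this rule, setting awsDebugCreds=true while the client logs at INFO produces no credential output at all.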
Hi @sayantacC, sure!
The consumer process was started in the following environment:
$ printenv | grep -E 'AWS|KAFKA' | sort
AWS_REGION=ap-northeast-1
AWS_ROLE_ARN=arn:aws:iam::111111111111:role/test-eks-assumer
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
KAFKA_HEAP_OPTS=-Xms512m -Xmx2048m
KAFKA_OPTS=-Dlog4j.configuration=file:/opt/kafka-mm/log4j.properties
$ cat /opt/kafka-mm/log4j.properties
log4j.rootLogger=DEBUG, stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.stderr.Target=System.err
$ cat /opt/kafka-mm/client-iam.properties
security.protocol = SASL_SSL
sasl.mechanism = AWS_MSK_IAM
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::222222222222:role/test-msk-consumer" awsRoleSessionName="msk-test" awsDebugCreds=true;
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
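For clients configured in Java code rather than through a properties file, the same settings can be sketched with java.util.Properties. Plain string keys are used so the example compiles without kafka-clients (an assumption made to keep it self-contained); in a real application these Properties would be passed to the KafkaConsumer constructor. The class name ClientIamConfig is hypothetical.

```java
import java.util.Properties;

// Sketch: the client-iam.properties settings above, built in code.
// ClientIamConfig is a hypothetical name; the keys and values come from the thread.
class ClientIamConfig {
    static Properties build(String roleArn, String sessionName) {
        // Same JAAS options as sasl.jaas.config in client-iam.properties.
        String jaas = "software.amazon.msk.auth.iam.IAMLoginModule required"
                + " awsRoleArn=\"" + roleArn + "\""
                + " awsRoleSessionName=\"" + sessionName + "\""
                + " awsDebugCreds=true;";
        Properties props = new Properties();
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "AWS_MSK_IAM");
        props.setProperty("sasl.jaas.config", jaas);
        props.setProperty("sasl.client.callback.handler.class",
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler");
        return props;
    }

    public static void main(String[] args) {
        Properties p = build("arn:aws:iam::222222222222:role/test-msk-consumer", "msk-test");
        // Print in sorted key order for a stable, readable listing.
        p.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + " = " + p.getProperty(k)));
    }
}
```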
Here is the consumer debug log: kafka-console-consumer.log
I also ran an aws-cli container in the same Kubernetes pod and checked that assuming the role works on its own:
bash-4.2# printenv | grep AWS
AWS_ROLE_ARN=arn:aws:iam::111111111111:role/test-eks-assumer
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
bash-4.2# aws sts get-caller-identity
{
"UserId": "AROA55BHPCBA3F246W7I7:botocore-session-1643875227",
"Account": "111111111111",
"Arn": "arn:aws:sts::111111111111:assumed-role/test-eks-assumer/botocore-session-1643875227"
}
bash-4.2# aws sts assume-role --role-arn arn:aws:iam::222222222222:role/test-msk-consumer --role-session-name aws-cli-pod
{
"Credentials": {
"AccessKeyId": "ASIAEXAMPLETNSVZ7HV2",
"SecretAccessKey": "PcLSb5dCu+CGZuAQ7f6a3WmZcMmfgFPNFmy/L3GC",
"SessionToken": "IQoJb3JpZ2luX2VjEDgaDmFwLXNvdXRoZWFzdC0xIkgwRgIhAPOlEs0qrr4cRVgFWFcONPyGR6HA4Nhf45cdU1zhMS4AAiEA6uNZS9dyqHU1FxCZUsOgRY+YuQWFWFD5YxprF8P/IbEqmAIIcRAAGgw1NjgwMzI5NDQ4NzAiDDqOIb3BN4nMipxU4Cr1AT3b07gAqxmZT/zfvO1XtvFuVVBqBSH2kQVZcPMe/KXiLHkFVmLNTt4JV/O8aRB/OXXjM+m5+OtQ9cDkNaY3mO6+C9JiJf+tO926oXkEy1e3yLTuti8m281AZ37Wa4Lc9ZRNtmSsnlkWuuWJIg/AwYqf2MkI1UgDliWlC+boORau7JxMFjb427JsKrfzbbNht7tO0SJ5xn0uX9ZuU4Dh68K853Kv5AKFFb1gQ1LT3uSmQaILFH0PQHqIPz8V/wplmOuMeNKYCXhysRjkqmgReTRjzw5Q6XwndT1XNLXZzDzEEAUYnUeS/5JA2Fm7j2Ygh4qDEqnbMO+X7o8GOpwBFqf9RGaCYK+VNxW8Egv1HBl4eEtLrNqtBYMBfQaWdlmGBqyIaIdrj1dpDFxlVdeo8z4tYaznyE4OzXAKl+imfkuADljVw/JFb1sKxBlJXBFfmUUg9SXflk5QxmERyP8o/fPcd0MGEv8Z8BGGkeiG4Hf+RLugXIYnMwjRsc00/CgiPW7MuFq0xHeFj42sKK9Km6mXfajoLWlU4Qk8",
"Expiration": "2022-02-03T09:01:51+00:00"
},
"AssumedRoleUser": {
"AssumedRoleId": "AROAYIQLG4LTBFJG3LJOW:aws-cli-pod",
"Arn": "arn:aws:sts::222222222222:assumed-role/test-msk-consumer/aws-cli-pod"
}
}
Thank you!
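The CLI output above shows the hop from account 111111111111 to account 222222222222. A small sketch that extracts the account ID field from an ARN (format arn:partition:service:region:account-id:resource) makes that comparison explicit; ArnAccounts is a hypothetical helper, not part of any AWS library.

```java
// Hypothetical helper: pull the account-id field out of an ARN
// (arn:partition:service:region:account-id:resource).
class ArnAccounts {
    static String accountId(String arn) {
        // Limit 6 keeps any ':' inside the resource part intact.
        String[] parts = arn.split(":", 6);
        if (parts.length < 6 || !parts[0].equals("arn")) {
            throw new IllegalArgumentException("not an ARN: " + arn);
        }
        return parts[4];
    }

    public static void main(String[] args) {
        String caller = "arn:aws:sts::111111111111:assumed-role/test-eks-assumer/botocore-session-1643875227";
        String assumed = "arn:aws:sts::222222222222:assumed-role/test-msk-consumer/aws-cli-pod";
        System.out.println(accountId(caller) + " -> " + accountId(assumed));
        // prints: 111111111111 -> 222222222222
    }
}
```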
@maxkochubey Thanks for the debug logs.
You have surmised correctly: the role in account 222222222222 is being assumed with the credentials from the EKSWorkerRole rather than those provided by IRSA. The role chaining does not do what the AWS CLI does.
I will look into solving this problem, but the required change is likely to take me some time: it will most likely involve switching to the AWS SDK v2 credential providers.
In the meantime, is it possible for you to make the IRSA role itself the one that has cross-account access? The procedure described here avoids the additional indirection of having the IRSA role assume the cross-account role.
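The fall-through described in this reply can be illustrated with a simplified, hypothetical model of an ordered credentials chain: the first provider that yields credentials wins, so if the web identity step cannot produce credentials, resolution silently lands on the instance profile, i.e. the EKS worker node role. The provider names and their order are a simplification of the AWS SDK default chains (which contain more providers); nothing in this class is SDK code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical model, NOT AWS SDK code: providers are tried in insertion order
// and the first one that returns an identity wins.
class CredentialChainModel {
    static String resolve(Map<String, Supplier<Optional<String>>> chain) {
        for (Map.Entry<String, Supplier<Optional<String>>> e : chain.entrySet()) {
            Optional<String> identity = e.getValue().get();
            if (identity.isPresent()) {
                return e.getKey() + ": " + identity.get();
            }
        }
        throw new IllegalStateException("no provider returned credentials");
    }

    // Simplified chain for the pod in this issue; webIdentityUsable stands in
    // for "the web identity step can actually produce credentials".
    static Map<String, Supplier<Optional<String>>> chain(boolean webIdentityUsable) {
        Map<String, Supplier<Optional<String>>> c = new LinkedHashMap<>();
        c.put("Environment", Optional::empty); // no static keys exported in the pod
        c.put("WebIdentity", () -> webIdentityUsable
                ? Optional.of("arn:aws:iam::111111111111:role/test-eks-assumer")
                : Optional.<String>empty());
        c.put("InstanceProfile", () -> Optional.of("EKS worker node role"));
        return c;
    }

    public static void main(String[] args) {
        System.out.println(resolve(chain(true)));  // WebIdentity: arn:aws:iam::111111111111:role/test-eks-assumer
        System.out.println(resolve(chain(false))); // InstanceProfile: EKS worker node role
    }
}
```

The point of the model is that the failure is silent: nothing errors out when the web identity step is skipped, so the cross-account assume-role simply runs with the wrong source identity.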
Thanks, @sayantacC, I will try it and get back with the result.
Marking this as an enhancement.
Any progress on updating this to use the AWS SDK v2 credential providers? We are trying to accomplish the same thing as described above.
@TheRhino04 Sorry, I have not made progress on this yet. I will try to make some progress over the next few weeks.
In the meantime, is it possible for you to try the suggestion mentioned earlier, i.e. making the IRSA role itself the one that has cross-account access? The procedure described here avoids the additional indirection of having the IRSA role assume the cross-account role.
Hi @sayantacC, thank you for looking into this. We are also interested in using IRSA. Currently only the worker node role is used, which is a blocker from a security perspective.
Hi @sayantacC, we're also running into this on EKS.
We went a step further and tried your suggestion (binding the service account directly to a cross-account role and using MSK IAM without the additional assume). A CLI call to aws sts get-caller-identity confirmed that the role was respected: it returned the cross-account role. When we tried to use MSK, however, the library did not pick up the web identity token role at all and still returned errors about missing permissions for the EC2 node identity.
We're going to chase this through our AWS account manager to see if we can get some movement on it. You mentioned 24 days ago that you would try to make some progress on it; has any progress been made at all?
@Miscreancy, @TheRhino04, @eligithubacc
I have had a chance to make some progress. I have been working on this change in the migrate_to_v2 branch and have verified that all existing functionality works with it. However, I have not yet had the chance to set up an EKS cluster with IRSA to test it on. I will try to get the test setup ready soon and then work on the release.
In the meanwhile, if you wish to give that branch a try, I would love to learn if it solves your problem.
We got IRSA to work by letting the pod use the default credentials chain, i.e. by not specifying awsRoleArn.
Please see https://github.com/aidanmelen/terraform-kubernetes-confluent-platform/blob/main/examples/hybrid_aws_msk/confluent_platform_sasl_iam_secure/main.tf#L40-L57 for more information.
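Applied to the client-iam.properties shown earlier in this thread, this workaround amounts to dropping the awsRoleArn and awsRoleSessionName options so that the login module falls back to the default credentials chain (and therefore to IRSA). A sketch of that transformation, assuming option values are double-quoted as in the thread; JaasRoleStripper is a hypothetical name.

```java
// Hypothetical helper: remove awsRoleArn and awsRoleSessionName from a JAAS
// line so IAMLoginModule is left to the default credentials chain.
// Assumes option values are double-quoted, as in client-iam.properties.
class JaasRoleStripper {
    static String stripRoleOptions(String jaas) {
        // Drop each ` awsRoleArn="..."` / ` awsRoleSessionName="..."` pair.
        return jaas.replaceAll("\\s+awsRole(Arn|SessionName)=\"[^\"]*\"", "");
    }

    public static void main(String[] args) {
        String before = "software.amazon.msk.auth.iam.IAMLoginModule required"
                + " awsRoleArn=\"arn:aws:iam::222222222222:role/test-msk-consumer\""
                + " awsRoleSessionName=\"msk-test\" awsDebugCreds=true;";
        System.out.println(stripRoleOptions(before));
        // prints: software.amazon.msk.auth.iam.IAMLoginModule required awsDebugCreds=true;
    }
}
```

Note that with this variant the IRSA role itself must have the MSK permissions, which is exactly the cross-account restriction some commenters cannot accept.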
Any update here? The given workarounds are not sufficient for my use case. My pod reads from one MSK cluster cross-account and writes to another MSK cluster in the same account as the EKS cluster.
It's possible the federated access works, but due to security restrictions within my organization, I am not able to give federated access to OIDC providers cross-account. Therefore, sts:AssumeRole is the preferred method of cross-account access.
Hi @stalbot15, in my case our issue was directly related to this ticket that we raised in the aws-sdk-java-v2 project:
https://github.com/aws/aws-sdk-java-v2/issues/3555
If you are experiencing the same dependency-related issue that broke IRSA for us, you may be able to work around it with some of the approaches listed on that ticket.
In short, if you include both the IAM auth and AWS Glue libraries in a project that uses IRSA, you will run into trouble unless you take further action.
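When dependency conflicts are suspected, it can help to check which AWS SDK classes are actually visible on the client's classpath. A sketch under stated assumptions: software.amazon.awssdk.services.sts.StsClient is taken to be the SDK v2 STS client (assuming, as we understand it, that v2 web identity credentials need the sts module on the classpath), and com.amazonaws.services.securitytoken.AWSSecurityTokenService its SDK v1 counterpart. ClasspathProbe is a hypothetical helper; it does not diagnose the Glue conflict itself.

```java
// Hypothetical helper: report whether classes are loadable from this JVM's classpath.
// The AWS class names below are assumptions about which modules matter for IRSA.
class ClasspathProbe {
    static boolean onClasspath(String className) {
        try {
            // initialize=false: only check visibility, do not run static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "software.amazon.awssdk.services.sts.StsClient",                // SDK v2 sts module (assumed)
            "com.amazonaws.services.securitytoken.AWSSecurityTokenService", // SDK v1 sts module (assumed)
            "software.amazon.msk.auth.iam.IAMLoginModule"                   // aws-msk-iam-auth
        };
        for (String n : names) {
            System.out.println(n + " -> " + onClasspath(n));
        }
    }
}
```

Run it with the same classpath as the Kafka client (for example CLASSPATH=/opt/kafka-libs/* as in the original report).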
Thank you @sayantacC for the updates. I would be happy to test your code on EKS. How should I configure pom.xml to test your changes?
@sayantacC I am trying to build the migrate_to_v2 branch locally with Java 17 and I get:
> Task :compileJava FAILED
./aws-msk-iam-auth/src/main/java/software/amazon/msk/auth/iam/internals/AuthenticationResponse.java:28: error: cannot find symbol
@Getter(onMethod = @__(@JsonIgnore))
^
symbol: class __
1 error
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':compileJava'.
> java.lang.IllegalAccessError: class lombok.javac.apt.LombokProcessor (in unnamed module @0x2e26bd7f) cannot access class com.sun.tools.javac.processing.JavacProcessingEnvironment (in module jdk.compiler) because module jdk.compiler does not export com.sun.tools.javac.processing to unnamed module @0x2e26bd7f
gradle:
------------------------------------------------------------
Gradle 7.6.1
------------------------------------------------------------
Build time: 2023-02-24 13:54:42 UTC
Revision: 3905fe8ac072bbd925c70ddbddddf4463341f4b4
Kotlin: 1.7.10
Groovy: 3.0.13
Ant: Apache Ant(TM) version 1.10.11 compiled on July 10 2021
JVM: 17.0.6 (Azul Systems, Inc. 17.0.6+10-LTS)
OS: Mac OS X 13.3.1 aarch64
@sayantacC We are facing the same issue. I raised a new ticket for it, as there has been no update for the last year.
I am trying to run a Kafka consumer in an AWS-managed Kubernetes cluster (EKS) with the IAM roles for service accounts (IRSA) feature enabled, but without any luck so far.
The EKS cluster runs in AWS account 111111111111. From there, the consumer should connect to an AWS-managed MSK cluster with IAM authentication; the MSK cluster is located in AWS account 222222222222. I am using the generic Kafka 2.8.1 binaries and "aws-msk-iam-auth" version 1.1.2. Inside the Kubernetes pod container, the library JAR is located in "/opt/kafka-libs" and the environment variable "CLASSPATH=/opt/kafka-libs/*" is exported.
When I export the AWS credentials of an IAM user created in account 222222222222 that has access to the MSK cluster topics, everything works fine and messages are received:
To make my setup more secure and get rid of credentials stored in Kubernetes secrets, I decided to set up IRSA and use it for MSK authentication. In account 111111111111 I created an IAM role mapped to the serviceAccount used by the Kubernetes pod (arn:aws:iam::111111111111:role/test-eks-assumer). This role is allowed to assume the IAM role in account 222222222222, which has all the required policies attached (in fact, the same policies as the IAM user whose credentials were used previously).
So now, when the pod is started in EKS, it has the following environment variables defined by EKS:
Assuming the role works as well. I checked it in a Kubernetes pod started from the amazon/aws-cli image with the same serviceAccount/IAM role mapping: aws sts get-caller-identity returns the role arn:aws:iam::111111111111:role/test-eks-assumer, and aws sts assume-role --role-arn "arn:aws:iam::222222222222:role/test-msk-consumer" --role-session-name "test-cli" works fine.
Since the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are removed, I try to use the following client config:
Here, arn:aws:iam::222222222222:role/test-msk-consumer is the role that the IAM role arn:aws:iam::111111111111:role/test-eks-assumer, which is mapped to the pod's serviceAccount, is allowed to assume.
However, the library returns the following:
As you can see, aws-msk-iam-auth tries to use the IAM role of the EKS worker node instance and does not take into account the role defined by IRSA in the AWS_ROLE_ARN environment variable. To me, it looks very similar to https://github.com/aws/aws-sdk-java-v2/issues/1470.
P.S. I did not succeed in using the awsDebugCreds option; it just does not have any effect :(
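Before blaming the client library, the IRSA environment injected by EKS can be sanity-checked from inside the JVM. A minimal sketch: it takes the environment as a Map so it can be tested (pass System.getenv() in a real pod) and flags a missing AWS_ROLE_ARN, a missing or unreadable AWS_WEB_IDENTITY_TOKEN_FILE, and exported static keys, which the AWS SDK default credential chains consult before web identity. IrsaEnvCheck is a hypothetical helper.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: list problems with the IRSA environment, if any.
class IrsaEnvCheck {
    static List<String> problems(Map<String, String> env) {
        List<String> out = new ArrayList<>();
        String roleArn = env.get("AWS_ROLE_ARN");
        String tokenFile = env.get("AWS_WEB_IDENTITY_TOKEN_FILE");
        if (roleArn == null || roleArn.isEmpty()) {
            out.add("AWS_ROLE_ARN is not set");
        }
        if (tokenFile == null || tokenFile.isEmpty()) {
            out.add("AWS_WEB_IDENTITY_TOKEN_FILE is not set");
        } else if (!Files.isReadable(Paths.get(tokenFile))) {
            out.add("token file is not readable: " + tokenFile);
        }
        // Static keys are checked earlier in the default chains, so they would win over IRSA.
        if (env.containsKey("AWS_ACCESS_KEY_ID")) {
            out.add("AWS_ACCESS_KEY_ID is set; static keys may take precedence over IRSA");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> p = problems(System.getenv());
        System.out.println(p.isEmpty() ? "IRSA environment looks complete" : String.join("\n", p));
    }
}
```

In the environment reported above, both IRSA variables are present, so a clean result here would point back at the credential resolution inside the client rather than at the pod setup.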