aws / aws-msk-iam-auth

Enables developers to use AWS Identity and Access Management (IAM) to connect to their Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters.
Apache License 2.0

Unable to use regional sts endpoint when running on KDA app #118

Closed menuetb closed 7 months ago

menuetb commented 1 year ago

I'm writing a KDA (Kinesis Data Analytics) application that needs to read data from an MSK cluster in another account. The VPC does not have internet access, so I set up a regional STS endpoint in my VPC. The MSK cluster uses IAM authentication, and the cross-account trust policy is already in place.

Here is the conf for my aws-msk-iam-auth client:

.setProperty("security.protocol", "SASL_SSL")
.setProperty("sasl.mechanism", "AWS_MSK_IAM")
.setProperty("sasl.jaas.config", "software.amazon.msk.auth.iam.IAMLoginModule required awsStsRegion=\""+ defaultRegion + "\" awsRoleArn=\""+ roleArn +"\" awsRoleSessionName=\"consumer\";")
.setProperty("sasl.client.callback.handler.class", "software.amazon.msk.auth.iam.IAMClientCallbackHandler")
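For readers who want to check the quoting in the `sasl.jaas.config` value, here is a minimal sketch that builds the same four settings with plain `java.util.Properties`. The class and method names (`MskIamClientConfigSketch`, `jaasConfig`, `clientProperties`) and the role ARN are placeholders invented for this illustration; only the property keys and the JAAS option names come from the configuration above.

```java
import java.util.Properties;

// Hypothetical helper, not part of aws-msk-iam-auth.
final class MskIamClientConfigSketch {

    // Builds the sasl.jaas.config value. Pass null or "" for stsRegion
    // to leave awsStsRegion out of the config entirely.
    static String jaasConfig(String roleArn, String sessionName, String stsRegion) {
        StringBuilder sb =
            new StringBuilder("software.amazon.msk.auth.iam.IAMLoginModule required");
        if (stsRegion != null && !stsRegion.isEmpty()) {
            sb.append(" awsStsRegion=\"").append(stsRegion).append('"');
        }
        sb.append(" awsRoleArn=\"").append(roleArn).append('"');
        sb.append(" awsRoleSessionName=\"").append(sessionName).append('"');
        return sb.append(';').toString();
    }

    // Collects the four client settings shown in the issue.
    static Properties clientProperties(String roleArn, String sessionName, String stsRegion) {
        Properties props = new Properties();
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "AWS_MSK_IAM");
        props.setProperty("sasl.jaas.config", jaasConfig(roleArn, sessionName, stsRegion));
        props.setProperty("sasl.client.callback.handler.class",
            "software.amazon.msk.auth.iam.IAMClientCallbackHandler");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder ARN and region, for illustration only.
        Properties props = clientProperties(
            "arn:aws:iam::123456789012:role/example-role", "consumer", "eu-west-1");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Building the string in one place keeps the escaped quotes out of the call site, which makes it easier to spot a missing space or semicolon.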

I see in the logs that the client does not try to use the regional endpoint that I configured; it still tries to reach the global endpoint.

Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to sts.amazonaws.com:443 [sts.amazonaws.com/52.119.198.216] failed: connect timed out

So I can't use STS with an MSK app in a VPC without an internet connection.

cnukwas commented 1 year ago

I am running into the very same issue, even though the documentation states that the IAM Java code should use the regional STS endpoint when MirrorMaker2 and MSK Connect replicate Kafka data from one cluster to another. Did you find a workaround? Creating an interface endpoint for STS doesn't seem to help.

Thanks

menuetb commented 1 year ago

The first thing to try is to follow https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html and set

[default]
sts_regional_endpoints = regional

If you can't (that was my case with KDA), I created a PR with a fix, #119; we need to wait for it to be merged.

cnukwas commented 1 year ago

Thank you @menuetb for the response. I am using MSK Connect, and I don't know whether there is an option to set sts_regional_endpoints = regional in the properties, though I have set the region and role ARN as below.

target.cluster.producer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::xxxyyyzzz:role/msk-access-test" awsDebugCreds=true awsStsRegion="ca-central-1";

hhkkxxx133 commented 10 months ago

We are reverting the commits in #136 because we see the following error when only awsRoleArn is passed in the JAAS config, without an overridden awsStsRegion.

An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: Failed to find AWS IAM Credentials [Caused by aws_msk_iam_auth_shadow.com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Credential should be scoped to a valid region. (Service: AWSSecurityTokenService; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: 14905a5d-2bf2-4ff1-976a-c7d7ca5b9a02; Proxy: null)]) occurred when evaluating SASL token received from the Kafka Broker. Kafka Client will go to AUTHENTICATION_FAILED state.) (org.apache.kafka.common.network.Selector)

plazma-prizma commented 8 months ago

We also have this PR, are they related? https://github.com/aws/aws-msk-iam-auth/pull/37

menuetb commented 8 months ago

No, PR #37 is useful if you can set the environment variable AWS_STS_REGIONAL_ENDPOINTS. In services such as Managed Service for Apache Flink, we can't set this environment variable.

agodwinflut commented 7 months ago

FWIW this also affects Scala Spark jobs on Glue.

Also, although only slightly related, the recently released MSK IAM library for Python has exactly the same problem. So pyflink and pyspark on Glue do not work with IAM and VPC endpoints for the same reason.

@hhkkxxx133 I think I can see why the error you mentioned was happening.

https://github.com/aws/aws-msk-iam-auth/blob/34d68b263394e915fcb9acb26ff5e90e207627d2/src/main/java/software/amazon/msk/auth/iam/internals/MSKCredentialProvider.java#L280

If the region parameter is not set, it defaults to "aws-global". But:

new EndpointConfiguration("sts.amazonaws.com", "aws-global")

is not valid, hence the error message: "Credential should be scoped to a valid region"

I think the logic is in the wrong place: the decision should happen when the STS client is constructed, something like:

AWSSecurityTokenServiceClientBuilder stsClientBuilder = AWSSecurityTokenServiceClientBuilder.standard();
if ("aws-global".equals(stsRegion)) {
    stsClientBuilder.setRegion(stsRegion);  // this is the existing behaviour
} else {
    stsClientBuilder.setEndpointConfiguration(
        new EndpointConfiguration(String.format("sts.%s.amazonaws.com", stsRegion), stsRegion)
    );
}
AWSSecurityTokenService stsClient = stsClientBuilder.build();
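The branching in that snippet can be exercised without the AWS SDK on the classpath. The sketch below isolates only the hostname decision; `StsEndpointSketch` and its methods are hypothetical names for illustration, and treating a null or empty region as "aws-global" is an assumption that mirrors the default described above. The `sts.%s.amazonaws.com` pattern is taken from the snippet as-is; partitions with a different DNS suffix (the China regions, for example) are not handled.

```java
// Hypothetical illustration of the endpoint decision, not library code.
final class StsEndpointSketch {

    static final String GLOBAL_REGION = "aws-global";
    static final String GLOBAL_ENDPOINT = "sts.amazonaws.com";

    // Assumption: an unset region behaves like the "aws-global" default.
    static boolean usesGlobalEndpoint(String stsRegion) {
        return stsRegion == null || stsRegion.isEmpty() || GLOBAL_REGION.equals(stsRegion);
    }

    // Returns the hostname the STS client would be pointed at.
    static String endpointFor(String stsRegion) {
        if (usesGlobalEndpoint(stsRegion)) {
            return GLOBAL_ENDPOINT;
        }
        return String.format("sts.%s.amazonaws.com", stsRegion);
    }

    public static void main(String[] args) {
        System.out.println(endpointFor("aws-global"));
        System.out.println(endpointFor("ca-central-1"));
    }
}
```

The point of keeping the global case separate is that "aws-global" is never substituted into the regional hostname pattern and never paired with an explicit endpoint override.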

Alternatively, a separate parameter in the JAAS config could be used, e.g. awsStsEndpoint. In some ways I prefer this second approach, because it makes it very clear that the user intends to override the endpoint configuration.

I can submit a PR if either of these approaches is OK.

Andy

sidyag commented 7 months ago

Thank you for the comment. This indeed fixed the issue we were having. I have merged the change already. Closing the issue.
