Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage
Apache License 2.0

Plugin does not seem to work in an environment with FIPS enabled #573

Closed: im-konge closed this issue 1 week ago

im-konge commented 1 month ago

What happened?

During our testing of the Kafka Tiered Storage feature with the Aiven plugin, an S3 bucket, and MinIO, we discovered that when we use the plugin on an OCP cluster where FIPS is enabled, it throws security-related exceptions. After the investigation done by @showuon, the issue appears to be caused by FIPS and the cryptographic restrictions that come with it. We are getting the following exception:

2024-07-24 05:11:51,022 ERROR [RemoteLogManager=0 partition=ILvJFXNlSCOPUybBzf_8Ew:my-topic-1887061142-1160371646-0] Error occurred while copying log segments of partition: ILvJFXNlSCOPUybBzf_8Ew:my-topic-1887061142-1160371646-0 (kafka.log.remote.RemoteLogManager$RLMTask) [kafka-rlm-thread-pool-3]
org.apache.kafka.server.log.remote.storage.RemoteStorageException: java.lang.RuntimeException: Unable to calculate a request signature: 
    at io.aiven.kafka.tieredstorage.RemoteStorageManager.copyLogSegmentData(RemoteStorageManager.java:256)
    at org.apache.kafka.server.log.remote.storage.ClassLoaderAwareRemoteStorageManager.lambda$copyLogSegmentData$2(ClassLoaderAwareRemoteStorageManager.java:74)
    at org.apache.kafka.server.log.remote.storage.ClassLoaderAwareRemoteStorageManager.withClassLoader(ClassLoaderAwareRemoteStorageManager.java:66)
    at org.apache.kafka.server.log.remote.storage.ClassLoaderAwareRemoteStorageManager.copyLogSegmentData(ClassLoaderAwareRemoteStorageManager.java:74)
    at kafka.log.remote.RemoteLogManager$RLMTask.copyLogSegment(RemoteLogManager.java:758)
    at kafka.log.remote.RemoteLogManager$RLMTask.copyLogSegmentsToRemote(RemoteLogManager.java:708)
    at kafka.log.remote.RemoteLogManager$RLMTask.run(RemoteLogManager.java:823)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.RuntimeException: Unable to calculate a request signature: 
    at software.amazon.awssdk.http.auth.aws.internal.signer.util.SignerUtils.sign(SignerUtils.java:138)
    at software.amazon.awssdk.http.auth.aws.internal.signer.util.SignerUtils.newSigningKey(SignerUtils.java:124)
    at software.amazon.awssdk.http.auth.aws.internal.signer.util.SignerUtils.deriveSigningKey(SignerUtils.java:106)
    at software.amazon.awssdk.http.auth.aws.internal.signer.DefaultV4RequestSigner.createSigningKey(DefaultV4RequestSigner.java:93)
    at software.amazon.awssdk.http.auth.aws.internal.signer.DefaultV4RequestSigner.sign(DefaultV4RequestSigner.java:62)
    at software.amazon.awssdk.http.auth.aws.internal.signer.V4RequestSigner.lambda$header$0(V4RequestSigner.java:61)
    at software.amazon.awssdk.http.auth.aws.internal.signer.DefaultAwsV4HttpSigner.doSign(DefaultAwsV4HttpSigner.java:269)
    at software.amazon.awssdk.http.auth.aws.internal.signer.DefaultAwsV4HttpSigner.sign(DefaultAwsV4HttpSigner.java:66)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.doSraSign(SigningStage.java:113)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.lambda$sraSignRequest$1(SigningStage.java:93)
    at software.amazon.awssdk.core.internal.util.MetricUtils.measureDuration(MetricUtils.java:60)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.sraSignRequest(SigningStage.java:92)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.execute(SigningStage.java:79)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.execute(SigningStage.java:50)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
    at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
    at software.amazon.awssdk.services.s3.DefaultS3Client.createMultipartUpload(DefaultS3Client.java:1463)
    at io.aiven.kafka.tieredstorage.storage.s3.S3MultiPartOutputStream.<init>(S3MultiPartOutputStream.java:77)
    at io.aiven.kafka.tieredstorage.storage.s3.S3Storage.s3OutputStream(S3Storage.java:66)
    at io.aiven.kafka.tieredstorage.storage.s3.S3Storage.upload(S3Storage.java:57)
    at io.aiven.kafka.tieredstorage.RemoteStorageManager.uploadSegmentLog(RemoteStorageManager.java:389)
    at io.aiven.kafka.tieredstorage.RemoteStorageManager.copyLogSegmentData(RemoteStorageManager.java:245)
    ... 12 more
Caused by: java.lang.RuntimeException: Unable to calculate a request signature: 
    at software.amazon.awssdk.http.auth.aws.internal.signer.util.SignerUtils.sign(SignerUtils.java:151)
    at software.amazon.awssdk.http.auth.aws.internal.signer.util.SignerUtils.sign(SignerUtils.java:136)
    ... 64 more
Caused by: java.security.InvalidKeyException: No installed provider supports this key: javax.crypto.spec.SecretKeySpec
    at java.base/javax.crypto.Mac.chooseProvider(Mac.java:391)
    at java.base/javax.crypto.Mac.init(Mac.java:434)
    at software.amazon.awssdk.http.auth.aws.internal.signer.util.SignerUtils.sign(SignerUtils.java:148)
    ... 65 more

For the full Kafka broker log, please see the attached file: logs-pod-cluster-def7af46-b-f2b5e74c-0-container-kafka.log

The plugin works perfectly on any other cluster that doesn't have FIPS enabled.
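The innermost frames show the failure happening in Mac.init, called from the AWS SDK's SigV4 signing-key derivation (SignerUtils.deriveSigningKey). As a rough illustration, here is a minimal Java sketch of the standard SigV4 key derivation chain using only javax.crypto. It is not the SDK's actual code, and the class name, credentials, and scope values are hypothetical. It shows that the first HMAC key is the string "AWS4" followed by the configured secret key, which makes it the only key in the chain whose length depends on user configuration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration of SigV4 signing-key derivation (not the AWS SDK's code).
public class SigV4KeyDerivation {

    // One HMAC-SHA256 step. Mac.init(...) is where the trace above fails
    // when no installed provider accepts the key.
    static byte[] hmacSha256(byte[] key, String data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    }

    // SigV4 chain: the very first HMAC key is "AWS4" + secret access key;
    // every later key is a 32-byte HMAC-SHA256 output.
    static byte[] deriveSigningKey(String secretKey, String date, String region, String service)
            throws GeneralSecurityException {
        byte[] kSecret = ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmacSha256(kSecret, date);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Hypothetical secret and scope, for illustration only.
        byte[] signingKey = deriveSigningKey("example-secret", "20240724", "us-east-1", "s3");
        // prints: Signing key length: 32 bytes
        System.out.println("Signing key length: " + signingKey.length + " bytes");
    }
}
```

Because every key after the first is a fixed-size HMAC output, a secret that is too short for the provider can only trip the first Mac.init call.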

What did you expect to happen?

We expected the log segments to be uploaded to the configured S3 storage even when FIPS is enabled.

What else do we need to know?

OCP: 4.15
Kafka: 3.7.1
Aiven plugin version: 2024-04-02-1712056402

Thanks a lot for looking into this :)

jeqo commented 1 month ago

@im-konge thanks for reporting this issue. Looking at the stack trace, it seems this is triggered by the SDK (the AWS SDK in this case). We may need to check which options the SDKs provide; for AWS, for example, I found this: https://docs.aws.amazon.com/sdkref/latest/guide/feature-endpoints.html

Unlike standard AWS endpoints, FIPS endpoints use a TLS software library that complies with FIPS 140-2. If this setting is enabled and a FIPS endpoint does not exist for the service in your AWS Region, the AWS call may fail.

Could you check whether setting AWS_USE_FIPS_ENDPOINT is enough?

im-konge commented 1 month ago

I'll try that and let you know. Thanks for checking it :)

showuon commented 1 week ago

After investigating, I found that the cause is the secret key we configured: it is too short to be FIPS-compliant. After switching to a longer secret key, everything works fine. We can close this issue now. Thanks.
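To catch this before deploying to a FIPS cluster, one could check the length of the first SigV4 HMAC key (the string "AWS4" followed by the secret key) up front. The sketch below is a hypothetical helper, not part of the plugin or the AWS SDK. It assumes the FIPS provider rejects HMAC keys shorter than 112 bits (14 bytes), the minimum NIST SP 800-131A allows for HMAC generation; the limit actually enforced depends on the JVM's FIPS provider configuration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical pre-flight check; not part of the plugin or the AWS SDK.
public class FipsSecretKeyCheck {

    // Assumption: the FIPS provider rejects HMAC keys shorter than 112 bits,
    // the minimum NIST SP 800-131A allows for HMAC generation.
    static final int MIN_HMAC_KEY_BITS = 112;

    // SigV4 uses the UTF-8 bytes of "AWS4" + secret as its first HMAC key.
    static int firstHmacKeyBits(String secretKey) {
        return ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8).length * 8;
    }

    // True if the first HMAC key meets the assumed minimum length.
    static boolean likelyFipsCompatible(String secretKey) {
        return firstHmacKeyBits(secretKey) >= MIN_HMAC_KEY_BITS;
    }

    public static void main(String[] args) {
        // Hypothetical secrets: a short 8-character one and a 40-character one.
        String[] secrets = {"minio123", "0123456789abcdefghij0123456789abcdefghij"};
        for (String s : secrets) {
            // prints "8-char secret -> 96-bit HMAC key, passes check: false"
            // then  "40-char secret -> 352-bit HMAC key, passes check: true"
            System.out.printf("%d-char secret -> %d-bit HMAC key, passes check: %b%n",
                    s.length(), firstHmacKeyBits(s), likelyFipsCompatible(s));
        }
    }
}
```

Under that assumption, an ASCII secret key needs at least 10 characters, since the "AWS4" prefix already contributes 4 of the required 14 bytes.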