Sheth-G / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Kafka Connect sink to S3 authentication parameters #10428 #5

Open Sheth-G opened 9 months ago

Sheth-G commented 9 months ago

Hudi 0.14 is used for storage in a Ceph cluster with an S3 connection and TLS authentication. For Spark authentication against S3, the following configuration parameters are used: "spark.hadoop.fs.s3a.endpoint", "spark.hadoop.fs.s3a.access.key", "spark.hadoop.fs.s3a.secret.key". How do I authenticate against Ceph through S3 when using the Kafka Connect sink Hudi connector? Which parameters should I use, and where do I set them?
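For reference, the Spark-side properties mentioned above would typically live in spark-defaults.conf (or be passed via --conf on spark-submit). The endpoint and credential values below are placeholders, not values from this issue:

```properties
# spark-defaults.conf sketch -- substitute your Ceph RGW endpoint and credentials
spark.hadoop.fs.s3a.endpoint    https://ceph-rgw.example.com
spark.hadoop.fs.s3a.access.key  ACCESS_KEY
spark.hadoop.fs.s3a.secret.key  SECRET_KEY
```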

torvalds-dev[bot] commented 9 months ago

torvalds.dev is analyzing the ticket

torvalds-dev[bot] commented 9 months ago

The user is trying to authenticate to a Ceph storage system through S3 using the Kafka Connect Sink Hudi connector. They have provided two configuration files: config-sink.json and config-sink-hive.json.

The user is already familiar with the following configuration parameters for Spark authentication on S3: spark.hadoop.fs.s3a.endpoint, spark.hadoop.fs.s3a.access.key, spark.hadoop.fs.s3a.secret.key.

To authenticate to Ceph through S3 using the Kafka Connect Sink Hudi connector, the user would need to provide similar parameters in the Hudi connector configuration. However, these parameters are not present in the provided configuration files.

Here is the plan to address the user's issue:

  1. The user should update the Hudi connector configuration files (config-sink.json and config-sink-hive.json) to include the necessary S3 authentication parameters. These parameters would be similar to the ones used for Spark authentication on S3, but their exact names might differ. The user should refer to the Hudi connector documentation or the Ceph S3 documentation to find the exact parameter names.

  2. After updating the configuration files, the user should restart the Hudi connector for the changes to take effect.

  3. If the user encounters any issues or errors after updating the configuration files and restarting the Hudi connector, they should provide the error messages for further troubleshooting.
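One way to sketch step 1, assuming the Kafka Connect workers resolve Hadoop filesystem settings from a core-site.xml on HADOOP_CONF_DIR (an assumption to verify against the Hudi connector documentation): the fs.s3a.* names below are standard Hadoop S3A properties, with placeholder values; fs.s3a.path.style.access is commonly needed for Ceph RGW deployments.

```xml
<!-- core-site.xml: hypothetical sketch; all values are placeholders -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <!-- TLS endpoint of the Ceph RADOS Gateway -->
    <value>https://ceph-rgw.example.com</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>SECRET_KEY</value>
  </property>
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- Ceph RGW often requires path-style rather than virtual-host-style access -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```

If the connector does not pick up HADOOP_CONF_DIR in your deployment, the same fs.s3a.* keys may need to be supplied through whatever Hadoop-configuration mechanism the Hudi sink exposes; check the connector docs for the exact wiring.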

Relevant files to the issue are config-sink.json and config-sink-hive.json.