Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Allow static credentials for all AWS clients, not only for S3 #10614

Open morozov opened 2 months ago

morozov commented 2 months ago

Feature Request / Improvement

I want to register an Iceberg catalog of type AWS Glue in Flink with code like this:

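import java.util.HashMap;
import org.apache.iceberg.flink.FlinkCatalogFactory;

// Explicit type arguments: with `var`, a bare diamond would infer HashMap<Object, Object>,
// which createCatalog(String, Map<String, String>) does not accept.
var properties = new HashMap<String, String>();
properties.put("type", "iceberg");
properties.put("catalog-type", "glue");
properties.put("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog");
properties.put("warehouse", "s3://my-bucket/path/to/warehouse");

// Create the Flink catalog backed by the Iceberg Glue catalog.
var factory = new FlinkCatalogFactory();
var catalog = factory.createCatalog("glue_catalog", properties);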

I develop my application in a local Kubernetes cluster, so I cannot use the following available authentication options:

  1. The default credential provider (there is no profile file in the container).
  2. Have the client assume an AWS role (there are no EC2 credentials available).

Instead, I want to use static client credentials (an access key ID and a secret access key). I couldn't find any user-facing documentation on how to do that, so I resorted to reading the source code.

Currently, static client credentials can only be configured for the S3 client: https://github.com/apache/iceberg/blob/0e7aa84b1dd378b4be56f5b45b6744b383501bd9/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L192-L216

These credentials do not affect the other clients, e.g. the AWS Glue one. As a result, in order to configure static credentials for the entire Glue catalog, one needs to implement a custom credentials provider using these parameters: https://github.com/apache/iceberg/blob/c68abfc9fd3956077b43aba20441f089bb8b93d6/aws/src/main/java/org/apache/iceberg/aws/AwsClientProperties.java#L37-L60

For example: https://github.com/apache/iceberg/blob/7071dc18ed66454542f466b5bfe8821028f2db0c/aws/src/test/java/org/apache/iceberg/aws/TestAwsClientFactories.java#L243-L253
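
A minimal sketch of what the configuration side of that workaround might look like, assuming the `client.credentials-provider` key from the linked AwsClientProperties lines selects the provider class and that keys under `client.credentials-provider.` are handed to it. The provider class name and the two sub-key names used here are hypothetical; the class itself would still have to be written against the AWS SDK's AwsCredentialsProvider interface.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: adds custom-provider settings to a set of catalog properties. */
final class GlueCatalogCredentialProperties {
  // Property key taken from the linked AwsClientProperties source.
  static final String PROVIDER_KEY = "client.credentials-provider";
  // Hypothetical sub-keys; a custom provider is free to choose its own names.
  static final String ACCESS_KEY_ID_SUFFIX = ".access-key-id";
  static final String SECRET_ACCESS_KEY_SUFFIX = ".secret-access-key";

  private GlueCatalogCredentialProperties() {}

  /** Returns a copy of {@code base} with the provider class and its settings added. */
  static Map<String, String> withStaticCredentials(
      Map<String, String> base,
      String providerClass,
      String accessKeyId,
      String secretAccessKey) {
    Map<String, String> props = new HashMap<>(base); // leave the caller's map untouched
    props.put(PROVIDER_KEY, providerClass);
    props.put(PROVIDER_KEY + ACCESS_KEY_ID_SUFFIX, accessKeyId);
    props.put(PROVIDER_KEY + SECRET_ACCESS_KEY_SUFFIX, secretAccessKey);
    return props;
  }
}
```

The resulting map could then be passed to `createCatalog` in place of the plain properties, with `providerClass` set to the fully qualified name of the custom provider (for instance a hypothetical `com.example.StaticCredentialsProvider`).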

Would a PR be accepted that, in addition to the s3.* parameters mentioned above, added support for similarly named aws.* parameters? Such parameters would apply consistently to all AWS clients instantiated by DefaultAwsClientFactory and could eventually deprecate the s3.* ones.
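
To make the proposal concrete, here is a rough sketch of the lookup order it implies: the S3 client keeps honoring its existing `s3.*` keys and falls back to the client-wide ones, while every other client reads only the client-wide ones. The `aws.access-key-id` and `aws.secret-access-key` names are assumptions made for illustration and do not exist in Iceberg today; `KeyPair` is a stand-in for the SDK's credentials type.

```java
import java.util.Map;
import java.util.Optional;

/** Sketch of static-credential resolution with a client-wide fallback. */
final class StaticCredentialsResolver {
  // Existing S3-only keys (see the S3FileIOProperties link above).
  static final String S3_ACCESS_KEY_ID = "s3.access-key-id";
  static final String S3_SECRET_ACCESS_KEY = "s3.secret-access-key";
  // Hypothetical client-wide keys; the names are assumptions.
  static final String AWS_ACCESS_KEY_ID = "aws.access-key-id";
  static final String AWS_SECRET_ACCESS_KEY = "aws.secret-access-key";

  /** Simple immutable holder for an access key ID and its secret. */
  static final class KeyPair {
    final String accessKeyId;
    final String secretAccessKey;

    KeyPair(String accessKeyId, String secretAccessKey) {
      this.accessKeyId = accessKeyId;
      this.secretAccessKey = secretAccessKey;
    }
  }

  private StaticCredentialsResolver() {}

  /** S3 client: prefer the s3.* pair, otherwise use the client-wide pair. */
  static Optional<KeyPair> forS3(Map<String, String> props) {
    return pair(props, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY).or(() -> forAllClients(props));
  }

  /** Any other client (Glue, for example): only the client-wide pair applies. */
  static Optional<KeyPair> forAllClients(Map<String, String> props) {
    return pair(props, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY);
  }

  // Returns a pair only when both halves are present.
  private static Optional<KeyPair> pair(Map<String, String> props, String idKey, String secretKey) {
    String id = props.get(idKey);
    String secret = props.get(secretKey);
    if (id == null || secret == null) {
      return Optional.empty();
    }
    return Optional.of(new KeyPair(id, secret));
  }
}
```

An empty result would mean that no static credentials were configured, leaving the client on whatever provider it uses today. Keeping the `s3.*` keys as an override is one possible way to stay backward compatible while they are being deprecated.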

Query engine

None

nastra commented 2 months ago

@jackye1995 could you (or anyone on your team) take a look at this one please?

morozov commented 1 month ago

@jackye1995 any chance you could provide feedback on this one?