apache / polaris

Apache Polaris, the interoperable, open source catalog for Apache Iceberg
https://polaris.apache.org/
Apache License 2.0

Store AWS region in AwsStorageConfigurationInfo #455

Open eric-maynard opened 5 days ago

eric-maynard commented 5 days ago

Description

This adds support for a new property, region, for AWS storage configurations.

Fixes #342

How Has This Been Tested?

I'm able to create catalogs and add the region property to the StorageConfigInfo:

$ curl -X POST http://localhost:8181/api/management/v1/catalogs \
-H "Authorization: Bearer principal:root;realm:default-realm" \
-H "Content-Type: application/json" \
-d '{
  "catalog": {
    "type": "INTERNAL",
    "name": "example_catalog",
    "properties": {
      "default-base-location": "s3://your-bucket/catalog-location/"
    },
    "storageConfigInfo": {
      "storageType": "S3",
      "roleArn": "arn:aws:iam::012345678901:role/jdoe",
      "region": "us-east-2"
    }
  }
}'
$ curl -X GET http://localhost:8181/api/management/v1/catalogs/example_catalog \
-H "Authorization: Bearer principal:root;realm:default-realm" \
-H "Content-Type: application/json" | jq
{
  "type": "INTERNAL",
  "name": "example_catalog",
  "properties": {
    "default-base-location": "s3://your-bucket/catalog-location/"
  },
  "createTimestamp": 1731818113312,
  "lastUpdateTimestamp": 1731818113312,
  "entityVersion": 1,
  "storageConfigInfo": {
    "storageType": "S3",
    "roleArn": "arn:aws:iam::012345678901:role/jdoe",
    "externalId": null,
    "userArn": null,
    "region": "us-east-2",
    "allowedLocations": [
      "s3://your-bucket/catalog-location/"
    ]
  }
}
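For anyone scripting the same check, the request body can be assembled programmatically. This is a minimal sketch, assuming the field names shown in the transcript above and assuming that region is optional; the helper name build_catalog_request is hypothetical and not part of any Polaris client.

```python
import json


def build_catalog_request(name, base_location, role_arn, region=None):
    """Build a create-catalog request body shaped like the curl payload above.

    Hypothetical helper for illustration only. Field names are copied from
    the transcript, not from a Polaris client library.
    """
    storage = {"storageType": "S3", "roleArn": role_arn}
    if region is not None:
        # Assumption: the new property is optional, so it is only included
        # when a region is actually supplied.
        storage["region"] = region
    return {
        "catalog": {
            "type": "INTERNAL",
            "name": name,
            "properties": {"default-base-location": base_location},
            "storageConfigInfo": storage,
        }
    }


body = build_catalog_request(
    "example_catalog",
    "s3://your-bucket/catalog-location/",
    "arn:aws:iam::012345678901:role/jdoe",
    region="us-east-2",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d argument of the POST request shown above.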
eric-maynard commented 3 days ago

Hey @singhpk234, the idea is that you can associate a region with a storage configuration so that the region can be used by any client that leverages credentials/files associated with that storage configuration.

As you pointed out, it is not clear how this will work with catalog federation (cc @dennishuo). But I think it is also unclear how storage configurations will work with federated catalogs more generally -- for example, a single role ARN may not be valid for the entire federated catalog. So this is something our design for federation must address.

At the very least, we have discussed allowing storage configurations to be defined on a level more granular than the catalog (e.g. at the table or namespace level).

Maybe @munendrasn, the filer of #342, can also help provide some additional context here. For my part I am curious if there's a particular test case we can add here to make sure the issue reported in #342 is fully addressed.

munendrasn commented 3 days ago

@eric-maynard It is similar to the case @singhpk234 mentioned. We have a custom catalog that tracks native Iceberg tables and federated Iceberg tables from different catalogs. One such catalog is the Polaris catalog. Our setup is in one AWS region, but the federated table's data is stored in another region, so accessing the table fails.

On the testing, are you looking to test it via the Iceberg APIs or directly via the S3 client API? If AWS_REGION is set to one region but the table's storage is in another region, any read or list operation would fail unless the region is explicitly specified on s3Client creation.
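The region selection described here can be sketched as a small resolution step: prefer the region attached to the table's storage, and fall back to AWS_REGION only when none is given. This is an illustrative sketch, not code from Polaris or Iceberg; the function name resolve_s3_region and its parameters are hypothetical.

```python
import os


def resolve_s3_region(table_region=None, env=None):
    """Pick the region to use when creating an S3 client for a table.

    Hypothetical helper: an explicit table-level region wins over the
    process-wide AWS_REGION environment variable.
    """
    env = os.environ if env is None else env
    return table_region or env.get("AWS_REGION")


# Process runs in us-west-2, table data lives in us-east-2.
region = resolve_s3_region("us-east-2", {"AWS_REGION": "us-west-2"})
print(region)

# With boto3 installed, the resolved value would be passed explicitly,
# for example: boto3.client("s3", region_name=region)
```

Without the explicit value, the client would be created for the AWS_REGION default, which is the failing cross-region case described above.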

eric-maynard commented 2 days ago

Hi @munendrasn, I see -- do the current changes here work for your use case, then? client.region should be specified in the credentials map so long as it's set for the table's storage configuration.
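As a rough illustration of that behavior, the sketch below adds client.region to a credentials map only when the storage configuration carries a region. It is not the actual Polaris implementation; the function name credentials_properties and the placeholder credential values are assumptions, while the s3.* keys follow Iceberg's S3 property naming.

```python
def credentials_properties(storage_config, credentials):
    """Return the credentials map, adding client.region when one is configured.

    Hypothetical helper for illustration; storage_config mirrors the
    storageConfigInfo JSON shown earlier in this PR.
    """
    props = dict(credentials)  # copy so the caller's map is not mutated
    region = storage_config.get("region")
    if region:
        props["client.region"] = region
    return props


creds = {
    "s3.access-key-id": "<access-key-id>",          # placeholder value
    "s3.secret-access-key": "<secret-access-key>",  # placeholder value
    "s3.session-token": "<session-token>",          # placeholder value
}
with_region = credentials_properties({"storageType": "S3", "region": "us-east-2"}, creds)
without_region = credentials_properties({"storageType": "S3", "region": None}, creds)
print(with_region.get("client.region"), "client.region" in without_region)
```

A client reading the map can then use client.region, when present, instead of its own default region.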