Closed grbinho closed 1 year ago
Can this also be added to the defaultWarehouseLocation method? I'm happy to open a PR, but wanted to check first if there is a specific reason this setting is omitted.
I think we should add catalogId here as well. I went through the PR which introduced it and can't find any objection to it. Please feel free to open a PR to get more feedback.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Apache Iceberg version
0.14.0
Query engine
Other
Please describe the bug 🐞
Hi
We are using AWS Glue Jobs 3.0 (Spark 3.1) with Iceberg 0.14.0 (through the Glue Marketplace connector).
We have a hub-and-spoke setup for the Glue catalog, where a central AWS account hosts the catalog and the data, and multiple data processing accounts run Glue jobs.
We do this by delegating access from the central account to the root principal of the processing accounts and then using an IAM role for Glue jobs that has cross-account access.
This works fine if we write non-Iceberg tables.
When we try to write Iceberg tables, we get an error:
This normally indicates permission issues, but since it works without Iceberg, that should not be the case.
From the stack, we noticed that the failure comes from
`org.apache.iceberg.aws.glue.GlueCatalog.defaultWarehouseLocation(GlueCatalog.java:226)`
(In master this is now here: https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L259)
To me it looks like this method is missing `.catalogId(awsProperties.glueCatalogId())` on the `GetDatabaseRequest.builder()`.
Our current workaround is to use the assume role feature, but we would prefer not to need that.
I see that all other calls in the `GlueCatalog` code are using the `glue.id` catalog property.
Can this also be added to the `defaultWarehouseLocation` method? I'm happy to open a PR, but wanted to check first if there is a specific reason this setting is omitted.
To me it is expected that once `glue.id` is set, that should be the Glue catalog used for all Glue requests.
Thanks for the time and the great library!
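To illustrate the behavior described above, here is a minimal, self-contained sketch. It does not use the AWS SDK or Iceberg classes: `DatabaseRequest`, `FakeGlue`, both `warehouseLocation...` methods, the account IDs, and the bucket name are all hypothetical stand-ins. It assumes that a Glue request without a catalog ID resolves against the caller's own account, and that the default location is the database location URI plus the table name. The real failure in our case looks like a permission error, which the sketch only approximates with an exception.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins only: none of these types are the AWS SDK or Iceberg classes.
final class GlueCatalogIdSketch {

    /** Hypothetical request model; a null catalogId means "not set". */
    static final class DatabaseRequest {
        final String catalogId;
        final String name;

        DatabaseRequest(String catalogId, String name) {
            this.catalogId = catalogId;
            this.name = name;
        }
    }

    /** Hypothetical in-memory Glue: databases are keyed by "catalogId/databaseName". */
    static final class FakeGlue {
        private final String callerAccountId;
        private final Map<String, String> locationUris = new HashMap<>();

        FakeGlue(String callerAccountId) {
            this.callerAccountId = callerAccountId;
        }

        void putDatabase(String catalogId, String name, String locationUri) {
            locationUris.put(catalogId + "/" + name, locationUri);
        }

        String getDatabaseLocation(DatabaseRequest request) {
            // Assumption: a request without a catalog ID resolves against the caller's account.
            String catalogId = request.catalogId != null ? request.catalogId : callerAccountId;
            String uri = locationUris.get(catalogId + "/" + request.name);
            if (uri == null) {
                throw new IllegalStateException(
                    "Database " + request.name + " not found in catalog " + catalogId);
            }
            return uri;
        }
    }

    /** Mirrors the reported behavior: the configured catalog ID is never passed along. */
    static String warehouseLocationWithoutCatalogId(FakeGlue glue, String db, String table) {
        return glue.getDatabaseLocation(new DatabaseRequest(null, db)) + "/" + table;
    }

    /** Mirrors the proposed change: forward the configured catalog ID (the glue.id property). */
    static String warehouseLocationWithCatalogId(
            FakeGlue glue, String glueCatalogId, String db, String table) {
        return glue.getDatabaseLocation(new DatabaseRequest(glueCatalogId, db)) + "/" + table;
    }

    public static void main(String[] args) {
        // Hypothetical accounts: the job runs in 222222222222, the catalog lives in 111111111111.
        FakeGlue glue = new FakeGlue("222222222222");
        glue.putDatabase("111111111111", "sales", "s3://central-bucket/sales");

        System.out.println(warehouseLocationWithCatalogId(glue, "111111111111", "sales", "orders"));
        try {
            warehouseLocationWithoutCatalogId(glue, "sales", "orders");
        } catch (IllegalStateException e) {
            System.out.println("Lookup without catalog ID failed: " + e.getMessage());
        }
    }
}
```

In the sketch, only the variant that forwards the catalog ID finds the database in the central account; the other variant looks in the job's own account and fails, which matches what we see in the cross-account setup.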
For reference, our setup with the workaround.
Also the stack.