apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Support table-specific FileIO for Glue and Hive catalog through catalog override in table parameters #6810

Closed jackye1995 closed 11 months ago

jackye1995 commented 1 year ago

Feature Request / Improvement

cc @RussellSpitzer @pvary @amogh-jahagirdar @rajarshisarkar @singhpk234

We had some discussion that FileIO should technically be specific to each table, so that all readers and writers of a table use the same implementation. Currently, however, it is defined as a catalog property and chosen dynamically by the end user.

The REST catalog already solves this, but technically we can achieve the same in Glue, Hive, or any catalog that supports table parameters (not the Iceberg table properties, but the parameters we use to store table_type=ICEBERG and metadata_location).

The idea is that users can configure overrides of catalog properties for specific tables through the table parameters in the Glue/Hive metastore.

For example, if the Spark session's default FileIO is HadoopFileIO but I want to use S3FileIO for a specific table, I can add catalog properties such as io-impl, along with any S3FileIO-related configuration, to that table's parameters. We can then update the code to respect those overrides.
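To make the proposal concrete, here is a minimal sketch of the merge step in plain Java. It does not use Iceberg classes and is not existing Iceberg code. The class name `TableCatalogOverrides`, the method `effectiveProperties`, the reserved-key set, and the example bucket and endpoint values are all hypothetical. Only the keys `io-impl`, `table_type`, and `metadata_location` and the two FileIO class names come from Iceberg.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: layers per-table overrides on top of catalog properties.
final class TableCatalogOverrides {

  // Assumption: metastore bookkeeping keys are never treated as overrides.
  private static final Set<String> RESERVED_KEYS =
      Set.of("table_type", "metadata_location");

  // Returns the catalog properties with table parameters applied on top.
  // A table parameter wins whenever both maps define the same key.
  static Map<String, String> effectiveProperties(
      Map<String, String> catalogProperties, Map<String, String> tableParameters) {
    Map<String, String> merged = new HashMap<>(catalogProperties);
    for (Map.Entry<String, String> entry : tableParameters.entrySet()) {
      if (!RESERVED_KEYS.contains(entry.getKey())) {
        merged.put(entry.getKey(), entry.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Session-level defaults, as the end user would configure them.
    Map<String, String> catalogProperties = Map.of(
        "io-impl", "org.apache.iceberg.hadoop.HadoopFileIO",
        "warehouse", "s3://example-bucket/warehouse"); // hypothetical location

    // Parameters stored on one table in the Glue/Hive metastore.
    Map<String, String> tableParameters = Map.of(
        "table_type", "ICEBERG",
        "metadata_location", "s3://example-bucket/warehouse/db/t/metadata/v1.json", // hypothetical
        "io-impl", "org.apache.iceberg.aws.s3.S3FileIO",
        "s3.endpoint", "https://s3.example.com"); // hypothetical endpoint

    Map<String, String> effective =
        effectiveProperties(catalogProperties, tableParameters);
    System.out.println(effective.get("io-impl"));
  }
}
```

A real implementation would probably need a stricter filter than this reserved-key set, for example an allowlist or a dedicated key prefix. Hive and Glue tables carry other parameters that should not leak into the FileIO configuration.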

Technically, this already happens today in the REST catalog through the config part of a LoadTableResponse: https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml#L1701, used in https://github.com/apache/iceberg/blob/b6b9972538ffcbae10b7e80e82cc444254d49103/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L310

In the Glue catalog, we also already override AWS catalog properties dynamically per table today, for (1) LakeFormation-related table-specific security configurations and (2) table-specific S3 tags: https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L205-L230

Any thoughts?

Query engine

None

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] commented 11 months ago

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'