apache / polaris

Apache Polaris, the interoperable, open source catalog for Apache Iceberg
https://polaris.apache.org/
Apache License 2.0
1.17k stars 130 forks source link

Upgrade Iceberg 1.7.0 #442

Open singhpk234 opened 1 week ago

singhpk234 commented 1 week ago

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change.

Upgrade Apache Iceberg to 1.7.0 Additionally :

Type of change

Please delete options that are not relevant.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Test Configuration:

Checklist:

Please delete options that are not relevant.

flyrain commented 1 week ago

LGTM with a minor style issue:

> The following files had format violations:
      src/main/java/org/apache/polaris/service/catalog/BasePolarisCatalog.java
          @@ -47,7 +47,6 @@
           import·org.apache.iceberg.TableMetadata;
           import·org.apache.iceberg.TableMetadataParser;
           import·org.apache.iceberg.TableOperations;
          -import·org.apache.iceberg.aws.s3.S3FileIOProperties;
           import·org.apache.iceberg.catalog.Namespace;
           import·org.apache.iceberg.catalog.SupportsNamespaces;
           import·org.apache.iceberg.catalog.TableIdentifier;
  Run './gradlew :polaris-service:spotlessApply' to fix these violations.
singhpk234 commented 1 week ago

checking test failure

singhpk234 commented 1 week ago

on debugging it, looks like we need to handle this change cleanly [1], since polaris supports more endpoint than default impl, let me fix this !

[1] https://github.com/apache/iceberg/commit/a2b8008da7bc26e03248a35eeee60d1cc7e8499d#diff-86450612dbe323d6d06cbc3846aa1913f042eaedadc0ca027c36bfbe08d3a46c

MonkeyCanCode commented 1 week ago

@flyrain @singhpk234 should we update iceberg 1.7.0 dependencies for "getting start" examples as well? Most of them are using 1.5.2 iceberg runtime. I can take this if not already done.

snazy commented 1 week ago

Note: Today I realized that there are two "special" changes in Iceberg/Java 1.7.0:

  1. Object-storage layout - the generated paths have changed. This is relevant when IAM policies for S3 + GCS are generated. IIRC, IAM policies generated by Polaris are not yet adopted for object-storage layout (write.object-storage.enabled=true + data path). Before 1.7.0, the paths were generated like s3://bucket/<random-ish-base64>/..., since 1.7.0 it's s3://bucket/<random-ish-binary-part-1>/<random-ish-binary-part-2>/<random-ish-binary-part-3>/<random-ish-binary-part-4>/...)
  2. Iceberg/Java 1.7.0 refuses to submit some requests like the multi-table-commit, if the service does not return ConfigResponse.endpoints() or that list is empty.
flyrain commented 1 week ago

Object-storage layout - the generated paths have changed. This is relevant when IAM policies for S3 + GCS are generated. IIRC, IAM policies generated by Polaris are not yet adopted for object-storage layout (write.object-storage.enabled=true + data path). Before 1.7.0, the paths were generated like s3://bucket//..., since 1.7.0 it's s3://bucket/////...)

This is due to the new Object Storage v3 changes introduced in the 1.7 release. However, since Polaris doesn’t currently support the hash prefix before the table path, this update should not impact functionality as far as I understand. Let me know if I’m missing any nuances here!

Iceberg/Java 1.7.0 refuses to submit some requests like the multi-table-commit, if the service does not return ConfigResponse.endpoints() or that list is empty.

383 will resolve this issue, so let's prioritize it. In the short term, it should be manageable since most REST clients haven’t yet adopted the latest 1.7 release.

snazy commented 1 week ago

Object-storage layout - the generated paths have changed. This is relevant when IAM policies for S3 + GCS are generated. IIRC, IAM policies generated by Polaris are not yet adopted for object-storage layout (write.object-storage.enabled=true + data path). Before 1.7.0, the paths were generated like s3://bucket//..., since 1.7.0 it's s3://bucket/////...)

This is due to the new Object Storage v3 changes introduced in the 1.7 release. However, since Polaris doesn’t currently support the hash prefix before the table path, this update should not impact functionality as far as I understand. Let me know if I’m missing any nuances here!

For Polaris it's the amount of combinations that need to be supported: no object-storage prefix, the old and the new one. For GCS it's easier, because Google accepts regular expressions - S3 however does not.

Iceberg/Java 1.7.0 refuses to submit some requests like the multi-table-commit, if the service does not return ConfigResponse.endpoints() or that list is empty.

383 will resolve this issue, so let's prioritize it. In the short term, it should be manageable since most REST clients haven’t yet adopted the latest 1.7 release.

flyrain commented 1 week ago

@flyrain @singhpk234 should we update iceberg 1.7.0 dependencies for "getting start" examples as well? Most of them are using 1.5.2 iceberg runtime. I can take this if not already done.

+1, @MonkeyCanCode! Thanks for picking it up as a follow-up.

MonkeyCanCode commented 1 week ago

@flyrain @singhpk234 should we update iceberg 1.7.0 dependencies for "getting start" examples as well? Most of them are using 1.5.2 iceberg runtime. I can take this if not already done.

+1, @MonkeyCanCode! Thanks for picking it up as a follow-up.

Anytime. Will PR this once current PR is merged. Then i can build it locally for both client (spark) and server (polaris).

flyrain commented 1 week ago

I will merge this by EOD if there is no objection.

MonkeyCanCode commented 1 week ago

Here is a PR for the 2 missing changes: https://github.com/apache/polaris/pull/447