apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Snowflake Polaris Iceberg: NoSuchTableException: Table does not exist at location #504

Open ambaricloud opened 3 months ago

ambaricloud commented 3 months ago

Please describe the bug 🐞

Created an Iceberg table in a Snowflake Polaris internal catalog via Spark. I am able to perform all the usual Iceberg table operations, but the conversion fails when I try to convert the table to Delta via XTable.
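
For context, creating such a table from Spark looks roughly like this (a sketch with a hypothetical schema and catalog name; the actual session setup resembles the pyspark invocation later in this thread):

# Hypothetical example: create and populate an Iceberg table through a
# Polaris catalog registered in the Spark session as "polaris".
spark.sql("""
    CREATE TABLE polaris.prod.orders (
        order_id BIGINT,
        amount DOUBLE,
        order_ts TIMESTAMP
    ) USING iceberg
""")
spark.sql("INSERT INTO polaris.prod.orders VALUES (1, 9.99, current_timestamp())")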

cat polaris_ice_to_delta_orders_config.yaml

sourceFormat: ICEBERG
targetFormats:
  - DELTA

java -cp "utilities-0.1.0-beta1-bundled.jar:iceberg-aws-1.3.1.jar:bundle-2.23.9.jar" io.onetable.utilities.RunSync --datasetConfig polaris_ice_to_delta_orders_config.yaml

SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/Users/satyak/iceberg/demo/xtable/utilities-0.1.0-beta1-bundled.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-08-03 14:00:59 INFO  io.onetable.utilities.RunSync:141 - Running sync for basePath s3://ambaricloudsatya/prod/orders/ for following table formats [DELTA]
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/satyak/iceberg/demo/xtable/utilities-0.1.0-beta1-bundled.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-08-03 14:01:03 INFO  io.onetable.client.OneTableClient:264 - No previous OneTable sync for target. Falling back to snapshot sync.
2024-08-03 14:01:04 ERROR io.onetable.utilities.RunSync:164 - Error running sync for s3://ambaricloudsatya/prod/orders/
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at location: s3://ambaricloudsatya/prod/orders
    at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:97) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at io.onetable.iceberg.IcebergTableManager.lambda$getTable$1(IcebergTableManager.java:58) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at java.util.Optional.orElseGet(Optional.java:369) ~[?:?]
    at io.onetable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:58) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at io.onetable.iceberg.IcebergSourceClient.initSourceTable(IcebergSourceClient.java:81) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at io.onetable.iceberg.IcebergSourceClient.getSourceTable(IcebergSourceClient.java:59) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at io.onetable.iceberg.IcebergSourceClient.getCurrentSnapshot(IcebergSourceClient.java:129) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at io.onetable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:36) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at io.onetable.client.OneTableClient.syncSnapshot(OneTableClient.java:164) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at io.onetable.client.OneTableClient.sync(OneTableClient.java:122) ~[utilities-0.1.0-beta1-bundled.jar:?]
    at io.onetable.utilities.RunSync.main(RunSync.java:162) ~[utilities-0.1.0-beta1-bundled.jar:?]

the-other-tim-brown commented 3 months ago

@ambaricloud you'll need to specify --icebergCatalogConfig in your sync command so that the table can be read from the catalog. Check out item 4 in the README for running the bundled jar.
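
For reference, a sketch of how the flag fits into the command above (catalog.yaml is a placeholder for an Iceberg catalog config file like the ones shown later in this thread):

java -cp "utilities-0.1.0-beta1-bundled.jar:iceberg-aws-1.3.1.jar:bundle-2.23.9.jar" io.onetable.utilities.RunSync \
  --datasetConfig polaris_ice_to_delta_orders_config.yaml \
  --icebergCatalogConfig catalog.yaml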

ambaricloud commented 3 months ago

@the-other-tim-brown Thank you. For Glue, I used the catalog config below. I need to check the same for Polaris.

catalogImpl: org.apache.iceberg.aws.glue.GlueCatalog
catalogName: onetable
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: s3://ambaricloudsatya/prod

the-other-tim-brown commented 3 months ago

You will need to use Polaris since you created the Iceberg table in Polaris: the catalog config you pass to XTable should match the Iceberg catalog the table lives in.
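
For a Polaris catalog exposed through the Iceberg REST API, the catalog config would look something like the following (a sketch; the URI, credential, and warehouse values are placeholders, and a concrete example appears later in this thread):

catalogImpl: org.apache.iceberg.rest.RESTCatalog
catalogName: polaris
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  uri: https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog
  credential: <client-id>:<client-secret>
  warehouse: <catalog-name>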

jeremyakers commented 2 months ago

Hi @the-other-tim-brown - Quick question on this comment:

You will need to use Polaris

Is Polaris (or are generic REST catalogs in general) supported by XTable? I've been trying to find this out and didn't see anything in the docs about Polaris or generic REST catalogs.

vinishjail97 commented 1 month ago

@jeremyakers Yes, it does. You need to follow the Polaris instructions to register an Iceberg table that already exists in storage: https://polaris.io/#section/Quick-Start/Defining-a-Catalog

Similar instructions for Unity, Glue, etc. can be found at the links below. If you are able to get it working for Polaris, do you mind sharing the commands? We can add docs for Polaris similar to the Glue and Unity Catalog ones.

https://xtable.apache.org/docs/unity-catalog#register-the-target-table-in-databricks-unity-catalog
https://xtable.apache.org/docs/glue-catalog#register-the-target-table-in-glue-data-catalog
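
For reference, one way to register an existing Iceberg table from Spark is Iceberg's register_table procedure (a sketch with hypothetical catalog, table, and metadata-file names; Polaris's own registration flow linked above is another option):

# Hypothetical example: register an Iceberg table whose metadata already
# exists in object storage, using the catalog named "polaris" in this session.
spark.sql("""
    CALL polaris.system.register_table(
        table => 'spark_demo.people',
        metadata_file => 's3://<bucket>/<path>/metadata/<version>.metadata.json'
    )
""")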

sagarlakshmipathy commented 1 month ago

I ran into an issue while using Snowflake's Polaris catalog. Documenting it here.

java -cp /Users/sagarl/Downloads/iceberg-spark-runtime-3.4_2.12-1.4.1.jar:/Users/sagarl/latest/incubator-xtable/xtable-utilities/target/xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:/Users/sagarl/Downloads/bundle-2.20.160.jar:/Users/sagarl/Downloads/url-connection-client-2.20.160.jar \
  org.apache.xtable.utilities.RunSync \
  --datasetConfig config.yaml \
  --icebergCatalogConfig catalog.yaml

Error

2024-09-20 22:55:30 INFO  org.apache.iceberg.RemoveSnapshots:328 - Cleaning up expired files (local, incremental)
2024-09-20 22:55:31 ERROR org.apache.xtable.spi.sync.TableFormatSync:78 - Failed to sync snapshot
org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Delegate access to table with user-specified write location is temporarily not supported.
    at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:157) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:88) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:71) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:183) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:292) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:226) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.rest.HTTPClient.post(HTTPClient.java:337) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:112) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.rest.RESTTableOperations.commit(RESTTableOperations.java:152) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$3(BaseTransaction.java:416) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:412) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:307) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
    at org.apache.xtable.iceberg.IcebergConversionTarget.completeSync(IcebergConversionTarget.java:221) ~[xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
    at org.apache.xtable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:165) ~[xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
    at org.apache.xtable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
    at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:182) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
    at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:118) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
    at org.apache.xtable.utilities.RunSync.main(RunSync.java:191) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]

The sync did not fully complete at this point: the table gets created in the target format in the catalog, but it contains no data.

config.yaml

sourceFormat: HUDI
targetFormats:
  - ICEBERG
datasets:
  -
    tableBasePath: s3://xtable-demo-bucket/spark_demo/people
    tableName: people
    partitionSpec: city:VALUE
    namespace: spark_demo

catalog.yaml

catalogImpl: org.apache.iceberg.rest.RESTCatalog
catalogName: iceberg_catalog
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: iceberg_catalog
  uri: https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog
  credential: <client-id>:<client-secret>
  header.X-Iceberg-Access-Delegation: vended-credentials
  scope: PRINCIPAL_ROLE:ALL
  client.region: us-west-2

I could access the table from PySpark using the following command:

pyspark --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.defaultCatalog=polaris \
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris.uri=https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog \
--conf spark.sql.catalog.polaris.credential=<client-id>:<client-secret> \
--conf spark.sql.catalog.polaris.warehouse=iceberg_catalog \
--conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:my_spark_admin_role \
--conf spark.sql.catalog.polaris.client.region=us-west-2
>>> spark.sql("USE spark_demo")
DataFrame[]
>>> spark.sql("SHOW TABLES").show()
+----------+----------+-----------+                                             
| namespace| tableName|isTemporary|
+----------+----------+-----------+
|spark_demo|    people|      false|
|spark_demo|test_table|      false|
+----------+----------+-----------+

>>> spark.sql("SELECT * FROM people").show()
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name| id|name|age|city|create_ts|
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+

sagarlakshmipathy commented 1 month ago

I believe this is a separate issue, so I'm tracking it here: https://github.com/apache/incubator-xtable/issues/545