ambaricloud opened this issue 3 months ago
@ambaricloud you'll need to specify the --icebergCatalogConfig in your sync command so that the table can be read from the catalog. Check out item 4 in the README for running the bundled jar.
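For example, the invocation would look something like this (the jar name and config file names here are placeholders for your own):

java -cp xtable-utilities-<version>-bundled.jar org.apache.xtable.utilities.RunSync --datasetConfig my_config.yaml --icebergCatalogConfig catalog.yaml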
@the-other-tim-brown Thank you. For Glue, I used the catalog config below. I need to check the same for Polaris.

catalogImpl: org.apache.iceberg.aws.glue.GlueCatalog
catalogName: onetable
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: s3://ambaricloudsatya/prod
You will need to use Polaris since you are creating the Iceberg table in Polaris. The catalog should match the Iceberg catalog used.
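As a rough sketch, a minimal catalog config pointing at a Polaris REST endpoint might look like the following (the uri, credential, and warehouse values are placeholders; a full working example appears later in this thread):

catalogImpl: org.apache.iceberg.rest.RESTCatalog
catalogName: polaris
catalogOptions:
  uri: https://<polaris-host>/api/catalog
  credential: <client-id>:<client-secret>
  warehouse: <catalog-name>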
Hi @the-other-tim-brown - Quick question on this comment:
You will need to use Polaris
Is Polaris (or generic REST catalogs) supported by XTable? I'm trying to find this out and I didn't see anything in the docs about Polaris or generic REST catalogs in general.
@jeremyakers Yes, it is supported. You need to follow the Polaris instructions to register an Iceberg table that is present in storage: https://polaris.io/#section/Quick-Start/Defining-a-Catalog
Similar instructions for Unity, Glue, etc. can be found here. If you are able to get it working for Polaris, do you mind sharing the commands? We can add docs similar to the Glue and Unity catalog ones:
https://xtable.apache.org/docs/unity-catalog#register-the-target-table-in-databricks-unity-catalog
https://xtable.apache.org/docs/glue-catalog#register-the-target-table-in-glue-data-catalog
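As a starting point, one way to register an existing Iceberg table with a REST catalog like Polaris is Iceberg's register_table Spark procedure. A sketch (the catalog name, namespace, and metadata file path are placeholders):

spark.sql("""
  CALL polaris.system.register_table(
    table => 'spark_demo.people',
    metadata_file => 's3://<bucket>/<table-base-path>/metadata/<version>.metadata.json'
  )
""")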
I ran into an issue while using Snowflake's Polaris catalog. Documenting it here.
java -cp /Users/sagarl/Downloads/iceberg-spark-runtime-3.4_2.12-1.4.1.jar:/Users/sagarl/latest/incubator-xtable/xtable-utilities/target/xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:/Users/sagarl/Downloads/bundle-2.20.160.jar:/Users/sagarl/Downloads/url-connection-client-2.20.160.jar org.apache.xtable.utilities.RunSync --datasetConfig config.yaml --icebergCatalogConfig catalog.yaml
2024-09-20 22:55:30 INFO org.apache.iceberg.RemoveSnapshots:328 - Cleaning up expired files (local, incremental)
2024-09-20 22:55:31 ERROR org.apache.xtable.spi.sync.TableFormatSync:78 - Failed to sync snapshot
org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Delegate access to table with user-specified write location is temporarily not supported.
at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:157) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:88) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:71) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:183) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:292) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:226) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.rest.HTTPClient.post(HTTPClient.java:337) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:112) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.rest.RESTTableOperations.commit(RESTTableOperations.java:152) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$3(BaseTransaction.java:416) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:412) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:307) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
at org.apache.xtable.iceberg.IcebergConversionTarget.completeSync(IcebergConversionTarget.java:221) ~[xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
at org.apache.xtable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:165) ~[xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
at org.apache.xtable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:182) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:118) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
at org.apache.xtable.utilities.RunSync.main(RunSync.java:191) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
The sync did not fully complete at this point: the table gets created in the target format in the catalog, but it has no data in it.
config.yaml:

sourceFormat: HUDI
targetFormats:
  - ICEBERG
datasets:
  - tableBasePath: s3://xtable-demo-bucket/spark_demo/people
    tableName: people
    partitionSpec: city:VALUE
    namespace: spark_demo

catalog.yaml:

catalogImpl: org.apache.iceberg.rest.RESTCatalog
catalogName: iceberg_catalog
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: iceberg_catalog
  uri: https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog
  credential: <client-id>:<client-secret>
  header.X-Iceberg-Access-Delegation: vended-credentials
  scope: PRINCIPAL_ROLE:ALL
  client.region: us-west-2
I could access the table from PySpark using the following command:
pyspark --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.defaultCatalog=polaris \
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris.uri=https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog \
--conf spark.sql.catalog.polaris.credential=<client-id>:<client-secret> \
--conf spark.sql.catalog.polaris.warehouse=iceberg_catalog \
--conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:my_spark_admin_role \
--conf spark.sql.catalog.polaris.client.region=us-west-2
>>> spark.sql("USE spark_demo")
DataFrame[]
>>> spark.sql("SHOW TABLES").show()
+----------+----------+-----------+
| namespace| tableName|isTemporary|
+----------+----------+-----------+
|spark_demo| people| false|
|spark_demo|test_table| false|
+----------+----------+-----------+
>>> spark.sql("SELECT * FROM people").show()
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name| id|name|age|city|create_ts|
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
>>>
I believe this is a separate issue, so tracking it here https://github.com/apache/incubator-xtable/issues/545
Please describe the bug 🐞
Created an Iceberg table in the Snowflake Polaris internal catalog via Spark. I am able to perform all Iceberg table feature tasks, but it fails when I try to convert to Delta via XTable.
cat polaris_ice_to_delta_orders_config.yaml

sourceFormat: ICEBERG
targetFormats:
  - DELTA
datasets:
  - tableBasePath: s3://ambaricloudsatya/prod/orders/
    tableName: orders
java -cp "utilities-0.1.0-beta1-bundled.jar:iceberg-aws-1.3.1.jar:bundle-2.23.9.jar" io.onetable.utilities.RunSync --datasetConfig polaris_ice_to_delta_orders_config.yaml
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/Users/satyak/iceberg/demo/xtable/utilities-0.1.0-beta1-bundled.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-08-03 14:00:59 INFO io.onetable.utilities.RunSync:141 - Running sync for basePath s3://ambaricloudsatya/prod/orders/ for following table formats [DELTA]
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/satyak/iceberg/demo/xtable/utilities-0.1.0-beta1-bundled.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-08-03 14:01:03 INFO io.onetable.client.OneTableClient:264 - No previous OneTable sync for target. Falling back to snapshot sync.
2024-08-03 14:01:04 ERROR io.onetable.utilities.RunSync:164 - Error running sync for s3://ambaricloudsatya/prod/orders/
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at location: s3://ambaricloudsatya/prod/orders
at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:97) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.iceberg.IcebergTableManager.lambda$getTable$1(IcebergTableManager.java:58) ~[utilities-0.1.0-beta1-bundled.jar:?]
at java.util.Optional.orElseGet(Optional.java:369) ~[?:?]
at io.onetable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:58) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.iceberg.IcebergSourceClient.initSourceTable(IcebergSourceClient.java:81) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.iceberg.IcebergSourceClient.getSourceTable(IcebergSourceClient.java:59) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.iceberg.IcebergSourceClient.getCurrentSnapshot(IcebergSourceClient.java:129) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:36) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.client.OneTableClient.syncSnapshot(OneTableClient.java:164) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.client.OneTableClient.sync(OneTableClient.java:122) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.utilities.RunSync.main(RunSync.java:162) ~[utilities-0.1.0-beta1-bundled.jar:?]