apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Glue Iceberg Table to Delta Conversion Failed #488

Closed: ambaricloud closed this issue 4 months ago

ambaricloud commented 4 months ago


Please describe the bug 🐞

  1. Created a Glue table from Spark:

    from pyspark.sql import SparkSession

    # warehouse_path must already be defined (the S3 warehouse location)
    spark = SparkSession.builder \
        .config('spark.jars.packages', 'org.apache.hadoop:hadoop-aws:3.3.4,org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.4.3,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178') \
        .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
        .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog") \
        .config("spark.sql.catalog.glue.warehouse", warehouse_path) \
        .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
        .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
        .getOrCreate()

    # Create the table through the Glue-backed catalog
    spark.sql("""CREATE TABLE glue.prod.orders
        (order_id BIGINT, customer_id BIGINT, order_amount DECIMAL(10, 2), Order_hr int)
        """)

    %%sparksql
    insert into glue.prod.orders values(1171,2,100.00,19)

XTable configuration:

    cat s3_orders_ice_delta.yaml

    sourceFormat: ICEBERG
    targetFormats:
      - DELTA
    datasets:
      - tableBasePath: s3://<>/prod.db/orders
        tableDataPath: s3://<>/prod.db/orders/data
        tableName: orders
        namespace: prod.db

    cat glue_catalog.yaml

    catalogImpl: org.apache.iceberg.aws.glue.GlueCatalog
    catalogName: onetable
    catalogOptions:
      io-impl: org.apache.iceberg.aws.s3.S3FileIO
      warehouse: s3://<>/prod.db/orders

Conversion:

    java -jar /Users/satyak/iceberg/demo/xtable/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar \
        --datasetConfig /Users/satyak/iceberg/youtube/iceberg_test_cases/s3_orders_ice_delta.yaml \
        --icebergCatalogConfig /Users/satyak/iceberg/youtube/iceberg_test_cases/glue_catalog.yaml

Error:

    WARNING: Runtime environment or build system does not support multi-release JARs. This will impact location-based features.
    2024-07-10 11:53:33 INFO org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3://ambaricloudsatya/prod.db/orders for following table formats [DELTA]
    2024-07-10 11:53:34 WARN org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/satyak/iceberg/demo/xtable/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar) to constructor java.nio.DirectByteBuffer(long,int)
    WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    2024-07-10 11:53:35 WARN org.apache.spark.util.Utils:72 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    2024-07-10 11:53:35 WARN org.apache.spark.util.Utils:72 - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
    2024-07-10 11:53:36 WARN org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
    2024-07-10 11:53:36 WARN org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
    2024-07-10 11:53:37 INFO org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore) is used for scheme s3
    2024-07-10 11:53:38 INFO org.apache.spark.sql.delta.DeltaLog:60 - Creating initial snapshot without metadata, because the directory is empty
    2024-07-10 11:53:39 INFO org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=32fd78bf-1b3f-4714-9a0f-fd4c25ec27da] Created snapshot InitialSnapshot(path=s3://<>/prod.db/orders/data/_delta_log, version=-1, metadata=Metadata(1e757708-0924-436f-b3f3-1b468a211199,null,null,Format(parquet,Map()),null,List(),Map(),Some(1720630419325)), logSegment=LogSegment(s3://<>/prod.db/orders/data/_delta_log,-1,List(),None,-1), checksumOpt=None)
    2024-07-10 11:53:39 INFO org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
    2024-07-10 11:53:39 ERROR org.apache.xtable.utilities.RunSync:171 - Error running sync for s3://ambaricloudsatya/prod.db/orders
    java.lang.IllegalArgumentException: Cannot initialize Catalog implementation org.apache.iceberg.aws.glue.GlueCatalog: Cannot find constructor for interface org.apache.iceberg.catalog.Catalog
        Missing org.apache.iceberg.aws.glue.GlueCatalog [java.lang.ClassNotFoundException: org.apache.iceberg.aws.glue.GlueCatalog]
        at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:224) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergTableManager.lambda$getCatalog$6(IcebergTableManager.java:116) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705) ~[?:?]
        at org.apache.xtable.iceberg.IcebergTableManager.getCatalog(IcebergTableManager.java:113) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:56) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.initSourceTable(IcebergConversionSource.java:81) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getSourceTable(IcebergConversionSource.java:60) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getCurrentSnapshot(IcebergConversionSource.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    Caused by: java.lang.NoSuchMethodException: Cannot find constructor for interface org.apache.iceberg.catalog.Catalog
        Missing org.apache.iceberg.aws.glue.GlueCatalog [java.lang.ClassNotFoundException: org.apache.iceberg.aws.glue.GlueCatalog]
        at org.apache.iceberg.common.DynConstructors.buildCheckedException(DynConstructors.java:250) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.common.DynConstructors.access$200(DynConstructors.java:32) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.common.DynConstructors$Builder.buildChecked(DynConstructors.java:220) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:221) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        ... 11 more
        Suppressed: java.lang.ClassNotFoundException: org.apache.iceberg.aws.glue.GlueCatalog
            at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) ~[?:?]
            at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
            at java.lang.ClassLoader.loadClass(ClassLoader.java:521) ~[?:?]
            at java.lang.Class.forName0(Native Method) ~[?:?]
            at java.lang.Class.forName(Class.java:398) ~[?:?]
            at org.apache.iceberg.common.DynConstructors$Builder.impl(DynConstructors.java:149) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:221) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.iceberg.IcebergTableManager.lambda$getCatalog$6(IcebergTableManager.java:116) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705) ~[?:?]
            at org.apache.xtable.iceberg.IcebergTableManager.getCatalog(IcebergTableManager.java:113) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:56) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.iceberg.IcebergConversionSource.initSourceTable(IcebergConversionSource.java:81) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.iceberg.IcebergConversionSource.getSourceTable(IcebergConversionSource.java:60) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.iceberg.IcebergConversionSource.getCurrentSnapshot(IcebergConversionSource.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
            at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
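
In a trace this long, the informative part is usually the `ClassNotFoundException` entry, which names the class the JVM could not load. The following is a minimal Python sketch, not part of XTable, for pulling those class names out of a saved log. The function name `missing_classes` and the `sample` string are illustrative assumptions; the sample is abbreviated from the log above.

```python
import re

# Matches "java.lang.ClassNotFoundException: <fully.qualified.ClassName>"
_CNFE = re.compile(r"java\.lang\.ClassNotFoundException:\s*([\w.$]+)")


def missing_classes(log_text):
    """Return the distinct class names reported as not found, in first-seen order."""
    seen = []
    for name in _CNFE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Abbreviated excerpt of the error above (illustrative input only)
sample = (
    "Missing org.apache.iceberg.aws.glue.GlueCatalog "
    "[java.lang.ClassNotFoundException: org.apache.iceberg.aws.glue.GlueCatalog] "
    "Suppressed: java.lang.ClassNotFoundException: "
    "org.apache.iceberg.aws.glue.GlueCatalog at jdk.internal.loader"
)
print(missing_classes(sample))
```

Here the only missing class is `org.apache.iceberg.aws.glue.GlueCatalog`, which points at a jar that is absent from the classpath rather than at a problem with the table itself.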


sagarlakshmipathy commented 4 months ago

Hi @ambaricloud

You are missing the AWS jars:

Missing org.apache.iceberg.aws.glue.GlueCatalog [java.lang.ClassNotFoundException: org.apache.iceberg.aws.glue.GlueCatalog]

Can you add iceberg-aws-x.x.x.jar and a compatible bundle-x.x.x.jar to the classpath and sync again? Here's an article I wrote earlier that does something similar: https://medium.com/@sagarlakshmipathy/using-onetable-to-translate-a-hudi-table-to-iceberg-format-and-sync-with-glue-catalog-8c3071f08877

See this comment for links to the jars: https://github.com/apache/incubator-xtable/issues/473#issuecomment-2181142552
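
Before re-running the sync, it can help to confirm which jar actually provides the missing class. A jar is a zip archive, so this can be checked without starting a JVM. The following is a rough Python sketch under that assumption; `jars_containing` is a hypothetical helper, not an XTable or Iceberg API, and the jar file names in the usage comment are placeholders for whatever versions you downloaded.

```python
import zipfile


def jars_containing(class_name, jar_paths):
    """Return the jars from jar_paths that contain the given fully qualified class."""
    # A class a.b.C is stored inside the jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits


# Example usage (placeholder file names, adjust to your local jars):
# jars_containing(
#     "org.apache.iceberg.aws.glue.GlueCatalog",
#     ["xtable-utilities-0.1.0-SNAPSHOT-bundled.jar", "iceberg-aws-x.x.x.jar"],
# )
```

An empty result for every jar on the classpath would be consistent with the `ClassNotFoundException` reported above.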

dipankarmazumdar commented 4 months ago

@ambaricloud - Let us know if this has helped resolve the issue.

ambaricloud commented 4 months ago

Thank you. Now I can convert Glue Iceberg to Delta.

A couple of comments:

  1. The datasetConfig requires a namespace.
  2. In the catalog YAML, I am not sure what the purpose of "warehouse: s3://<>/prod" is.

    java -cp "utilities-0.1.0-beta1-bundled.jar:iceberg-aws-1.3.1.jar:bundle-2.23.9.jar" \
        io.onetable.utilities.RunSync \
        --datasetConfig ice_to_delta_orders_config.yaml \
        --icebergCatalogConfig ice_to_delta_orders_catalog.yam
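
Note the switch from `java -jar` to `java -cp` with an explicit main class. When `-jar` is used, the JVM takes its classpath from the jar itself and ignores `-cp`, so additional jars are not picked up that way. The following is a small Python sketch of assembling such a command; `build_sync_command` is a hypothetical helper, and the two config file names are placeholders rather than the files from this thread. The jar names, main class, and flags are the ones shown in the command above.

```python
import os


def build_sync_command(jars, main_class, dataset_config, catalog_config):
    """Build a `java -cp ...` argument list with all jars on the classpath."""
    # os.pathsep is ':' on Linux/macOS and ';' on Windows
    classpath = os.pathsep.join(jars)
    return [
        "java", "-cp", classpath, main_class,
        "--datasetConfig", dataset_config,
        "--icebergCatalogConfig", catalog_config,
    ]


cmd = build_sync_command(
    ["utilities-0.1.0-beta1-bundled.jar", "iceberg-aws-1.3.1.jar", "bundle-2.23.9.jar"],
    "io.onetable.utilities.RunSync",
    "dataset_config.yaml",   # placeholder name
    "catalog_config.yaml",   # placeholder name
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if you want to launch the sync from Python.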



ambaricloud commented 4 months ago

Now, I am able to test incremental conversion too. Thank you.

dipankarmazumdar commented 4 months ago

Now, I am able to test incremental conversion too. Thank you.

Nice! Thanks for confirming.