apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

"IllegalStateException: Recursive update" when converting Hudi table to Delta #466

Open lucasmo opened 3 months ago

lucasmo commented 3 months ago

Please describe the bug 🐞

I'm trying to use XTable to convert a Hudi source to a Delta target, and I am receiving the exception below. The table is active, frequently updated, and actively queried as a Hudi table.

Is there any other debug information I can provide to make this more useful?

My git HEAD is 4a96627a, the OS is Linux/Ubuntu, and Java is version 11. I modified log4j2.xml to set level=trace for org.apache.hudi and org.apache.xtable.

Run with stacktrace:

$ java -jar ./xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig config.yaml
WARNING: Runtime environment or build system does not support multi-release JARs. This will impact location-based features.
2024-06-05 23:22:05 INFO  org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3://hidden-s3-bucket/hidden-prefix/ for following table formats [DELTA]
2024-06-05 23:22:05 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:05 WARN  org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-06-05 23:22:05 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-06-05 23:22:06 WARN  org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:155 - Loading Active commit timeline for s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__clean__COMPLETED__20240605231918000]}
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata
2024-06-05 23:22:08 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__deltacommit__COMPLETED__20240605231917000]}
2024-06-05 23:22:08 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 7 ms to read  0 instants, 0 replaced file groups
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.hbase.util.UnsafeAvailChecker (file:/incubator-xtable/xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.hbase.util.UnsafeAvailChecker
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-06-05 23:22:08 INFO  org.apache.hudi.common.util.ClusteringUtils:147 - Found 0 files in pending clustering operations
2024-06-05 23:22:08 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:243 - Creating View Manager with storage type :MEMORY
2024-06-05 23:22:08 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:255 - Creating in-memory based Table View
2024-06-05 23:22:11 INFO  org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-06-05 23:22:11 INFO  org.apache.spark.sql.delta.DeltaLog:60 - Creating initial snapshot without metadata, because the directory is empty
2024-06-05 23:22:13 INFO  org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=8eda3e8f-9dae-4d19-ac72-f625b8ccb0c5] Created snapshot InitialSnapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log, version=-1, metadata=Metadata(167f7b26-f82d-4765-97b9-b6e47d9147ec,null,null,Format(parquet,Map()),null,List(),Map(),Some(1717629733296)), logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-06-05 23:22:13 INFO  org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-06-05 23:22:13 INFO  org.apache.hudi.common.table.TableSchemaResolver:317 - Reading schema from s3://hidden-s3-bucket/hidden-prefix/op_date=2024-06-05/3b5d27af-ef39-4862-bbd9-d4a010f6056e-0_0-71-375_20240605231837826.parquet
2024-06-05 23:22:14 INFO  org.apache.hudi.metadata.HoodieTableMetadataUtil:927 - Loading latest merged file slices for metadata table partition files
2024-06-05 23:22:14 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 1 ms to read  0 instants, 0 replaced file groups
2024-06-05 23:22:14 INFO  org.apache.hudi.common.util.ClusteringUtils:147 - Found 0 files in pending clustering operations
2024-06-05 23:22:14 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:429 - Building file system view for partition (files)
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:435 - #files found in partition (files) =30, Time taken =40
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.HoodieTableFileSystemView:386 - Adding file-groups for partition :files, #FileGroups=1
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:165 - addFilesToView: NumFiles=30, NumFileGroups=1, FileGroupsCreationTime=15, StoreTimeTaken=1
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:449 - Time to load partition (files) =57
2024-06-05 23:22:14 INFO  org.apache.hudi.metadata.HoodieBackedTableMetadata:451 - Opened metadata base file from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/files/files-0000-0_0-67-1304_20240605210834482001.hfile at instant 20240605210834482001 in 9 ms
2024-06-05 23:22:14 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__clean__COMPLETED__20240605231918000]}
2024-06-05 23:22:14 ERROR org.apache.xtable.utilities.RunSync:171 - Error running sync for s3://hidden-s3-bucket/hidden-prefix/
org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve list of partition from metadata
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:127) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.hudi.HudiDataFileExtractor.getFilesCurrentState(HudiDataFileExtractor.java:116) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.hudi.HudiConversionSource.getCurrentSnapshot(HudiConversionSource.java:97) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
Caused by: java.lang.IllegalStateException: Recursive update
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1739) ~[?:?]
    at org.apache.avro.util.MapUtil.computeIfAbsent(MapUtil.java:42) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificData.getClass(SpecificData.java:257) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:508) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:355) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:186) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:263) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:248) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:177) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:1355) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$37(HoodieTableMetadataUtil.java:1284) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) ~[?:?]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1283) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:473) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:429) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getOrCreateReaders$10(HoodieBackedTableMetadata.java:412) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705) ~[?:?]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:412) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:291) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:255) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:145) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:316) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:125) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    ... 6 more
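
When reading a trace like the one above, the root cause is the last `Caused by:` entry rather than the first exception line. A minimal sketch for pulling that chain out of a pasted trace follows; the helper name `cause_chain` and the shortened sample are illustrative assumptions, not part of XTable or Hudi.

```python
import re

# Matches the leading exception line and every "Caused by:" line of a Java
# stack trace, capturing the exception class and its message.
_EXC = re.compile(r"^(?:Caused by: )?([\w.$]+(?:Exception|Error)): (.*)$")


def cause_chain(trace):
    """Return (exception class, message) pairs, outermost first."""
    chain = []
    for line in trace.splitlines():
        match = _EXC.match(line.strip())
        if match:
            chain.append((match.group(1), match.group(2)))
    return chain


# Shortened copy of the trace above (stack frames trimmed for brevity).
sample = """\
org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve list of partition from metadata
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:127)
Caused by: java.lang.IllegalStateException: Recursive update
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1739)
"""

chain = cause_chain(sample)
root = chain[-1]  # innermost cause: the "Recursive update" IllegalStateException
```

Here the outer `HoodieMetadataException` only wraps the failure; the innermost entry is the `IllegalStateException` raised inside `ConcurrentHashMap.computeIfAbsent`.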

config.yaml:

sourceFormat: HUDI
targetFormats:
  - DELTA
datasets:
  -
    tableBasePath: s3://hidden-s3-bucket/hidden-prefix
    tableName: hidden_table
    partitionSpec: op_date:VALUE

hoodie.properties from the table:

hoodie.table.timeline.timezone=LOCAL
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.table.precombine.field=ts_millis
hoodie.table.version=6
hoodie.database.name=
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.metadata.partitions.inflight=
hoodie.table.checksum=2622850774
hoodie.partition.metafile.use.base.format=false
hoodie.table.cdc.enabled=false
hoodie.archivelog.folder=archived
hoodie.table.name=hidden_table
hoodie.populate.meta.fields=true
hoodie.table.type=COPY_ON_WRITE
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.metadata.partitions=files
hoodie.timeline.layout.version=1
hoodie.table.recordkey.fields=record_id
hoodie.table.partition.fields=op_date

I submitted this to the dev@ mailing list and received no response, so I am filing it as an issue.

vinishjail97 commented 3 months ago

Thanks for reporting the issue @lucasmo.

Can you share the contents of the .hoodie folder, i.e. s3://hidden-s3-bucket/hidden-prefix/.hoodie/, in a zip file? It looks like the Hudi metadata table is corrupted.

To get unblocked, you can delete the metadata table inside .hoodie and try the sync again. It should be at the path below:
s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/

lucasmo commented 3 months ago

@vinishjail97 Can you confirm that deleting the metadata prefix in S3 will not cause any issues, as long as I do it while nothing else is writing? Presumably the metadata table will just be recreated.

I can provide a zip file; is there a way to get it to you personally?

vinishjail97 commented 3 months ago

Yes, that's correct. For the existing corrupted metadata, you can attach the file to this issue or send it to my email, vinish@apache.org.

lucasmo commented 3 months ago

@vinishjail97 Sent to your email.

I have deleted the .hoodie/metadata/ prefix in S3, but the Hudi write (not XTable) now fails with the error below. Can you advise?

I have rolled back the changes by putting the metadata folder back in place.

org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table
    at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:293)
    at org.apache.hudi.client.SparkRDDWriteClient.initMetadataTable(SparkRDDWriteClient.java:273)
    at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1257)
    at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1297)
    at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:139)
    at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:224)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.liftedTree1$1(HoodieSparkSqlWriter.scala:504)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:502)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:204)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:121)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:124)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path s3://hidden-bucket/hidden-prefix/.hoodie/metadata/.hoodie
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:57)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:140)
    at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:692)
    at org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:85)
    at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:774)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeFromFilesystem(HoodieBackedTableMetadataWriter.java:366)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeIfNeeded(HoodieBackedTableMetadataWriter.java:271)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:175)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:95)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:72)
    at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:287)
    ... 57 more
Caused by: java.io.FileNotFoundException: No such file or directory 's3://hidden-bucket/hidden-prefix/.hoodie/metadata/.hoodie'
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:524)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:617)
    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:410)
    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114)
    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:404)
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51)
    ... 67 more

vinishjail97 commented 3 months ago

Because the metadata table has been deleted, your Hudi writer needs hoodie.metadata.enable=false for the write to go through. According to the Hudi OSS documentation, there is no need to delete the metadata folder: you can just disable the config and the metadata table won't be used.
https://hudi.apache.org/docs/metadata#enable-hudi-metadata-table-and-multi-modal-index-in-write-side
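
The py4j frames in the failing write suggest the writer is PySpark, so the setting would typically be passed as a writer option. The following is a minimal sketch under that assumption: only `hoodie.metadata.enable` comes from this thread, the helper name `hudi_write_options` is hypothetical, and the remaining keys are standard Hudi datasource options filled in from the hoodie.properties posted earlier.

```python
def hudi_write_options(table_name, record_key, partition_field, precombine_field):
    """Build Hudi datasource write options with the metadata table disabled."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # The setting suggested in this thread: turn the metadata table off
        # on the write side.
        "hoodie.metadata.enable": "false",
    }


# Values taken from the hoodie.properties shown earlier in the issue.
options = hudi_write_options("hidden_table", "record_id", "op_date", "ts_millis")

# With a real SparkSession and a DataFrame `df` (neither is defined here),
# the write would look roughly like:
#   df.write.format("hudi").options(**options).mode("append").save(
#       "s3://hidden-s3-bucket/hidden-prefix")
```

Keeping the options in one dictionary makes it easy to confirm that every job writing to the table carries the same metadata setting.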

lucasmo commented 2 months ago

This gets me one step closer, but it is still not working.

The initial load works (I'll add the log at the end). The followup incremental load fails like this:

lfm@ubuntu:~/incubator-xtable$ java -jar xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar -d config.yaml
WARNING: Runtime environment or build system does not support multi-release JARs. This will impact location-based features.
2024-07-02 17:47:27 INFO  org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3://hidden-s3-bucket/hidden-prefix/ for following table formats [DELTA]
2024-07-02 17:47:27 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-07-02 17:47:27 WARN  org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-07-02 17:47:27 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-07-02 17:47:27 WARN  org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
2024-07-02 17:47:28 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-07-02 17:47:28 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://hidden-s3-bucket/hidden-prefix
2024-07-02 17:47:28 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:155 - Loading Active commit timeline for s3://hidden-s3-bucket/hidden-prefix
2024-07-02 17:47:28 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240702171813028__clean__COMPLETED__20240702171815000]}
2024-07-02 17:47:28 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:243 - Creating View Manager with storage type :MEMORY
2024-07-02 17:47:28 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:255 - Creating in-memory based Table View
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/lfm/incubator-xtable/xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-07-02 17:47:31 INFO  org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-07-02 17:47:31 INFO  org.apache.spark.sql.delta.DeltaLog:60 - Loading version 0.
2024-07-02 17:47:32 INFO  org.apache.spark.sql.delta.DeltaLogFileIndex:60 - Created DeltaLogFileIndex(JSON, numFilesInSegment: 1, totalFileSize: 75250570)
2024-07-02 17:47:35 INFO  org.apache.spark.sql.delta.Snapshot:60 - [tableId=4da47f37-d68a-411a-8124-5dfed3c314fd] Created snapshot Snapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log, version=0, metadata=Metadata(e65c4be0-428e-4ecd-8fb4-360a8fe22dda,hidden_table_name,,Format(parquet,Map()),{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"hidden_unique_id","type":"long","nullable":true,"metadata":{}},{"name":"op_date","type":"date","nullable":true,"metadata":{}},{"name":"previous_op_dates","type":{"type":"array","elementType":"date","containsNull":true},"nullable":true,"metadata":{}},{"name":"col101","type":"timestamp","nullable":true,"metadata":{}},{"name":"col102","type":"string","nullable":true,"metadata":{}},{"name":"col103","type":"boolean","nullable":true,"metadata":{}},{"name":"col104","type":"timestamp","nullable":true,"metadata":{}},{"name":"col105","type":"string","nullable":true,"metadata":{}},{"name":"col106","type":"string","nullable":true,"metadata":{}},{"name":"col107","type":"string","nullable":true,"metadata":{}},{"name":"col108","type":"boolean","nullable":true,"metadata":{}},{"name":"noun_col105","type":"string","nullable":true,"metadata":{}},{"name":"col110","type":"boolean","nullable":true,"metadata":{}},{"name":"col111","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"col105","type":"string","nullable":true,"metadata":{}},{"name":"col106","type":"string","nullable":true,"metadata":{}},{"name":"col113","type":"string","nullable":true,"metadata":{}}]},"containsNull":true},"nullable":true,"metadata":{}},{"name":"col112","type":"string","nullable":tru
e,"metadata":{}},{"name":"col113","type":"string","nullable":true,"metadata":{}},{"name":"col114","type":"long","nullable":true,"metadata":{}},{"name":"col115","type":"string","nullable":true,"metadata":{}},{"name":"col116","type":"string","nullable":true,"metadata":{}},{"name":"col117","type":"string","nullable":true,"metadata":{}},{"name":"col118","type":"string","nullable":true,"metadata":{}},{"name":"col119","type":"string","nullable":true,"metadata":{}},{"name":"col120","type":"string","nullable":true,"metadata":{}},{"name":"col121","type":"string","nullable":true,"metadata":{}},{"name":"col122","type":"string","nullable":true,"metadata":{}},{"name":"col123","type":"timestamp","nullable":true,"metadata":{}},{"name":"col124","type":"timestamp","nullable":true,"metadata":{}},{"name":"col125","type":"timestamp","nullable":true,"metadata":{}},{"name":"col126","type":"timestamp","nullable":true,"metadata":{}},{"name":"col127","type":"timestamp","nullable":true,"metadata":{}},{"name":"col128","type":"timestamp","nullable":true,"metadata":{}},{"name":"col129","type":"timestamp","nullable":true,"metadata":{}},{"name":"col130","type":"timestamp","nullable":true,"metadata":{}},{"name":"col131","type":"timestamp","nullable":true,"metadata":{}},{"name":"col132","type":"timestamp","nullable":true,"metadata":{}},{"name":"col133","type":"timestamp","nullable":true,"metadata":{}},{"name":"col134","type":"timestamp","nullable":true,"metadata":{}},{"name":"col135","type":"timestamp","nullable":true,"metadata":{}},{"name":"col136","type":"timestamp","nullable":true,"metadata":{}},{"name":"col137","type":"timestamp","nullable":true,"metadata":{}},{"name":"col138","type":"timestamp","nullable":true,"metadata":{}},{"name":"col139","type":"timestamp","nullable":true,"metadata":{}},{"name":"col140","type":"timestamp","nullable":true,"metadata":{}},{"name":"col141","type":"timestamp","nullable":true,"metadata":{}},{"name":"col142","type":"timestamp","nullable":true,"metadata":{}},{"nam
e":"col143","type":"timestamp","nullable":true,"metadata":{}},{"name":"col144","type":"timestamp","nullable":true,"metadata":{}},{"name":"col145","type":"timestamp","nullable":true,"metadata":{}},{"name":"col146","type":"timestamp","nullable":true,"metadata":{}},{"name":"col147","type":"timestamp","nullable":true,"metadata":{}},{"name":"col148","type":"timestamp","nullable":true,"metadata":{}},{"name":"col149","type":"timestamp","nullable":true,"metadata":{}},{"name":"col150","type":"timestamp","nullable":true,"metadata":{}},{"name":"col151","type":"string","nullable":true,"metadata":{}},{"name":"col152","type":"string","nullable":true,"metadata":{}},{"name":"col153","type":"string","nullable":true,"metadata":{}},{"name":"col154","type":"string","nullable":true,"metadata":{}},{"name":"col155","type":"string","nullable":true,"metadata":{}},{"name":"col156","type":"string","nullable":true,"metadata":{}},{"name":"col157","type":"string","nullable":true,"metadata":{}},{"name":"col158","type":"string","nullable":true,"metadata":{}},{"name":"col159","type":"string","nullable":true,"metadata":{}},{"name":"col160","type":"string","nullable":true,"metadata":{}},{"name":"col161","type":"string","nullable":true,"metadata":{}},{"name":"op_date_utc","type":"date","nullable":true,"metadata":{}},{"name":"col163","type":"timestamp","nullable":true,"metadata":{}},{"name":"col164","type":"timestamp","nullable":true,"metadata":{}},{"name":"col165","type":"timestamp","nullable":true,"metadata":{}},{"name":"col166","type":"timestamp","nullable":true,"metadata":{}},{"name":"col167","type":"timestamp","nullable":true,"metadata":{}},{"name":"col168","type":"timestamp","nullable":true,"metadata":{}},{"name":"col169","type":"timestamp","nullable":true,"metadata":{}},{"name":"col170","type":"timestamp","nullable":true,"metadata":{}},{"name":"col171","type":"timestamp","nullable":true,"metadata":{}},{"name":"col172","type":"timestamp","nullable":true,"metadata":{}},{"name":"col173","type":"tim
estamp","nullable":true,"metadata":{}},{"name":"col174","type":"timestamp","nullable":true,"metadata":{}},{"name":"col175","type":"timestamp","nullable":true,"metadata":{}},{"name":"col176","type":"timestamp","nullable":true,"metadata":{}},{"name":"col177","type":"timestamp","nullable":true,"metadata":{}},{"name":"col178","type":"timestamp","nullable":true,"metadata":{}},{"name":"col179","type":"timestamp","nullable":true,"metadata":{}},{"name":"col180","type":"timestamp","nullable":true,"metadata":{}},{"name":"col181","type":"timestamp","nullable":true,"metadata":{}},{"name":"col182","type":"timestamp","nullable":true,"metadata":{}},{"name":"col183","type":"boolean","nullable":true,"metadata":{}},{"name":"col184","type":"date","nullable":true,"metadata":{}},{"name":"scheduled_hidden_unique_ids","type":{"type":"array","elementType":"long","containsNull":true},"nullable":true,"metadata":{}},{"name":"col186","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col187","type":"long","nullable":true,"metadata":{}},{"name":"col188","type":"long","nullable":true,"metadata":{}},{"name":"col189","type":"long","nullable":true,"metadata":{}},{"name":"col190","type":"long","nullable":true,"metadata":{}},{"name":"col191","type":"long","nullable":true,"metadata":{}},{"name":"col192","type":"long","nullable":true,"metadata":{}},{"name":"col193","type":"timestamp","nullable":true,"metadata":{}},{"name":"col194","type":"timestamp","nullable":true,"metadata":{}},{"name":"col195","type":"timestamp","nullable":true,"metadata":{}},{"name":"col196","type":"timestamp","nullable":true,"metadata":{}},{"name":"col197","type":"timestamp","nullable":true,"metadata":{}},{"name":"col198","type":"timestamp","nullable":true,"metadata":{}},{"name":"col199","type":"timestamp","nullable":true,"metadata":{}},{"name":"col200","type":"timestamp","nullable":true,"metadata":{}},{"name":"col201","type":"timestamp","nullable":true,"metadata":{}},{"name"
:"col202","type":"timestamp","nullable":true,"metadata":{}},{"name":"col203","type":"timestamp","nullable":true,"metadata":{}},{"name":"col204","type":"timestamp","nullable":true,"metadata":{}},{"name":"col205","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col206","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col184_local","type":"date","nullable":true,"metadata":{}},{"name":"col206_internal","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col209","type":"string","nullable":true,"metadata":{}},{"name":"col210","type":{"type":"array","elementType":"integer","containsNull":true},"nullable":true,"metadata":{}},{"name":"col211","type":{"type":"array","elementType":"timestamp","containsNull":true},"nullable":true,"metadata":{}},{"name":"col212","type":"string","nullable":true,"metadata":{}},{"name":"col213","type":"integer","nullable":true,"metadata":{}},{"name":"event_millis","type":"long","nullable":true,"metadata":{}},{"name":"_hoodie_is_deleted","type":"boolean","nullable":true,"metadata":{}},{"name":"xtable_partition_col_DAY_op_date","type":"date","nullable":true,"metadata":{"delta.generationExpression":"CAST(op_date as DATE)"}}]},List(xtable_partition_col_DAY_op_date),Map(XTABLE_METADATA -> {"lastInstantSynced":"2024-07-02T17:18:13.028Z","instantsToConsiderForNextSync":[],"version":0}, delta.logRetentionDuration -> interval 168 hours),Some(1719940693028)), logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,0,WrappedArray(S3AFileStatus{path=s3://hidden-s3-bucket/hidden-prefix/_delta_log/00000000000000000000.json; isDirectory=false; length=75250570; replication=1; blocksize=33554432; modification_time=1719942187000; access_time=0; owner=lfm; group=lfm; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE 
eTag=414a8c48671a5c1a268e391487db2926-2 versionId=null),None,1719942187000), checksumOpt=None)
Exception in thread "main" java.lang.ExceptionInInitializerError
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:398)
    at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:95)
    at org.apache.avro.util.ClassUtils.forName(ClassUtils.java:72)
    at org.apache.avro.specific.SpecificData.lambda$getClass$2(SpecificData.java:259)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
    at org.apache.avro.util.MapUtil.computeIfAbsent(MapUtil.java:42)
    at org.apache.avro.specific.SpecificData.getClass(SpecificData.java:257)
    at org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:508)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
    at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:355)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:186)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:263)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:248)
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209)
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:173)
    at org.apache.xtable.hudi.HudiConversionSource.isAffectedByCleanupProcess(HudiConversionSource.java:164)
    at org.apache.xtable.hudi.HudiConversionSource.isIncrementalSyncSafeFrom(HudiConversionSource.java:148)
    at org.apache.xtable.conversion.ConversionController.isIncrementalSyncSufficient(ConversionController.java:244)
    at org.apache.xtable.conversion.ConversionController.lambda$getFormatsToSyncIncrementally$4(ConversionController.java:175)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176)
    at java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1764)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    at org.apache.xtable.conversion.ConversionController.getFormatsToSyncIncrementally(ConversionController.java:178)
    at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:109)
    at org.apache.xtable.utilities.RunSync.main(RunSync.java:169)
Caused by: java.lang.IllegalStateException: Recursive update
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1760)
    at org.apache.avro.util.MapUtil.computeIfAbsent(MapUtil.java:42)
    at org.apache.avro.specific.SpecificData.getClass(SpecificData.java:257)
    at org.apache.avro.specific.SpecificData.getForSchema(SpecificData.java:164)
    at org.apache.avro.specific.SpecificDatumWriter.<init>(SpecificDatumWriter.java:47)
    at org.apache.hudi.avro.model.HoodieCleanPartitionMetadata.<clinit>(HoodieCleanPartitionMetadata.java:532)
    ... 38 more
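
My reading of the trace above, which may well be incomplete: `SpecificData.getClass` calls `ConcurrentHashMap.computeIfAbsent`, the mapping function loads `HoodieCleanPartitionMetadata` via `Class.forName`, and that class's static initializer (`<clinit>`) builds a `SpecificDatumWriter`, which goes back into `SpecificData.getClass` and calls `computeIfAbsent` on the same map while the outer call is still running. The sketch below reproduces only that shape with plain JDK classes. `RecursiveUpdateDemo`, `CACHE`, `Model`, and `describeFailure` are hypothetical names for illustration, not Avro or Hudi code, and the assumption that both lookups hit the same key is mine.

```java
import java.util.concurrent.ConcurrentHashMap;

public class RecursiveUpdateDemo {
    // Hypothetical stand-in for the class cache used by SpecificData.getClass in the trace.
    static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    // Hypothetical stand-in for a generated class whose static initializer
    // looks itself up in the same cache (as <clinit> appears to do in the trace).
    static class Model {
        static final Object SELF = CACHE.computeIfAbsent("Model", k -> "placeholder");
    }

    static String describeFailure() {
        try {
            CACHE.computeIfAbsent("Model", k -> {
                // First access to Model runs its static initializer while the
                // outer computeIfAbsent for the same key is still in progress.
                return Model.SELF;
            });
            return "no exception";
        } catch (ExceptionInInitializerError e) {
            // The JVM wraps the exception thrown by the static initializer.
            Throwable cause = e.getCause();
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

On Java 9 or later (I am on Java 11) I would expect this to report an `IllegalStateException` with the message `Recursive update` wrapped in an `ExceptionInInitializerError`, which is the same pairing as in the trace. Older JDKs may behave differently. If this reading is right, the failure depends on which code path first touches the generated class, which might explain why the snapshot sync works and the incremental check does not.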

The first load, which falls back to a snapshot sync because the Delta log directory is empty, succeeds with this output:

lfm@ubuntu:~/incubator-xtable$ java -jar xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar -d config.yaml
WARNING: Runtime environment or build system does not support multi-release JARs. This will impact location-based features.
2024-07-02 17:40:30 INFO  org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3://hidden-s3-bucket/hidden-prefix/ for following table formats [DELTA]
2024-07-02 17:40:30 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-07-02 17:40:30 WARN  org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-07-02 17:40:30 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-07-02 17:40:31 WARN  org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
2024-07-02 17:40:31 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-07-02 17:40:31 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://hidden-s3-bucket/hidden-prefix
2024-07-02 17:40:31 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:155 - Loading Active commit timeline for s3://hidden-s3-bucket/hidden-prefix
2024-07-02 17:40:32 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240702171813028__clean__COMPLETED__20240702171815000]}
2024-07-02 17:40:32 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:243 - Creating View Manager with storage type :MEMORY
2024-07-02 17:40:32 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:255 - Creating in-memory based Table View
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/lfm/incubator-xtable/xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-07-02 17:40:33 INFO  org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-07-02 17:40:34 INFO  org.apache.spark.sql.delta.DeltaLog:60 - Creating initial snapshot without metadata, because the directory is empty
2024-07-02 17:40:34 INFO  org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=697e4b0d-05a7-45d0-9038-7a0bebf69cba] Created snapshot InitialSnapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log, version=-1, metadata=Metadata(91d449f4-8540-4208-a0be-b2c134bde849,null,null,Format(parquet,Map()),null,List(),Map(),Some(1719942034877)), logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-07-02 17:40:34 INFO  org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-07-02 17:40:35 INFO  org.apache.hudi.common.table.TableSchemaResolver:317 - Reading schema from s3://hidden-s3-bucket/hidden-prefix/op_date=2024-06-29/c061e248-2981-463e-8055-b8e4d59c2a24-0_2-22-137_20240702171726140.parquet
2024-07-02 17:40:35 INFO  org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-07-02 17:40:49 INFO  org.apache.hudi.common.table.view.FileSystemViewManager:165 - Creating InMemory based view for basePath s3://hidden-s3-bucket/hidden-prefix
2024-07-02 17:40:49 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 1 ms to read  0 instants, 0 replaced file groups
2024-07-02 17:40:49 INFO  org.apache.hudi.common.util.ClusteringUtils:147 - Found 0 files in pending clustering operations
2024-07-02 17:40:49 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:429 - Building file system view for partition (op_date=2021-10-29)
2024-07-02 17:40:49 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:429 - Building file system view for partition (op_date=2019-08-07)
2024-07-02 17:40:49 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView:429 - Building file system view for partition (op_date=2023-08-12)
[ ... several thousand partitions ... ]
2024-07-02 17:42:58 INFO  org.apache.spark.sql.delta.DeltaLog:60 - No delta log found for the Delta table at s3://hidden-s3-bucket/hidden-prefix/_delta_log
2024-07-02 17:42:58 INFO  org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=91d449f4-8540-4208-a0be-b2c134bde849] Created snapshot InitialSnapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log, version=-1, metadata=Metadata(e65c4be0-428e-4ecd-8fb4-360a8fe22dda,null,null,Format(parquet,Map()),null,List(),Map(),Some(1719942178350)), logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-07-02 17:43:05 INFO  org.apache.spark.sql.delta.OptimisticTransaction:60 - [tableId=e65c4be0,txnId=b9564248] Updated metadata from - to Metadata(e65c4be0-428e-4ecd-8fb4-360a8fe22dda,hidden_table_name,,Format(parquet,Map()),{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"hidden_unique_id","type":"long","nullable":true,"metadata":{}},{"name":"op_date","type":"date","nullable":true,"metadata":{}},{"name":"previous_op_dates","type":{"type":"array","elementType":"date","containsNull":true},"nullable":true,"metadata":{}},{"name":"col101","type":"timestamp","nullable":true,"metadata":{}},{"name":"col102","type":"string","nullable":true,"metadata":{}},{"name":"col103","type":"boolean","nullable":true,"metadata":{}},{"name":"col104","type":"timestamp","nullable":true,"metadata":{}},{"name":"col105","type":"string","nullable":true,"metadata":{}},{"name":"col106","type":"string","nullable":true,"metadata":{}},{"name":"col107","type":"string","nullable":true,"metadata":{}},{"name":"col108","type":"boolean","nullable":true,"metadata":{}},{"name":"noun_col105","type":"string","nullable":true,"metadata":{}},{"name":"col110","type":"boolean","nullable":true,"metadata":{}},{"name":"col111","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"col105","type":"string","nullable":true,"metadata":{}},{"name":"col106","type":"string","nullable":true,"metadata":{}},{"name":"col113","type":"string","nullable":true,"metadata":{}}]},"containsNull":true},"nullable":true,"metadata":{}},{"name":"col112","type":"string","nullable":true,"metadata":{}},{"name":"col113","type":"string","nullable":true,"metad
ata":{}},{"name":"col114","type":"long","nullable":true,"metadata":{}},{"name":"col115","type":"string","nullable":true,"metadata":{}},{"name":"col116","type":"string","nullable":true,"metadata":{}},{"name":"col117","type":"string","nullable":true,"metadata":{}},{"name":"col118","type":"string","nullable":true,"metadata":{}},{"name":"col119","type":"string","nullable":true,"metadata":{}},{"name":"col120","type":"string","nullable":true,"metadata":{}},{"name":"col121","type":"string","nullable":true,"metadata":{}},{"name":"col122","type":"string","nullable":true,"metadata":{}},{"name":"col123","type":"timestamp","nullable":true,"metadata":{}},{"name":"col124","type":"timestamp","nullable":true,"metadata":{}},{"name":"col125","type":"timestamp","nullable":true,"metadata":{}},{"name":"col126","type":"timestamp","nullable":true,"metadata":{}},{"name":"col127","type":"timestamp","nullable":true,"metadata":{}},{"name":"col128","type":"timestamp","nullable":true,"metadata":{}},{"name":"col129","type":"timestamp","nullable":true,"metadata":{}},{"name":"col130","type":"timestamp","nullable":true,"metadata":{}},{"name":"col131","type":"timestamp","nullable":true,"metadata":{}},{"name":"col132","type":"timestamp","nullable":true,"metadata":{}},{"name":"col133","type":"timestamp","nullable":true,"metadata":{}},{"name":"col134","type":"timestamp","nullable":true,"metadata":{}},{"name":"col135","type":"timestamp","nullable":true,"metadata":{}},{"name":"col136","type":"timestamp","nullable":true,"metadata":{}},{"name":"col137","type":"timestamp","nullable":true,"metadata":{}},{"name":"col138","type":"timestamp","nullable":true,"metadata":{}},{"name":"col139","type":"timestamp","nullable":true,"metadata":{}},{"name":"col140","type":"timestamp","nullable":true,"metadata":{}},{"name":"col141","type":"timestamp","nullable":true,"metadata":{}},{"name":"col142","type":"timestamp","nullable":true,"metadata":{}},{"name":"col143","type":"timestamp","nullable":true,"metadata":{}},{"name":"c
ol144","type":"timestamp","nullable":true,"metadata":{}},{"name":"col145","type":"timestamp","nullable":true,"metadata":{}},{"name":"col146","type":"timestamp","nullable":true,"metadata":{}},{"name":"col147","type":"timestamp","nullable":true,"metadata":{}},{"name":"col148","type":"timestamp","nullable":true,"metadata":{}},{"name":"col149","type":"timestamp","nullable":true,"metadata":{}},{"name":"col150","type":"timestamp","nullable":true,"metadata":{}},{"name":"col151","type":"string","nullable":true,"metadata":{}},{"name":"col152","type":"string","nullable":true,"metadata":{}},{"name":"col153","type":"string","nullable":true,"metadata":{}},{"name":"col154","type":"string","nullable":true,"metadata":{}},{"name":"col155","type":"string","nullable":true,"metadata":{}},{"name":"col156","type":"string","nullable":true,"metadata":{}},{"name":"col157","type":"string","nullable":true,"metadata":{}},{"name":"col158","type":"string","nullable":true,"metadata":{}},{"name":"col159","type":"string","nullable":true,"metadata":{}},{"name":"col160","type":"string","nullable":true,"metadata":{}},{"name":"col161","type":"string","nullable":true,"metadata":{}},{"name":"op_date_utc","type":"date","nullable":true,"metadata":{}},{"name":"col163","type":"timestamp","nullable":true,"metadata":{}},{"name":"col164","type":"timestamp","nullable":true,"metadata":{}},{"name":"col165","type":"timestamp","nullable":true,"metadata":{}},{"name":"col166","type":"timestamp","nullable":true,"metadata":{}},{"name":"col167","type":"timestamp","nullable":true,"metadata":{}},{"name":"col168","type":"timestamp","nullable":true,"metadata":{}},{"name":"col169","type":"timestamp","nullable":true,"metadata":{}},{"name":"col170","type":"timestamp","nullable":true,"metadata":{}},{"name":"col171","type":"timestamp","nullable":true,"metadata":{}},{"name":"col172","type":"timestamp","nullable":true,"metadata":{}},{"name":"col173","type":"timestamp","nullable":true,"metadata":{}},{"name":"col174","type":"timestam
p","nullable":true,"metadata":{}},{"name":"col175","type":"timestamp","nullable":true,"metadata":{}},{"name":"col176","type":"timestamp","nullable":true,"metadata":{}},{"name":"col177","type":"timestamp","nullable":true,"metadata":{}},{"name":"col178","type":"timestamp","nullable":true,"metadata":{}},{"name":"col179","type":"timestamp","nullable":true,"metadata":{}},{"name":"col180","type":"timestamp","nullable":true,"metadata":{}},{"name":"col181","type":"timestamp","nullable":true,"metadata":{}},{"name":"col182","type":"timestamp","nullable":true,"metadata":{}},{"name":"col183","type":"boolean","nullable":true,"metadata":{}},{"name":"col184","type":"date","nullable":true,"metadata":{}},{"name":"scheduled_hidden_unique_ids","type":{"type":"array","elementType":"long","containsNull":true},"nullable":true,"metadata":{}},{"name":"col186","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col187","type":"long","nullable":true,"metadata":{}},{"name":"col188","type":"long","nullable":true,"metadata":{}},{"name":"col189","type":"long","nullable":true,"metadata":{}},{"name":"col190","type":"long","nullable":true,"metadata":{}},{"name":"col191","type":"long","nullable":true,"metadata":{}},{"name":"col192","type":"long","nullable":true,"metadata":{}},{"name":"col193","type":"timestamp","nullable":true,"metadata":{}},{"name":"col194","type":"timestamp","nullable":true,"metadata":{}},{"name":"col195","type":"timestamp","nullable":true,"metadata":{}},{"name":"col196","type":"timestamp","nullable":true,"metadata":{}},{"name":"col197","type":"timestamp","nullable":true,"metadata":{}},{"name":"col198","type":"timestamp","nullable":true,"metadata":{}},{"name":"col199","type":"timestamp","nullable":true,"metadata":{}},{"name":"col200","type":"timestamp","nullable":true,"metadata":{}},{"name":"col201","type":"timestamp","nullable":true,"metadata":{}},{"name":"col202","type":"timestamp","nullable":true,"metadata":{}},{"name":"col
203","type":"timestamp","nullable":true,"metadata":{}},{"name":"col204","type":"timestamp","nullable":true,"metadata":{}},{"name":"col205","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col206","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col184_local","type":"date","nullable":true,"metadata":{}},{"name":"col206_internal","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col209","type":"string","nullable":true,"metadata":{}},{"name":"col210","type":{"type":"array","elementType":"integer","containsNull":true},"nullable":true,"metadata":{}},{"name":"col211","type":{"type":"array","elementType":"timestamp","containsNull":true},"nullable":true,"metadata":{}},{"name":"col212","type":"string","nullable":true,"metadata":{}},{"name":"col213","type":"integer","nullable":true,"metadata":{}},{"name":"event_millis","type":"long","nullable":true,"metadata":{}},{"name":"_hoodie_is_deleted","type":"boolean","nullable":true,"metadata":{}},{"name":"xtable_partition_col_DAY_op_date","type":"date","nullable":true,"metadata":{"delta.generationExpression":"CAST(op_date as DATE)"}}]},List(xtable_partition_col_DAY_op_date),Map(XTABLE_METADATA -> {"lastInstantSynced":"2024-07-02T17:18:13.028Z","instantsToConsiderForNextSync":[],"version":0}, delta.logRetentionDuration -> interval 168 hours),Some(1719940693028))
2024-07-02 17:43:05 INFO  org.apache.spark.sql.delta.OptimisticTransaction:60 - [tableId=e65c4be0,txnId=b9564248] Attempting to commit version 0 with 4145 actions with Serializable isolation level
2024-07-02 17:43:07 INFO  org.apache.spark.sql.delta.DeltaLog:60 - Creating a new snapshot v0 for commit version 0
2024-07-02 17:43:07 INFO  org.apache.spark.sql.delta.DeltaLog:60 - Loading version 0.
2024-07-02 17:43:07 INFO  org.apache.spark.sql.delta.DeltaLogFileIndex:60 - Created DeltaLogFileIndex(JSON, numFilesInSegment: 1, totalFileSize: 75250570)
2024-07-02 17:43:11 INFO  org.apache.spark.sql.delta.Snapshot:60 - [tableId=e65c4be0-428e-4ecd-8fb4-360a8fe22dda] Created snapshot Snapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log, version=0, metadata=Metadata(e65c4be0-428e-4ecd-8fb4-360a8fe22dda,hidden_table_name,,Format(parquet,Map()),{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"hidden_unique_id","type":"long","nullable":true,"metadata":{}},{"name":"op_date","type":"date","nullable":true,"metadata":{}},{"name":"previous_op_dates","type":{"type":"array","elementType":"date","containsNull":true},"nullable":true,"metadata":{}},{"name":"col101","type":"timestamp","nullable":true,"metadata":{}},{"name":"col102","type":"string","nullable":true,"metadata":{}},{"name":"col103","type":"boolean","nullable":true,"metadata":{}},{"name":"col104","type":"timestamp","nullable":true,"metadata":{}},{"name":"col105","type":"string","nullable":true,"metadata":{}},{"name":"col106","type":"string","nullable":true,"metadata":{}},{"name":"col107","type":"string","nullable":true,"metadata":{}},{"name":"col108","type":"boolean","nullable":true,"metadata":{}},{"name":"noun_col105","type":"string","nullable":true,"metadata":{}},{"name":"col110","type":"boolean","nullable":true,"metadata":{}},{"name":"col111","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"col105","type":"string","nullable":true,"metadata":{}},{"name":"col106","type":"string","nullable":true,"metadata":{}},{"name":"col113","type":"string","nullable":true,"metadata":{}}]},"containsNull":true},"nullable":true,"metadata":{}},{"name":"col112","type":"string","nullable":tru
e,"metadata":{}},{"name":"col113","type":"string","nullable":true,"metadata":{}},{"name":"col114","type":"long","nullable":true,"metadata":{}},{"name":"col115","type":"string","nullable":true,"metadata":{}},{"name":"col116","type":"string","nullable":true,"metadata":{}},{"name":"col117","type":"string","nullable":true,"metadata":{}},{"name":"col118","type":"string","nullable":true,"metadata":{}},{"name":"col119","type":"string","nullable":true,"metadata":{}},{"name":"col120","type":"string","nullable":true,"metadata":{}},{"name":"col121","type":"string","nullable":true,"metadata":{}},{"name":"col122","type":"string","nullable":true,"metadata":{}},{"name":"col123","type":"timestamp","nullable":true,"metadata":{}},{"name":"col124","type":"timestamp","nullable":true,"metadata":{}},{"name":"col125","type":"timestamp","nullable":true,"metadata":{}},{"name":"col126","type":"timestamp","nullable":true,"metadata":{}},{"name":"col127","type":"timestamp","nullable":true,"metadata":{}},{"name":"col128","type":"timestamp","nullable":true,"metadata":{}},{"name":"col129","type":"timestamp","nullable":true,"metadata":{}},{"name":"col130","type":"timestamp","nullable":true,"metadata":{}},{"name":"col131","type":"timestamp","nullable":true,"metadata":{}},{"name":"col132","type":"timestamp","nullable":true,"metadata":{}},{"name":"col133","type":"timestamp","nullable":true,"metadata":{}},{"name":"col134","type":"timestamp","nullable":true,"metadata":{}},{"name":"col135","type":"timestamp","nullable":true,"metadata":{}},{"name":"col136","type":"timestamp","nullable":true,"metadata":{}},{"name":"col137","type":"timestamp","nullable":true,"metadata":{}},{"name":"col138","type":"timestamp","nullable":true,"metadata":{}},{"name":"col139","type":"timestamp","nullable":true,"metadata":{}},{"name":"col140","type":"timestamp","nullable":true,"metadata":{}},{"name":"col141","type":"timestamp","nullable":true,"metadata":{}},{"name":"col142","type":"timestamp","nullable":true,"metadata":{}},{"nam
e":"col143","type":"timestamp","nullable":true,"metadata":{}},{"name":"col144","type":"timestamp","nullable":true,"metadata":{}},{"name":"col145","type":"timestamp","nullable":true,"metadata":{}},{"name":"col146","type":"timestamp","nullable":true,"metadata":{}},{"name":"col147","type":"timestamp","nullable":true,"metadata":{}},{"name":"col148","type":"timestamp","nullable":true,"metadata":{}},{"name":"col149","type":"timestamp","nullable":true,"metadata":{}},{"name":"col150","type":"timestamp","nullable":true,"metadata":{}},{"name":"col151","type":"string","nullable":true,"metadata":{}},{"name":"col152","type":"string","nullable":true,"metadata":{}},{"name":"col153","type":"string","nullable":true,"metadata":{}},{"name":"col154","type":"string","nullable":true,"metadata":{}},{"name":"col155","type":"string","nullable":true,"metadata":{}},{"name":"col156","type":"string","nullable":true,"metadata":{}},{"name":"col157","type":"string","nullable":true,"metadata":{}},{"name":"col158","type":"string","nullable":true,"metadata":{}},{"name":"col159","type":"string","nullable":true,"metadata":{}},{"name":"col160","type":"string","nullable":true,"metadata":{}},{"name":"col161","type":"string","nullable":true,"metadata":{}},{"name":"op_date_utc","type":"date","nullable":true,"metadata":{}},{"name":"col163","type":"timestamp","nullable":true,"metadata":{}},{"name":"col164","type":"timestamp","nullable":true,"metadata":{}},{"name":"col165","type":"timestamp","nullable":true,"metadata":{}},{"name":"col166","type":"timestamp","nullable":true,"metadata":{}},{"name":"col167","type":"timestamp","nullable":true,"metadata":{}},{"name":"col168","type":"timestamp","nullable":true,"metadata":{}},{"name":"col169","type":"timestamp","nullable":true,"metadata":{}},{"name":"col170","type":"timestamp","nullable":true,"metadata":{}},{"name":"col171","type":"timestamp","nullable":true,"metadata":{}},{"name":"col172","type":"timestamp","nullable":true,"metadata":{}},{"name":"col173","type":"tim
estamp","nullable":true,"metadata":{}},{"name":"col174","type":"timestamp","nullable":true,"metadata":{}},{"name":"col175","type":"timestamp","nullable":true,"metadata":{}},{"name":"col176","type":"timestamp","nullable":true,"metadata":{}},{"name":"col177","type":"timestamp","nullable":true,"metadata":{}},{"name":"col178","type":"timestamp","nullable":true,"metadata":{}},{"name":"col179","type":"timestamp","nullable":true,"metadata":{}},{"name":"col180","type":"timestamp","nullable":true,"metadata":{}},{"name":"col181","type":"timestamp","nullable":true,"metadata":{}},{"name":"col182","type":"timestamp","nullable":true,"metadata":{}},{"name":"col183","type":"boolean","nullable":true,"metadata":{}},{"name":"col184","type":"date","nullable":true,"metadata":{}},{"name":"scheduled_hidden_unique_ids","type":{"type":"array","elementType":"long","containsNull":true},"nullable":true,"metadata":{}},{"name":"col186","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col187","type":"long","nullable":true,"metadata":{}},{"name":"col188","type":"long","nullable":true,"metadata":{}},{"name":"col189","type":"long","nullable":true,"metadata":{}},{"name":"col190","type":"long","nullable":true,"metadata":{}},{"name":"col191","type":"long","nullable":true,"metadata":{}},{"name":"col192","type":"long","nullable":true,"metadata":{}},{"name":"col193","type":"timestamp","nullable":true,"metadata":{}},{"name":"col194","type":"timestamp","nullable":true,"metadata":{}},{"name":"col195","type":"timestamp","nullable":true,"metadata":{}},{"name":"col196","type":"timestamp","nullable":true,"metadata":{}},{"name":"col197","type":"timestamp","nullable":true,"metadata":{}},{"name":"col198","type":"timestamp","nullable":true,"metadata":{}},{"name":"col199","type":"timestamp","nullable":true,"metadata":{}},{"name":"col200","type":"timestamp","nullable":true,"metadata":{}},{"name":"col201","type":"timestamp","nullable":true,"metadata":{}},{"name"
:"col202","type":"timestamp","nullable":true,"metadata":{}},{"name":"col203","type":"timestamp","nullable":true,"metadata":{}},{"name":"col204","type":"timestamp","nullable":true,"metadata":{}},{"name":"col205","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col206","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col184_local","type":"date","nullable":true,"metadata":{}},{"name":"col206_internal","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col209","type":"string","nullable":true,"metadata":{}},{"name":"col210","type":{"type":"array","elementType":"integer","containsNull":true},"nullable":true,"metadata":{}},{"name":"col211","type":{"type":"array","elementType":"timestamp","containsNull":true},"nullable":true,"metadata":{}},{"name":"col212","type":"string","nullable":true,"metadata":{}},{"name":"col213","type":"integer","nullable":true,"metadata":{}},{"name":"event_millis","type":"long","nullable":true,"metadata":{}},{"name":"_hoodie_is_deleted","type":"boolean","nullable":true,"metadata":{}},{"name":"xtable_partition_col_DAY_op_date","type":"date","nullable":true,"metadata":{"delta.generationExpression":"CAST(op_date as DATE)"}}]},List(xtable_partition_col_DAY_op_date),Map(XTABLE_METADATA -> {"lastInstantSynced":"2024-07-02T17:18:13.028Z","instantsToConsiderForNextSync":[],"version":0}, delta.logRetentionDuration -> interval 168 hours),Some(1719940693028)), logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,0,WrappedArray(S3AFileStatus{path=s3://hidden-s3-bucket/hidden-prefix/_delta_log/00000000000000000000.json; isDirectory=false; length=75250570; replication=1; blocksize=33554432; modification_time=1719942187000; access_time=0; owner=lfm; group=lfm; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE 
eTag=414a8c48671a5c1a268e391487db2926-2 versionId=null),None,1719942187000), checksumOpt=None)
2024-07-02 17:43:11 INFO  org.apache.spark.sql.delta.DeltaLog:60 - Updated snapshot to Snapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log, version=0, metadata=Metadata(e65c4be0-428e-4ecd-8fb4-360a8fe22dda,hidden_table_name,,Format(parquet,Map()),{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"hidden_unique_id","type":"long","nullable":true,"metadata":{}},{"name":"op_date","type":"date","nullable":true,"metadata":{}},{"name":"previous_op_dates","type":{"type":"array","elementType":"date","containsNull":true},"nullable":true,"metadata":{}},{"name":"col101","type":"timestamp","nullable":true,"metadata":{}},{"name":"col102","type":"string","nullable":true,"metadata":{}},{"name":"col103","type":"boolean","nullable":true,"metadata":{}},{"name":"col104","type":"timestamp","nullable":true,"metadata":{}},{"name":"col105","type":"string","nullable":true,"metadata":{}},{"name":"col106","type":"string","nullable":true,"metadata":{}},{"name":"col107","type":"string","nullable":true,"metadata":{}},{"name":"col108","type":"boolean","nullable":true,"metadata":{}},{"name":"noun_col105","type":"string","nullable":true,"metadata":{}},{"name":"col110","type":"boolean","nullable":true,"metadata":{}},{"name":"col111","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"col105","type":"string","nullable":true,"metadata":{}},{"name":"col106","type":"string","nullable":true,"metadata":{}},{"name":"col113","type":"string","nullable":true,"metadata":{}}]},"containsNull":true},"nullable":true,"metadata":{}},{"name":"col112","type":"string","nullable":true,"metadata":{}},{"name":"col113","type":"st
ring","nullable":true,"metadata":{}},{"name":"col114","type":"long","nullable":true,"metadata":{}},{"name":"col115","type":"string","nullable":true,"metadata":{}},{"name":"col116","type":"string","nullable":true,"metadata":{}},{"name":"col117","type":"string","nullable":true,"metadata":{}},{"name":"col118","type":"string","nullable":true,"metadata":{}},{"name":"col119","type":"string","nullable":true,"metadata":{}},{"name":"col120","type":"string","nullable":true,"metadata":{}},{"name":"col121","type":"string","nullable":true,"metadata":{}},{"name":"col122","type":"string","nullable":true,"metadata":{}},{"name":"col123","type":"timestamp","nullable":true,"metadata":{}},{"name":"col124","type":"timestamp","nullable":true,"metadata":{}},{"name":"col125","type":"timestamp","nullable":true,"metadata":{}},{"name":"col126","type":"timestamp","nullable":true,"metadata":{}},{"name":"col127","type":"timestamp","nullable":true,"metadata":{}},{"name":"col128","type":"timestamp","nullable":true,"metadata":{}},{"name":"col129","type":"timestamp","nullable":true,"metadata":{}},{"name":"col130","type":"timestamp","nullable":true,"metadata":{}},{"name":"col131","type":"timestamp","nullable":true,"metadata":{}},{"name":"col132","type":"timestamp","nullable":true,"metadata":{}},{"name":"col133","type":"timestamp","nullable":true,"metadata":{}},{"name":"col134","type":"timestamp","nullable":true,"metadata":{}},{"name":"col135","type":"timestamp","nullable":true,"metadata":{}},{"name":"col136","type":"timestamp","nullable":true,"metadata":{}},{"name":"col137","type":"timestamp","nullable":true,"metadata":{}},{"name":"col138","type":"timestamp","nullable":true,"metadata":{}},{"name":"col139","type":"timestamp","nullable":true,"metadata":{}},{"name":"col140","type":"timestamp","nullable":true,"metadata":{}},{"name":"col141","type":"timestamp","nullable":true,"metadata":{}},{"name":"col142","type":"timestamp","nullable":true,"metadata":{}},{"name":"col143","type":"timestamp","nullable":tr
ue,"metadata":{}},{"name":"col144","type":"timestamp","nullable":true,"metadata":{}},{"name":"col145","type":"timestamp","nullable":true,"metadata":{}},{"name":"col146","type":"timestamp","nullable":true,"metadata":{}},{"name":"col147","type":"timestamp","nullable":true,"metadata":{}},{"name":"col148","type":"timestamp","nullable":true,"metadata":{}},{"name":"col149","type":"timestamp","nullable":true,"metadata":{}},{"name":"col150","type":"timestamp","nullable":true,"metadata":{}},{"name":"col151","type":"string","nullable":true,"metadata":{}},{"name":"col152","type":"string","nullable":true,"metadata":{}},{"name":"col153","type":"string","nullable":true,"metadata":{}},{"name":"col154","type":"string","nullable":true,"metadata":{}},{"name":"col155","type":"string","nullable":true,"metadata":{}},{"name":"col156","type":"string","nullable":true,"metadata":{}},{"name":"col157","type":"string","nullable":true,"metadata":{}},{"name":"col158","type":"string","nullable":true,"metadata":{}},{"name":"col159","type":"string","nullable":true,"metadata":{}},{"name":"col160","type":"string","nullable":true,"metadata":{}},{"name":"col161","type":"string","nullable":true,"metadata":{}},{"name":"op_date_utc","type":"date","nullable":true,"metadata":{}},{"name":"col163","type":"timestamp","nullable":true,"metadata":{}},{"name":"col164","type":"timestamp","nullable":true,"metadata":{}},{"name":"col165","type":"timestamp","nullable":true,"metadata":{}},{"name":"col166","type":"timestamp","nullable":true,"metadata":{}},{"name":"col167","type":"timestamp","nullable":true,"metadata":{}},{"name":"col168","type":"timestamp","nullable":true,"metadata":{}},{"name":"col169","type":"timestamp","nullable":true,"metadata":{}},{"name":"col170","type":"timestamp","nullable":true,"metadata":{}},{"name":"col171","type":"timestamp","nullable":true,"metadata":{}},{"name":"col172","type":"timestamp","nullable":true,"metadata":{}},{"name":"col173","type":"timestamp","nullable":true,"metadata":{}},{"nam
e":"col174","type":"timestamp","nullable":true,"metadata":{}},{"name":"col175","type":"timestamp","nullable":true,"metadata":{}},{"name":"col176","type":"timestamp","nullable":true,"metadata":{}},{"name":"col177","type":"timestamp","nullable":true,"metadata":{}},{"name":"col178","type":"timestamp","nullable":true,"metadata":{}},{"name":"col179","type":"timestamp","nullable":true,"metadata":{}},{"name":"col180","type":"timestamp","nullable":true,"metadata":{}},{"name":"col181","type":"timestamp","nullable":true,"metadata":{}},{"name":"col182","type":"timestamp","nullable":true,"metadata":{}},{"name":"col183","type":"boolean","nullable":true,"metadata":{}},{"name":"col184","type":"date","nullable":true,"metadata":{}},{"name":"scheduled_hidden_unique_ids","type":{"type":"array","elementType":"long","containsNull":true},"nullable":true,"metadata":{}},{"name":"col186","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col187","type":"long","nullable":true,"metadata":{}},{"name":"col188","type":"long","nullable":true,"metadata":{}},{"name":"col189","type":"long","nullable":true,"metadata":{}},{"name":"col190","type":"long","nullable":true,"metadata":{}},{"name":"col191","type":"long","nullable":true,"metadata":{}},{"name":"col192","type":"long","nullable":true,"metadata":{}},{"name":"col193","type":"timestamp","nullable":true,"metadata":{}},{"name":"col194","type":"timestamp","nullable":true,"metadata":{}},{"name":"col195","type":"timestamp","nullable":true,"metadata":{}},{"name":"col196","type":"timestamp","nullable":true,"metadata":{}},{"name":"col197","type":"timestamp","nullable":true,"metadata":{}},{"name":"col198","type":"timestamp","nullable":true,"metadata":{}},{"name":"col199","type":"timestamp","nullable":true,"metadata":{}},{"name":"col200","type":"timestamp","nullable":true,"metadata":{}},{"name":"col201","type":"timestamp","nullable":true,"metadata":{}},{"name":"col202","type":"timestamp","nullable":true
,"metadata":{}},{"name":"col203","type":"timestamp","nullable":true,"metadata":{}},{"name":"col204","type":"timestamp","nullable":true,"metadata":{}},{"name":"col205","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col206","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col184_local","type":"date","nullable":true,"metadata":{}},{"name":"col206_internal","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}},{"name":"col209","type":"string","nullable":true,"metadata":{}},{"name":"col210","type":{"type":"array","elementType":"integer","containsNull":true},"nullable":true,"metadata":{}},{"name":"col211","type":{"type":"array","elementType":"timestamp","containsNull":true},"nullable":true,"metadata":{}},{"name":"col212","type":"string","nullable":true,"metadata":{}},{"name":"col213","type":"integer","nullable":true,"metadata":{}},{"name":"event_millis","type":"long","nullable":true,"metadata":{}},{"name":"_hoodie_is_deleted","type":"boolean","nullable":true,"metadata":{}},{"name":"xtable_partition_col_DAY_op_date","type":"date","nullable":true,"metadata":{"delta.generationExpression":"CAST(op_date as DATE)"}}]},List(xtable_partition_col_DAY_op_date),Map(XTABLE_METADATA -> {"lastInstantSynced":"2024-07-02T17:18:13.028Z","instantsToConsiderForNextSync":[],"version":0}, delta.logRetentionDuration -> interval 168 hours),Some(1719940693028)), logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,0,WrappedArray(S3AFileStatus{path=s3://hidden-s3-bucket/hidden-prefix/_delta_log/00000000000000000000.json; isDirectory=false; length=75250570; replication=1; blocksize=33554432; modification_time=1719942187000; access_time=0; owner=lfm; group=lfm; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=414a8c48671a5c1a268e391487db2926-2 
versionId=null),None,1719942187000), checksumOpt=None)
2024-07-02 17:43:11 INFO  org.apache.spark.sql.delta.Snapshot:60 - [tableId=e65c4be0-428e-4ecd-8fb4-360a8fe22dda] DELTA: Compute snapshot for version: 0
2024-07-02 17:43:12 WARN  org.apache.spark.sql.catalyst.util.package:72 - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
2024-07-02 17:43:23 INFO  org.apache.spark.sql.delta.Snapshot:60 - [tableId=e65c4be0-428e-4ecd-8fb4-360a8fe22dda] DELTA: Done
2024-07-02 17:43:23 INFO  org.apache.spark.sql.delta.OptimisticTransaction:60 - [tableId=e65c4be0,txnId=b9564248] Committed delta #0 to s3://hidden-s3-bucket/hidden-prefix/_delta_log
2024-07-02 17:43:23 INFO  org.apache.xtable.conversion.ConversionController:133 - Sync is successful for the following formats DELTA

Note this is the writer's configuration:

full_cow_hudi_options = {
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.partitionpath.field": "op_date",
    "hoodie.datasource.write.precombine.field": "event_millis",
    "hoodie.datasource.write.recordkey.field": "hidden_unique_id",
    "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
    "hoodie.payload.ordering.field": "event_millis",
    "hoodie.insert.shuffle.parallelism": 2,
    "hoodie.upsert.shuffle.parallelism": 2,
    "hoodie.metadata.enable": "false",
}

df.write.format("hudi").options(**full_cow_hudi_options).mode("append").save("s3://hidden-s3-bucket/hidden-prefix")

My XTable revision has changed since the initial run: I am now running xtable df515157. I'm happy to meet and/or help debug this, but I'm not sure how I'd start.

vinishjail97 commented 2 months ago

@lucasmo Can you share the dump for this clean commit? It should be something like 20240702171813028.clean in the .hoodie folder.

2024-07-02 17:47:28 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240702171813028__clean__COMPLETED__20240702171815000]}
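
As a rough illustration of how the instant in that log line maps to the file being asked for, here is a minimal Java sketch. The class and method names (InstantNames, completedFileName, parseInstantTime) are hypothetical helpers and not part of Hudi or XTable. It assumes a completed instant is stored as `<timestamp>.<action>` and that the timestamp is laid out as yyyyMMddHHmmss plus three millisecond digits, which is consistent with the lastInstantSynced value of 2024-07-02T17:18:13.028Z in the Delta metadata above. Time zone handling is deliberately left out.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

class InstantNames {

    // Assumed timestamp layout: yyyyMMddHHmmss followed by 3 millisecond digits.
    // The millis are added with appendValue so that all fields are parsed as
    // adjacent fixed-width values.
    private static final DateTimeFormatter INSTANT_TIME =
        new DateTimeFormatterBuilder()
            .appendPattern("yyyyMMddHHmmss")
            .appendValue(ChronoField.MILLI_OF_SECOND, 3)
            .toFormatter();

    // Turns "<ts>__<action>__COMPLETED__<...>" into "<ts>.<action>".
    // Only completed instants are handled; anything else is rejected.
    static String completedFileName(String logEntry) {
        String[] parts = logEntry.split("__");
        if (parts.length < 3 || !"COMPLETED".equals(parts[2])) {
            throw new IllegalArgumentException("Not a completed instant: " + logEntry);
        }
        return parts[0] + "." + parts[1];
    }

    // Parses the timestamp part of an instant into a date-time without a zone.
    static LocalDateTime parseInstantTime(String instantTime) {
        return LocalDateTime.parse(instantTime, INSTANT_TIME);
    }

    public static void main(String[] args) {
        String entry = "20240702171813028__clean__COMPLETED__20240702171815000";
        System.out.println(completedFileName(entry));
        System.out.println(parseInstantTime(entry.split("__")[0]));
    }
}
```

The trailing field of the log entry is not interpreted here, since the thread does not say what it represents.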

lucasmo commented 2 months ago

That clean commit no longer exists, but I ran again (with the same error).

2024-07-08 21:17:12 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240708201227546__clean__COMPLETED__20240708201230000]}

I found 20240708201227546.clean.

Output of avro-tools tojson 20240708201227546.clean | jq:

{
  "startCleanTime": "20240708201227546",
  "timeTakenInMillis": 517,
  "totalFilesDeleted": 7,
  "earliestCommitToRetain": "20240708111705091",
  "lastCompletedCommitTimestamp": "20240708201123175",
  "partitionMetadata": {
    "op_date=2024-07-03": {
      "partitionPath": "op_date=2024-07-03",
      "policy": "KEEP_LATEST_COMMITS",
      "deletePathPatterns": [
        "a8214808-28b4-4c0f-8e77-193013379fd6-0_0-22-137_20240707181719248.parquet"
      ],
      "successDeleteFiles": [
        "a8214808-28b4-4c0f-8e77-193013379fd6-0_0-22-137_20240707181719248.parquet"
      ],
      "failedDeleteFiles": [],
      "isPartitionDeleted": {
        "boolean": false
      }
    },
    "op_date=2024-07-04": {
      "partitionPath": "op_date=2024-07-04",
      "policy": "KEEP_LATEST_COMMITS",
      "deletePathPatterns": [
        "bb63b6ef-1a83-4a74-a736-c8418d811254-0_0-22-132_20240708081624436.parquet"
      ],
      "successDeleteFiles": [
        "bb63b6ef-1a83-4a74-a736-c8418d811254-0_0-22-132_20240708081624436.parquet"
      ],
      "failedDeleteFiles": [],
      "isPartitionDeleted": {
        "boolean": false
      }
    },
    "op_date=2024-07-05": {
      "partitionPath": "op_date=2024-07-05",
      "policy": "KEEP_LATEST_COMMITS",
      "deletePathPatterns": [
        "525f1b45-7b9a-4c9e-8d83-a7cc157bb8fd-0_0-22-130_20240708091702825.parquet"
      ],
      "successDeleteFiles": [
        "525f1b45-7b9a-4c9e-8d83-a7cc157bb8fd-0_0-22-130_20240708091702825.parquet"
      ],
      "failedDeleteFiles": [],
      "isPartitionDeleted": {
        "boolean": false
      }
    },
    "op_date=2024-07-06": {
      "partitionPath": "op_date=2024-07-06",
      "policy": "KEEP_LATEST_COMMITS",
      "deletePathPatterns": [
        "19976be6-94e0-4e15-a7a6-629b04c8f415-0_2-22-132_20240708091702825.parquet"
      ],
      "successDeleteFiles": [
        "19976be6-94e0-4e15-a7a6-629b04c8f415-0_2-22-132_20240708091702825.parquet"
      ],
      "failedDeleteFiles": [],
      "isPartitionDeleted": {
        "boolean": false
      }
    },
    "op_date=2024-07-07": {
      "partitionPath": "op_date=2024-07-07",
      "policy": "KEEP_LATEST_COMMITS",
      "deletePathPatterns": [
        "620b8bd3-aaea-44e0-8ad7-0cfe055b58ee-0_3-22-133_20240708091702825.parquet"
      ],
      "successDeleteFiles": [
        "620b8bd3-aaea-44e0-8ad7-0cfe055b58ee-0_3-22-133_20240708091702825.parquet"
      ],
      "failedDeleteFiles": [],
      "isPartitionDeleted": {
        "boolean": false
      }
    },
    "op_date=2024-07-08": {
      "partitionPath": "op_date=2024-07-08",
      "policy": "KEEP_LATEST_COMMITS",
      "deletePathPatterns": [
        "22f5d34a-48e5-44aa-bdd7-6172fc53ed82-0_4-22-134_20240708091702825.parquet"
      ],
      "successDeleteFiles": [
        "22f5d34a-48e5-44aa-bdd7-6172fc53ed82-0_4-22-134_20240708091702825.parquet"
      ],
      "failedDeleteFiles": [],
      "isPartitionDeleted": {
        "boolean": false
      }
    },
    "op_date=2024-06-26": {
      "partitionPath": "op_date=2024-06-26",
      "policy": "KEEP_LATEST_COMMITS",
      "deletePathPatterns": [
        "d777fbc1-63ef-4fde-8940-1fca04abef1e-0_5-22-147_20240707201632859.parquet"
      ],
      "successDeleteFiles": [
        "d777fbc1-63ef-4fde-8940-1fca04abef1e-0_5-22-147_20240707201632859.parquet"
      ],
      "failedDeleteFiles": [],
      "isPartitionDeleted": {
        "boolean": false
      }
    }
  },
  "version": {
    "int": 2
  },
  "bootstrapPartitionMetadata": {
    "map": {}
  }
}
lucasmo commented 2 months ago

@vinishjail97 This is really puzzling. I was able to boil it down to a very simple test case that blows up with the recursive error, which has nothing at all to do with XTable.

Using the following java dependencies, versions taken from the XTable pom

The following code triggers the recursive exception:

import org.apache.avro.Schema;
import org.apache.avro.specific.SpecificData;

Schema schema = new Schema.Parser().parse("{\"type\":\"record\",\"name\":\"HoodieCleanPartitionMetadata\",\"namespace\":\"org.apache.hudi.avro.model\",\"fields\":[{\"name\":\"partitionPath\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"policy\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"deletePathPatterns\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"successDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"failedDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"isPartitionDeleted\",\"type\":[\"null\",\"boolean\"],\"default\":null}]}");
SpecificData.getForSchema(schema);

I got the schema by running avro-tools getschema 20240708201227546.clean and then I deleted the outer HoodieCleanMetadata record. It blows up if I use the full schema, too, but this minimal example reduces it further.

Bumping the hudi-common dependency to 0.14.1 seems to have addressed the issue in the test case.

vinishjail97 commented 2 months ago

I tried reproducing the issue by running the same test as part of the XTable project itself but didn't see any error. My test and schema can be found here: https://github.com/apache/incubator-xtable/pull/484

Another thing you can try locally is downgrading the Avro version to 1.8.2 and then parsing the schema or running the sync again. This is the version used in Hudi: https://github.com/apache/hudi/blob/master/pom.xml#L2301

lucasmo commented 2 months ago

@vinishjail97 I copied and pasted your test into an xtable-utilities test and I see the failure condition there: https://github.com/apache/incubator-xtable/pull/485

The same test passes in xtable-core.

lucasmo commented 2 months ago

Okay, I was able to figure out why this isn't causing an error in xtable-core but it is causing one in xtable-utilities. Both have org.apache.hudi:hudi-common:0.14.0 on the classpath, whereas only xtable-core has org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0 on the classpath.

Running this sample code:

Schema schema = new Schema.Parser().parse("{\"type\":\"record\",\"name\":\"HoodieCleanPartitionMetadata\",\"namespace\":\"org.apache.hudi.avro.model\",\"fields\":[{\"name\":\"partitionPath\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"policy\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"deletePathPatterns\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"successDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"failedDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"isPartitionDeleted\",\"type\":[\"null\",\"boolean\"],\"default\":null}]}");
System.out.println("Class for schema: " + SpecificData.get().getClass(schema));

The issue is that both hudi-spark3.4-bundle_2.12:0.14.0 and hudi-common:0.14.0 have an autogenerated avro class for HoodieCleanPartitionMetadata, but they are DIFFERENT, and luckily for xtable-core the "better" library is first on the classpath.
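
One way to check for this kind of duplicate is to ask the class loader for every location that provides the class file. The following is a minimal sketch using only the JDK; DuplicateClassFinder and its methods are hypothetical names for illustration, and the default class name is taken from the schema above. It assumes the class is looked up through the system class loader, which may not hold in every runtime (for example inside Spark or a shaded bundle).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class DuplicateClassFinder {

    // Converts a fully qualified class name to the resource path of its class file.
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every location visible to the system class loader that provides the class file.
    static List<URL> findAll(String className) {
        try {
            ClassLoader loader = ClassLoader.getSystemClassLoader();
            return Collections.list(loader.getResources(toResourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "org.apache.hudi.avro.model.HoodieCleanPartitionMetadata";
        List<URL> hits = findAll(name);
        if (hits.isEmpty()) {
            System.out.println("Not found on the classpath: " + name);
        }
        // More than one line of output means several jars ship the same class.
        // The first entry is usually, but not necessarily, the one that gets loaded.
        for (URL hit : hits) {
            System.out.println(hit);
        }
    }
}
```

Running it with the same classpath as the failing module and again with the classpath of the passing module should show whether the two modules resolve the class from different jars.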

lucasmo commented 2 months ago

@vinishjail97 can you comment on the linked issue apache/hudi#11602? They are saying you shouldn't use this jar file, but this is out of my scope to negotiate.