apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Docker Demo fails at Iceberg sync to Delta #392

Closed: sagarlakshmipathy closed this issue 1 week ago

sagarlakshmipathy commented 6 months ago

code:

val icebergSourceClientProvider = new IcebergSourceClientProvider()
icebergSourceClientProvider.init(spark.sparkContext.hadoopConfiguration, Collections.emptyMap())
val icebergSourcePerTableConfig = PerTableConfigImpl.builder()
    .tableName(hudiTableName)
    .namespace(namespaceArray)
    .targetTableFormats(Arrays.asList(TableFormat.DELTA))
    .tableBasePath(hudiBasePath)
    .icebergCatalogConfig(icebergCatalogConfig)
    .syncMode(SyncMode.INCREMENTAL)
    .build()
oneTableClient.sync(icebergSourcePerTableConfig, icebergSourceClientProvider)

error:

java.lang.NoSuchMethodError: org.apache.spark.sql.delta.actions.AddFile.<init>(Ljava/lang/String;Lscala/collection/immutable/Map;JJZLjava/lang/String;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/delta/actions/DeletionVectorDescriptor;)V
  org.apache.xtable.delta.DeltaDataFileUpdatesExtractor.createAddFileAction(DeltaDataFileUpdatesExtractor.java:118)
  org.apache.xtable.delta.DeltaDataFileUpdatesExtractor.lambda$applyDiff$3(DeltaDataFileUpdatesExtractor.java:99)
  java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
  java.util.Iterator.forEachRemaining(Iterator.java:116)
  java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
  java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
  java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
  java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
  java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
  java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
  java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
  java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
  java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
  java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
  org.apache.xtable.delta.DeltaDataFileUpdatesExtractor.applyDiff(DeltaDataFileUpdatesExtractor.java:103)
  org.apache.xtable.delta.DeltaDataFileUpdatesExtractor.applySnapshot(DeltaDataFileUpdatesExtractor.java:78)
  org.apache.xtable.delta.DeltaClient.syncFilesForSnapshot(DeltaClient.java:184)
  org.apache.xtable.spi.sync.TableFormatSync.lambda$syncSnapshot$0(TableFormatSync.java:74)
  org.apache.xtable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:160)
  org.apache.xtable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70)
  org.apache.xtable.client.OneTableClient.syncSnapshot(OneTableClient.java:179)
  org.apache.xtable.client.OneTableClient.sync(OneTableClient.java:116)
  ammonite.$sess.cell7$Helper.<init>(cell7.sc:11)
  ammonite.$sess.cell7$.<init>(cell7.sc:7)
  ammonite.$sess.cell7$.<clinit>(cell7.sc:-1)

kywe665 commented 6 months ago

+1, I also hit this.

ashvina commented 6 months ago

The issue might stem from the Delta version upgrade, which required an update to the Spark version as well (see commit). It seems that the demo code wasn't revised to reflect these changes at that time.
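A `NoSuchMethodError` on a constructor usually means the class found at runtime comes from a different library version than the one the caller was compiled against. As a rough diagnostic sketch (not part of XTable or Delta), the snippet below lists the constructors the running notebook actually sees for `AddFile` and reports which jar the class was loaded from. The helper names `constructorSignatures` and `loadedFrom` are hypothetical, and the sketch assumes the notebook's classloader is the same one that the sync code uses, which may not hold in every kernel.

```scala
import scala.util.Try

// Hypothetical helper (not part of XTable or Delta): lists the constructor
// signatures of a class as seen by the current classloader.
def constructorSignatures(className: String): Either[String, Seq[String]] =
  Try(Class.forName(className)).toEither
    .left.map(e => s"$className not loadable: $e")
    .map(_.getDeclaredConstructors.toSeq.map(_.toString))

// Hypothetical helper: reports the jar or directory a class was loaded from,
// or None when the class is missing or has no code source (e.g. JDK classes).
def loadedFrom(className: String): Option[String] =
  for {
    cls      <- Try(Class.forName(className)).toOption
    domain   <- Option(cls.getProtectionDomain)
    source   <- Option(domain.getCodeSource)
    location <- Option(source.getLocation)
  } yield location.toString

// Class name taken from the NoSuchMethodError in the report above.
val addFile = "org.apache.spark.sql.delta.actions.AddFile"

constructorSignatures(addFile) match {
  case Right(sigs) => sigs.foreach(println) // compare with the signature in the error
  case Left(msg)   => println(msg)          // Delta is not on the classpath at all
}
println(loadedFrom(addFile).getOrElse("no code source found"))
```

If none of the printed constructors matches the parameter list in the error message, the Delta jar loaded by the notebook likely differs from the version XTable was built against, which would be consistent with the explanation above.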

sagarlakshmipathy commented 6 months ago

That's what I thought too. Let me fix it this weekend.

sagarlakshmipathy commented 6 months ago

Can you assign it to me, @ashvina?

ashvina commented 6 months ago

Thanks, @sagarlakshmipathy. References to AddFile were fixed in the commit I mentioned above. Those changes may provide some hints about how to fix the demo.

zhen-d commented 3 weeks ago

@ashvina, can I fix this issue? I updated the notebook jar version locally and it works. The change will not include any cell outputs, only the version update.

the-other-tim-brown commented 2 weeks ago

@zhen-d, can you raise a PR?