apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] ClassCastException when upsert COW table with RECORD_INDEX index type #10484

Closed: lei-su-awx closed this issue 5 months ago.

lei-su-awx commented 6 months ago

I am using Spark 3.4.1 and Hudi 0.14.0 on GKE. The job reads a Hudi COW table (on GCS) as a stream and writes to another Hudi COW table (on GCS) with upsert and the RECORD_INDEX index type. Here are my write options:

    write_streaming_hudi_options = {
        'hoodie.table.name': table_name,
        'hoodie.datasource.write.recordkey.field': f'{primary_keys}, region',
        'hoodie.datasource.write.precombine.field': precombine_field,
        'hoodie.datasource.write.table.name': table_name,
        'hoodie.datasource.write.partitionpath.field': 'date',
        'hoodie.datasource.write.operation': 'upsert',
        'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
        'hoodie.datasource.write.reconcile.schema': 'true',
        'hoodie.schema.on.read.enable': 'true',
        'hoodie.parquet.compression.codec': 'snappy',
        'hoodie.datasource.write.hive_style_partitioning': 'true',
        'hoodie.datasource.write.drop.partition.columns': 'true',
        'hoodie.insert.shuffle.parallelism': '1000',
        'hoodie.upsert.shuffle.parallelism': '1000',
        'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
        'hoodie.metadata.record.index.enable': 'true',
        'hoodie.index.type': 'RECORD_INDEX',
        'hoodie.metadata.record.index.min.filegroup.count': '100',
        'hoodie.metadata.optimized.log.blocks.scan.enable': 'true'
    }
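For readers reproducing this setup, the sketch below assembles the record-key, operation and index settings from the issue in one place. It is a minimal, hypothetical helper (`build_hudi_upsert_options` and its parameters are not part of Hudi), and it assumes the primary keys are passed as a list of column names that get joined with `region` into the comma-separated composite key, mirroring `f'{primary_keys}, region'`. The remaining options (compression, parallelism, schema handling, payload class) are merged in through `extra`.

```python
def build_hudi_upsert_options(table_name, primary_keys, precombine_field,
                              partition_field="date", extra=None):
    """Hypothetical helper assembling Hudi upsert options with RECORD_INDEX.

    primary_keys is assumed to be a list of column names; 'region' is
    appended to form the composite record key, as in the issue.
    """
    # Hudi expects a comma-separated list of record key fields.
    record_key = ",".join(list(primary_keys) + ["region"])
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        # The record-level index is stored in the metadata table, so it is
        # enabled there and also selected as the write-path index type.
        "hoodie.metadata.record.index.enable": "true",
        "hoodie.index.type": "RECORD_INDEX",
        "hoodie.metadata.record.index.min.filegroup.count": "100",
    }
    # Merge any remaining settings supplied by the caller.
    options.update(extra or {})
    return options
```

In a PySpark streaming job the resulting dict would typically be passed as `df.writeStream.format("hudi").options(**opts)`, alongside a checkpoint location and a target path.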

The Hudi Spark bundle jar is hudi-spark3-bundle_2.12-0.14.0.jar.

Steps that reproduce the exception:

  1. Start the streaming pipeline with the options above.
  2. The first micro-batch is written to GCS successfully.
  3. The second micro-batch throws an error.

The stack trace was attached as a screenshot, which is not reproduced here.

So I have two questions:

  1. Why did this error happen?
  2. Why does the first micro-batch always succeed?


danny0405 commented 5 months ago

It looks like an Avro version conflict: the Avro classes in Hudi are shaded, so they carry an org.apache.hudi package prefix.
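One way to look for this kind of conflict is to check whether the same library appears on the classpath in more than one version. The sketch below is a hypothetical diagnostic (`find_version_conflicts` is not a Hudi or Spark tool): it assumes jar files follow the Maven `artifact-version.jar` naming convention and groups them by artifact name. It cannot see classes relocated inside a bundle jar, so a shaded copy of Avro inside the Hudi bundle would not show up here.

```python
import os
import re

# Assumed naming convention: <artifact>-<version>.jar, version starting with a digit.
_JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(jar_paths):
    """Return {artifact: sorted versions} for artifacts seen in 2+ versions."""
    versions = {}
    for path in jar_paths:
        match = _JAR_RE.match(os.path.basename(path))
        if match is None:
            continue  # not a jar, or not in the expected naming format
        versions.setdefault(match.group("name"), set()).add(match.group("version"))
    return {name: sorted(vs) for name, vs in versions.items() if len(vs) > 1}
```

Fed the jar list from, for example, a Spark `jars/` directory plus any `--jars` additions, a result such as `{"avro": [...]}` with two entries would point at two Avro versions competing on the classpath (the paths and versions in any such list are deployment-specific).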

ad1happy2go commented 5 months ago

@lei-su-awx Were you able to resolve this issue, or are you still facing it? Please let us know if you need help.

lei-su-awx commented 5 months ago

@ad1happy2go Issue solved: it was a Java version conflict. The job should run on Java 8.
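Since the fix here was to run on Java 8, a job can fail fast at startup when the JVM is a different major version. The sketch below is a hypothetical check (the function names are illustrative, not part of Spark or Hudi). It assumes the version string comes from `java -version`, which prints a line such as `openjdk version "1.8.0_392"` to stderr, and it handles both the legacy `1.x` scheme and the newer `11.0.21` / `17` scheme.

```python
import re


def java_major_version(version):
    """Return the major version from a Java version string.

    '1.8.0_392' -> 8 (legacy 1.x scheme); '11.0.21' -> 11; '21-ea' -> 21.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Java version: {version!r}")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy scheme: the major is the second field
    return first


def major_version_from_output(output):
    """Extract the quoted version from `java -version` output and parse it."""
    match = re.search(r'version "([^"]+)"', output)
    if match is None:
        raise ValueError("no quoted version found in java -version output")
    return java_major_version(match.group(1))
```

For example, `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr` can be passed to `major_version_from_output`, and the job can abort if the result is not 8. This checks only the `java` found on the `PATH`, which may differ from the JVM the Spark executors actually use on GKE.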