Closed: lei-su-awx closed this issue 5 months ago.
It looks like an Avro version conflict; Hudi's bundled Avro classes are shaded under an org.apache.hudi prefix.
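One way to sanity-check the shading (a hedged sketch; the jar path and the exact relocation prefix are assumptions based on the comment above) is to list the relocated Avro classes inside the bundle jar:

```python
# Sketch: inspect the Hudi Spark bundle for Avro classes relocated under the
# org.apache.hudi shading prefix. The jar path below is an assumption.
import zipfile

bundle_jar = "hudi-spark3-bundle_2.12-0.14.0.jar"  # adjust to the actual path

with zipfile.ZipFile(bundle_jar) as jar:
    shaded = [name for name in jar.namelist()
              if name.startswith("org/apache/hudi/org/apache/avro/")]

print(f"{len(shaded)} shaded Avro class files found")
print("\n".join(shaded[:5]))
```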
@lei-su-awx Were you able to resolve this issue, or are you still facing it? Please let us know if you need help with this.
@ad1happy2go Issue solved. It was a Java version conflict; Java 8 should be used instead of Java 11.
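For anyone hitting the same error, a quick way to confirm which JVM the Spark driver is actually running on (a minimal sketch that goes through PySpark's private `_jvm` gateway, so not a stable public API):

```python
# Print the Java version of the JVM backing the Spark driver.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
jvm_version = spark.sparkContext._jvm.java.lang.System.getProperty("java.version")
print(f"Driver JVM version: {jvm_version}")
```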
I used Spark 3.4.1 and Hudi 0.14.0 on GKE, streaming-reading a Hudi COW table (on GCS) and writing to another Hudi COW table (on GCS) with upsert (RECORD_INDEX). Here are my write options:

```python
write_streaming_hudi_options = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': f'{primary_keys}, region',
    'hoodie.datasource.write.precombine.field': precombine_field,
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.partitionpath.field': 'date',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.reconcile.schema': 'true',
    'hoodie.schema.on.read.enable': 'true',
    'hoodie.parquet.compression.codec': 'snappy',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.write.drop.partition.columns': 'true',
    'hoodie.insert.shuffle.parallelism': '1000',
    'hoodie.upsert.shuffle.parallelism': '1000',
    'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
    'hoodie.metadata.record.index.enable': 'true',
    'hoodie.index.type': 'RECORD_INDEX',
    'hoodie.metadata.record.index.min.filegroup.count': '100',
    'hoodie.metadata.optimized.log.blocks.scan.enable': 'true'
}
```
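For reference, a simplified sketch of how the job wires these options into the streaming read and upsert write (the table paths and checkpoint location are placeholders, not the real ones):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-streaming-upsert").getOrCreate()

source_path = "gs://<bucket>/source_table"        # placeholder source COW table
target_path = "gs://<bucket>/target_table"        # placeholder target COW table
checkpoint = "gs://<bucket>/checkpoints/target"   # placeholder checkpoint dir

# Incrementally read the source Hudi table as a stream.
stream_df = spark.readStream.format("hudi").load(source_path)

# Upsert each micro-batch into the target table using the options above.
query = (
    stream_df.writeStream
        .format("hudi")
        .options(**write_streaming_hudi_options)
        .outputMode("append")
        .option("checkpointLocation", checkpoint)
        .start(target_path)
)
query.awaitTermination()
```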
The Spark Hudi jar is:
hudi-spark3-bundle_2.12-0.14.0.jar
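For completeness, a sketch of a typical session configuration for this bundle on Spark 3.4 (the jar path is a placeholder; the serializer, extension, and catalog settings follow the Hudi quick-start guidance for Spark 3.x and may differ from my actual setup):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
        .appName("hudi-0.14-on-spark-3.4")
        # Placeholder path to the bundle jar inside the container image.
        .config("spark.jars", "/opt/spark/jars/hudi-spark3-bundle_2.12-0.14.0.jar")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.sql.extensions",
                "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
        .getOrCreate()
)
```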
The steps that produce the exception are:
The stack trace is:
So I have two questions:
Environment Description
Hudi version : 0.14.0
Spark version : 3.4.1
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) : GCS
Running on Docker? (yes/no) : yes, on GKE; the base image is spark-3.4.1, which uses Java 11. Would Java 8 solve this?