apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Flink job failing with Avro ClassCastException #9596

Open raghunittala opened 1 year ago

raghunittala commented 1 year ago

Hi Team,

I have a Flink job that consumes Protobuf messages from Kafka and writes them to a Hudi table in S3 object storage. Here are a few issues I'm facing:

  1. The job runs for some time, and when it tries to compact the files it throws a ClassCastException. Here is the complete stack trace:
    Caused by: org.apache.hudi.exception.HoodieException: Executor executes action [commits the instant 20230824070324708] error
    ... 6 more
    Caused by: java.lang.ClassCastException: class org.apache.avro.generic.GenericData$Record cannot be cast to class org.apache.avro.specific.SpecificRecordBase (org.apache.avro.generic.GenericData$Record and org.apache.avro.specific.SpecificRecordBase are in unnamed module of loader 'app')
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209)
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeCompactionPlan(TimelineMetadataUtils.java:169)
    at org.apache.hudi.common.util.CompactionUtils.getCompactionPlan(CompactionUtils.java:191)
    at org.apache.hudi.common.util.CompactionUtils.lambda$getCompactionPlansByTimeline$4(CompactionUtils.java:163)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
    at org.apache.hudi.common.util.CompactionUtils.getCompactionPlansByTimeline(CompactionUtils.java:164)
    at org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionPlans(CompactionUtils.java:133)
    at org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionOperations(CompactionUtils.java:207)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:120)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:113)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:107)
    at org.apache.hudi.common.table.view.FileSystemViewManager.createInMemoryFileSystemView(FileSystemViewManager.java:177)
    at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$createViewManager$5fcdabfe$1(FileSystemViewManager.java:272)
    at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$getFileSystemView$1(FileSystemViewManager.java:115)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
    at org.apache.hudi.common.table.view.FileSystemViewManager.getFileSystemView(FileSystemViewManager.java:114)
    at org.apache.hudi.table.HoodieTable.getSliceView(HoodieTable.java:320)
    at org.apache.hudi.table.action.compact.plan.generators.BaseHoodieCompactionPlanGenerator.generateCompactionPlan(BaseHoodieCompactionPlanGenerator.java:92)
    at org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor.scheduleCompaction(ScheduleCompactionActionExecutor.java:147)
    at org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor.execute(ScheduleCompactionActionExecutor.java:113)
    at org.apache.hudi.table.HoodieFlinkMergeOnReadTable.scheduleCompaction(HoodieFlinkMergeOnReadTable.java:105)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableServiceInternal(BaseHoodieTableServiceClient.java:421)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableService(BaseHoodieTableServiceClient.java:393)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService(BaseHoodieWriteClient.java:1097)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant(BaseHoodieWriteClient.java:876)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompaction(BaseHoodieWriteClient.java:867)
    at org.apache.hudi.util.CompactionUtil.scheduleCompaction(CompactionUtil.java:65)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:250)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)

    I'm also copying hudi-flink1.16-bundle into the flink/plugins/hudi folder.

I’m not seeing any Parquet files being created in S3, only .log files. I suspect that the exception above prevents compaction from running, so no Parquet files are written.

Here is the hoodie.properties:

hoodie.table.type=MERGE_ON_READ
hoodie.table.precombine.field=start_time_unix_nano
hoodie.table.partition.fields=event_id,event_type
hoodie.table.cdc.enabled=false
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.checksum=3942898242
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.recordkey.fields=record_key
hoodie.table.name=ad_results_sink_table
hoodie.compaction.record.merger.strategy=eeb8d96f-b1e4-49fd-bbf8-28ac514178e5
hoodie.datasource.write.hive_style_partitioning=false
hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexAvroKeyGenerator
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.version=5

danny0405 commented 1 year ago

You may find some clues here: https://stackoverflow.com/questions/70919159/kafka-classcastexception-class-org-apache-avro-generic-genericdatarecord-can

raghunittala commented 1 year ago

Hi @danny0405 - We're not deserializing the payload. We consume Protobuf messages from Kafka and write them to the Hudi table as is. We do not have any transformations in our pipeline.

When the downstream operators consume the parquet files from S3 object storage, the payload would be deserialized and modified accordingly.

danny0405 commented 1 year ago

@raghunittala Sorry, I'm not an Avro expert, but I suspect this is an Avro version conflict.

raghunittala commented 1 year ago

No worries @danny0405 - I'll try to spend more time on this and update in case of success.

yihua commented 1 year ago

@raghunittala Based on the stack trace, the compaction plan on the Hudi timeline cannot be deserialized. It's likely an Avro version mismatch. How do you run the job, and what jars/dependencies are on the classpath?
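One way to answer the classpath question is to ask the JVM where it actually loaded the relevant classes from. The following is a minimal sketch, not part of Hudi or Flink: the `ClassOrigin` class and its `locate` method are hypothetical names, and the class names passed in are taken from the stack trace above. It only tells you something useful when it runs with the same classpath and class loader setup as the failing job.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical helper: reports which jar (or other location) a class is loaded from. */
final class ClassOrigin {

    /** Returns a one-line description of the origin of the named class. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            String where = (location == null) ? "<bootstrap or unknown>" : location.toString();
            return className + " -> " + where + " (loader: " + cls.getClassLoader() + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Default class names come from the stack trace in this thread.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.avro.generic.GenericData",
            "org.apache.avro.specific.SpecificRecordBase",
            "org.apache.hudi.common.table.timeline.TimelineMetadataUtils"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

If the two Avro classes resolve to different jars, or to a different jar than the Hudi class, that would be consistent with the version-conflict theory, though it does not prove it.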

yihua commented 1 year ago

If you can upload the file containing the compaction plan (i.e., .hoodie/<instant>.compaction.requested), we can check if it's corrupted.
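Before uploading, a shallow local sanity check is possible. The sketch below assumes the requested compaction plan is stored as an Avro object container file, which per the Avro specification begins with the four bytes `O`, `b`, `j`, `1`. `AvroMagicCheck` and its methods are hypothetical names, not a Hudi API. A passing check only shows that the header looks like Avro; it does not validate the rest of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: checks whether a file starts with the Avro container magic bytes. */
final class AvroMagicCheck {

    // Avro object container files start with 'O', 'b', 'j', followed by the byte 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given bytes begin with the Avro magic sequence. */
    static boolean startsWithAvroMagic(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first four bytes of the file and compares them to the magic sequence. */
    static boolean fileHasAvroMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[MAGIC.length];
            int read = in.readNBytes(header, 0, header.length); // requires Java 9+
            return read == MAGIC.length && startsWithAvroMagic(header);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: AvroMagicCheck <path to local copy of the plan file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " has Avro magic: " + fileHasAvroMagic(file));
    }
}
```

Run it against a local copy of the file downloaded from S3. An empty or truncated file will report `false`.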

raghunittala commented 1 year ago

Hi @yihua - Thanks for your response. I'm using Avro 1.11.1. Here are the libraries the JobManager lists on its classpath:

/opt/flink/lib/avro-1.11.1.jar
/opt/flink/lib/aws-java-sdk-core-1.12.341.jar
/opt/flink/lib/aws-java-sdk-kms-1.12.341.jar
/opt/flink/lib/aws-java-sdk-s3-1.12.341.jar
/opt/flink/lib/elephant-bird-core-4.17.jar
/opt/flink/lib/flink-cep-1.17.1.jar
/opt/flink/lib/flink-connector-files-1.17.1.jar
/opt/flink/lib/flink-csv-1.17.1.jar
/opt/flink/lib/flink-hadoop-compatibility_2.12-1.17.1.jar
/opt/flink/lib/flink-json-1.17.1.jar
/opt/flink/lib/flink-parquet-1.17.1.jar
/opt/flink/lib/flink-protobuf-1.17.1.jar
/opt/flink/lib/flink-scala_2.12-1.17.1.jar
/opt/flink/lib/flink-sql-parquet-1.17.1.jar
/opt/flink/lib/flink-table-api-java-bridge-1.17.1.jar
/opt/flink/lib/flink-table-api-java-uber-1.17.1.jar
/opt/flink/lib/flink-table-planner-loader-1.17.1.jar
/opt/flink/lib/flink-table-runtime-1.17.1.jar
/opt/flink/lib/guava-32.1.1-jre.jar
/opt/flink/lib/hadoop-aws-3.3.5.jar
/opt/flink/lib/hadoop-common-3.3.5.jar
/opt/flink/lib/hadoop-hdfs-3.3.5.jar
/opt/flink/lib/hadoop-mapreduce-client-core-3.3.5.jar
/opt/flink/lib/hadoop-shaded-guava-1.1.1.jar
/opt/flink/lib/jackson-annotations-2.15.2.jar
/opt/flink/lib/jackson-core-2.15.2.jar
/opt/flink/lib/jackson-databind-2.15.2.jar
/opt/flink/lib/jackson-module-kotlin-2.15.2.jar
/opt/flink/lib/javalin-4.6.8.jar
/opt/flink/lib/joda-time-2.12.5.jar
/opt/flink/lib/kotlin-reflect-1.8.22.jar
/opt/flink/lib/kotlin-stdlib-1.8.22.jar
/opt/flink/lib/log4j-1.2-api-2.17.1.jar
/opt/flink/lib/log4j-api-2.17.1.jar
/opt/flink/lib/log4j-core-2.17.1.jar
/opt/flink/lib/log4j-slf4j-impl-2.17.1.jar
/opt/flink/lib/parquet-hadoop-1.13.1.jar
/opt/flink/lib/parquet-protobuf-1.13.1.jar
/opt/flink/lib/protobuf-java-3.22.3.jar
/opt/flink/lib/stax2-api-4.2.1.jar
/opt/flink/lib/woodstox-core-6.4.0.jar
/opt/flink/lib/flink-dist-1.17.1.jar
 
/opt/hadoop/etc/hadoop/
/opt/hadoop/share/hadoop/common/lib/accessors-smart-2.4.7.jar
/opt/hadoop/share/hadoop/common/lib/animal-sniffer-annotations-1.17.jar
/opt/hadoop/share/hadoop/common/lib/asm-5.0.4.jar
/opt/hadoop/share/hadoop/common/lib/audience-annotations-0.5.0.jar
/opt/hadoop/share/hadoop/common/lib/avro-1.7.7.jar
/opt/hadoop/share/hadoop/common/lib/checker-qual-2.5.2.jar
/opt/hadoop/share/hadoop/common/lib/commons-beanutils-1.9.4.jar
/opt/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar
/opt/hadoop/share/hadoop/common/lib/commons-codec-1.15.jar
/opt/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar
/opt/hadoop/share/hadoop/common/lib/commons-compress-1.21.jar
/opt/hadoop/share/hadoop/common/lib/commons-configuration2-2.8.0.jar
/opt/hadoop/share/hadoop/common/lib/commons-daemon-1.0.13.jar
/opt/hadoop/share/hadoop/common/lib/commons-io-2.8.0.jar
/opt/hadoop/share/hadoop/common/lib/commons-lang3-3.12.0.jar
/opt/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar
/opt/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar
/opt/hadoop/share/hadoop/common/lib/commons-net-3.9.0.jar
/opt/hadoop/share/hadoop/common/lib/commons-text-1.10.0.jar
/opt/hadoop/share/hadoop/common/lib/curator-client-4.2.0.jar
/opt/hadoop/share/hadoop/common/lib/curator-framework-4.2.0.jar
/opt/hadoop/share/hadoop/common/lib/curator-recipes-4.2.0.jar
/opt/hadoop/share/hadoop/common/lib/dnsjava-2.1.7.jar
/opt/hadoop/share/hadoop/common/lib/failureaccess-1.0.jar
/opt/hadoop/share/hadoop/common/lib/gson-2.9.0.jar
/opt/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar
/opt/hadoop/share/hadoop/common/lib/hadoop-annotations-3.3.5.jar
/opt/hadoop/share/hadoop/common/lib/hadoop-auth-3.3.5.jar
/opt/hadoop/share/hadoop/common/lib/hadoop-shaded-guava-1.1.1.jar
/opt/hadoop/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar
/opt/hadoop/share/hadoop/common/lib/httpclient-4.5.13.jar
/opt/hadoop/share/hadoop/common/lib/httpcore-4.4.13.jar
/opt/hadoop/share/hadoop/common/lib/j2objc-annotations-1.1.jar
/opt/hadoop/share/hadoop/common/lib/jackson-annotations-2.12.7.jar
/opt/hadoop/share/hadoop/common/lib/jackson-core-2.12.7.jar
/opt/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar
/opt/hadoop/share/hadoop/common/lib/jackson-databind-2.12.7.1.jar
/opt/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar
/opt/hadoop/share/hadoop/common/lib/jakarta.activation-api-1.2.1.jar
/opt/hadoop/share/hadoop/common/lib/javax.servlet-api-3.1.0.jar
/opt/hadoop/share/hadoop/common/lib/jaxb-api-2.2.11.jar
/opt/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar
/opt/hadoop/share/hadoop/common/lib/jcip-annotations-1.0-1.jar
/opt/hadoop/share/hadoop/common/lib/jersey-core-1.19.4.jar
/opt/hadoop/share/hadoop/common/lib/jersey-json-1.20.jar
/opt/hadoop/share/hadoop/common/lib/jersey-server-1.19.4.jar
/opt/hadoop/share/hadoop/common/lib/jersey-servlet-1.19.4.jar
/opt/hadoop/share/hadoop/common/lib/jettison-1.5.3.jar
/opt/hadoop/share/hadoop/common/lib/jetty-http-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jetty-io-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jetty-security-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jetty-server-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jetty-servlet-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jetty-util-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jetty-util-ajax-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jetty-webapp-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jetty-xml-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/common/lib/jsch-0.1.55.jar
/opt/hadoop/share/hadoop/common/lib/json-smart-2.4.7.jar
/opt/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar
/opt/hadoop/share/hadoop/common/lib/jsr305-3.0.2.jar
/opt/hadoop/share/hadoop/common/lib/jsr311-api-1.1.1.jar
/opt/hadoop/share/hadoop/common/lib/jul-to-slf4j-1.7.36.jar
/opt/hadoop/share/hadoop/common/lib/kerb-admin-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerb-client-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerb-common-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerb-core-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerb-crypto-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerb-identity-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerb-server-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerb-util-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerby-asn1-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerby-config-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerby-pkix-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerby-util-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/kerby-xdr-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
/opt/hadoop/share/hadoop/common/lib/metrics-core-3.2.4.jar
/opt/hadoop/share/hadoop/common/lib/netty-all-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-buffer-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-dns-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-haproxy-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-http-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-http2-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-memcache-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-mqtt-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-redis-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-smtp-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-socks-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-stomp-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-codec-xml-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-common-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-handler-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-handler-proxy-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/re2j-1.1.jar
/opt/hadoop/share/hadoop/common/lib/netty-resolver-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-resolver-dns-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-resolver-dns-classes-macos-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-resolver-dns-native-macos-4.1.77.Final-osx-aarch_64.jar
/opt/hadoop/share/hadoop/common/lib/netty-resolver-dns-native-macos-4.1.77.Final-osx-x86_64.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-classes-epoll-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-classes-kqueue-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-native-epoll-4.1.77.Final-linux-aarch_64.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-native-epoll-4.1.77.Final-linux-x86_64.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-native-kqueue-4.1.77.Final-osx-aarch_64.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-native-kqueue-4.1.77.Final-osx-x86_64.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-native-unix-common-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-rxtx-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-sctp-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/netty-transport-udt-4.1.77.Final.jar
/opt/hadoop/share/hadoop/common/lib/nimbus-jose-jwt-9.8.1.jar
/opt/hadoop/share/hadoop/common/lib/paranamer-2.3.jar
/opt/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar
/opt/hadoop/share/hadoop/common/lib/reload4j-1.2.22.jar
/opt/hadoop/share/hadoop/common/lib/slf4j-api-1.7.36.jar
/opt/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar
/opt/hadoop/share/hadoop/common/lib/snappy-java-1.1.8.2.jar
/opt/hadoop/share/hadoop/common/lib/stax2-api-4.2.1.jar
/opt/hadoop/share/hadoop/common/lib/token-provider-1.0.1.jar
/opt/hadoop/share/hadoop/common/lib/woodstox-core-5.4.0.jar
/opt/hadoop/share/hadoop/common/lib/zookeeper-3.5.6.jar
/opt/hadoop/share/hadoop/common/lib/zookeeper-jute-3.5.6.jar
/opt/hadoop/share/hadoop/common/lib/aws-java-sdk-bundle-1.12.316.jar
/opt/hadoop/share/hadoop/common/lib/hadoop-aws-3.3.5.jar
/opt/hadoop/share/hadoop/common/lib/hadoop-common-3.3.5-tests.jar
/opt/hadoop/share/hadoop/common/lib/hadoop-common-3.3.5.jar
/opt/hadoop/share/hadoop/common/hadoop-common-3.3.5-tests.jar
/opt/hadoop/share/hadoop/common/hadoop-common-3.3.5.jar
/opt/hadoop/share/hadoop/common/hadoop-kms-3.3.5.jar
/opt/hadoop/share/hadoop/common/hadoop-nfs-3.3.5.jar
/opt/hadoop/share/hadoop/common/hadoop-registry-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.5-tests.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-nativetask-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-uploader-3.3.5.jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar
/opt/hadoop/share/hadoop/yarn
/opt/hadoop/share/hadoop/yarn/lib/HikariCP-java7-2.4.12.jar
/opt/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar
/opt/hadoop/share/hadoop/yarn/lib/asm-analysis-9.3.jar
/opt/hadoop/share/hadoop/yarn/lib/asm-commons-9.3.jar
/opt/hadoop/share/hadoop/yarn/lib/asm-tree-9.3.jar
/opt/hadoop/share/hadoop/yarn/lib/bcpkix-jdk15on-1.68.jar
/opt/hadoop/share/hadoop/yarn/lib/bcprov-jdk15on-1.68.jar
/opt/hadoop/share/hadoop/yarn/lib/ehcache-3.3.1.jar
/opt/hadoop/share/hadoop/yarn/lib/fst-2.50.jar
/opt/hadoop/share/hadoop/yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1.jar
/opt/hadoop/share/hadoop/yarn/lib/guice-4.0.jar
/opt/hadoop/share/hadoop/yarn/lib/guice-servlet-4.0.jar
/opt/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-base-2.12.7.jar
/opt/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-json-provider-2.12.7.jar
/opt/hadoop/share/hadoop/yarn/lib/jackson-module-jaxb-annotations-2.12.7.jar
/opt/hadoop/share/hadoop/yarn/lib/jakarta.xml.bind-api-2.3.2.jar
/opt/hadoop/share/hadoop/yarn/lib/java-util-1.9.0.jar
/opt/hadoop/share/hadoop/yarn/lib/javax-websocket-client-impl-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/javax-websocket-server-impl-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar
/opt/hadoop/share/hadoop/yarn/lib/javax.websocket-api-1.0.jar
/opt/hadoop/share/hadoop/yarn/lib/javax.websocket-client-api-1.0.jar
/opt/hadoop/share/hadoop/yarn/lib/jersey-client-1.19.4.jar
/opt/hadoop/share/hadoop/yarn/lib/jersey-guice-1.19.4.jar
/opt/hadoop/share/hadoop/yarn/lib/jetty-annotations-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/jetty-client-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/jetty-jndi-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/jetty-plus-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/jline-3.9.0.jar
/opt/hadoop/share/hadoop/yarn/lib/jna-5.2.0.jar
/opt/hadoop/share/hadoop/yarn/lib/json-io-2.5.1.jar
/opt/hadoop/share/hadoop/yarn/lib/metrics-core-3.2.4.jar
/opt/hadoop/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar
/opt/hadoop/share/hadoop/yarn/lib/objenesis-2.6.jar
/opt/hadoop/share/hadoop/yarn/lib/snakeyaml-1.32.jar
/opt/hadoop/share/hadoop/yarn/lib/swagger-annotations-1.5.4.jar
/opt/hadoop/share/hadoop/yarn/lib/websocket-api-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/websocket-client-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/websocket-common-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/websocket-server-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/lib/websocket-servlet-9.4.48.v20220622.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-api-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-applications-mawo-core-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-client-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-registry-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-router-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-services-api-3.3.5.jar
/opt/hadoop/share/hadoop/yarn/hadoop-yarn-services-core-3.3.5.jar
/opt/hadoop/etc/hadoop/

Also, attaching the .hoodie/<instant>.compaction.requested file as you asked for. compaction_requested_file.zip
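The listing above contains both /opt/flink/lib/avro-1.11.1.jar and /opt/hadoop/share/hadoop/common/lib/avro-1.7.7.jar, along with differing versions of guava, protobuf-java and the jackson jars. Such pairs can be found mechanically. The following is a rough sketch under stated assumptions: `ClasspathDuplicates` is a hypothetical helper, and it guesses the artifact name and version purely from the jar file name, so classifier suffixes such as `-tests` or platform tags will show up as false positives. Whether any reported pair actually causes the exception is a separate question.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds artifacts that appear on a classpath with more than one version. */
final class ClasspathDuplicates {

    // Heuristic: the version starts at the first "-<digit>" in the file name.
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d[^/]*)\\.jar$");

    /** Maps artifact name to its versions, keeping only artifacts with two or more versions. */
    static Map<String, Set<String>> conflictingVersions(List<String> classpathEntries) {
        Map<String, Set<String>> versions = new TreeMap<>();
        for (String entry : classpathEntries) {
            String fileName = entry.trim().substring(entry.trim().lastIndexOf('/') + 1);
            Matcher m = JAR.matcher(fileName);
            if (m.matches()) {
                versions.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        // Drop artifacts that occur with a single version only.
        versions.values().removeIf(v -> v.size() < 2);
        return versions;
    }

    public static void main(String[] args) {
        // A few entries copied from the classpath listing in this thread.
        List<String> sample = List.of(
            "/opt/flink/lib/avro-1.11.1.jar",
            "/opt/flink/lib/guava-32.1.1-jre.jar",
            "/opt/flink/lib/protobuf-java-3.22.3.jar",
            "/opt/hadoop/share/hadoop/common/lib/avro-1.7.7.jar",
            "/opt/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar",
            "/opt/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar");
        conflictingVersions(sample)
            .forEach((artifact, found) -> System.out.println(artifact + ": " + found));
    }
}
```

In practice the input would be the full classpath line from the JobManager log, split on the path separator.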

ligou525 commented 8 months ago

Hi @raghunittala, did you find a solution for this problem? I hit the same issue when calling the insertOverwrite API:

    Caused by: org.apache.hudi.exception.HoodieException: Error getting all file groups in pending clustering
    at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:135)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:113)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:108)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:102)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:93)
    at org.apache.hudi.metadata.HoodieMetadataFileSystemView.<init>(HoodieMetadataFileSystemView.java:44)
    at org.apache.hudi.common.table.view.FileSystemViewManager.createInMemoryFileSystemView(FileSystemViewManager.java:166)
    at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$createViewManager$5fcdabfe$1(FileSystemViewManager.java:259)
    at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$getFileSystemView$1(FileSystemViewManager.java:111)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
    at org.apache.hudi.common.table.view.FileSystemViewManager.getFileSystemView(FileSystemViewManager.java:110)
    at org.apache.hudi.table.HoodieTable.getSliceView(HoodieTable.java:303)
    at org.apache.hudi.table.action.commit.JavaInsertOverwriteCommitActionExecutor.getAllExistingFileIds(JavaInsertOverwriteCommitActionExecutor.java:77)
    at org.apache.hudi.table.action.commit.JavaInsertOverwriteCommitActionExecutor.lambda$getPartitionToReplacedFileIds$823fa0f9$1(JavaInsertOverwriteCommitActionExecutor.java:71)
    at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapToPairWrapper$3(FunctionWrapper.java:68)
    ... 28 common frames omitted
    Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.avro.specific.SpecificRecordBase
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:206)
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeRequestedReplaceMetadata(TimelineMetadataUtils.java:186)
    at org.apache.hudi.common.util.ClusteringUtils.getRequestedReplaceMetadata(ClusteringUtils.java:95)
    at org.apache.hudi.common.util.ClusteringUtils.getClusteringPlan(ClusteringUtils.java:106)
    at org.apache.hudi.common.util.ClusteringUtils.lambda$getAllPendingClusteringPlans$0(ClusteringUtils.java:69)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:129)
    ... 42 common frames omitted

raghunittala commented 8 months ago

No, I wasn't able to fix this. I upgraded to Hudi 0.14 and still see this error.