apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] BigQuery Sync error with --use-bq-manifest-file option when syncing Hudi table to BigLake table. #9739

Closed. thanhnd20 closed this issue 9 months ago.

thanhnd20 commented 10 months ago

Describe the problem you faced

I hit an error when submitting a Spark job to sync a Hudi table to a BigLake table with the --use-bq-manifest-file option. I built the hudi-gcp-bundle-0.14.0-rc1 (and rc2) JAR myself to run spark-submit. The job fails with Exception in thread "main" java.lang.NoSuchMethodError, even though the ConfigProperty class does have the markAdvanced() method.

Expected behavior

The spark-submit job runs successfully with the --use-bq-manifest-file option, generating the manifest file and the BigLake table in GCP.


Stacktrace

```
Exception in thread "main" java.lang.NoSuchMethodError: 'org.apache.hudi.common.config.ConfigProperty org.apache.hudi.common.config.ConfigProperty.markAdvanced()'
	at org.apache.hudi.gcp.bigquery.BigQuerySyncConfig.<clinit>(BigQuerySyncConfig.java:57)
	at org.apache.hudi.gcp.bigquery.BigQuerySyncConfig$BigQuerySyncConfigParams.toProps(BigQuerySyncConfig.java:163)
	at org.apache.hudi.gcp.bigquery.BigQuerySyncTool.main(BigQuerySyncTool.java:157)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:973)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1061)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1070)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
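A `NoSuchMethodError` like this usually means two different Hudi versions are on the classpath and an older one wins. One quick diagnostic is to scan the classpath entries for Hudi artifacts and list the versions found; seeing, say, an older Spark bundle alongside the submitted 0.14.0 GCP bundle would explain the failure. A minimal sketch (the jar paths below are hypothetical examples, not taken from the issue):

```python
import re
from collections import defaultdict

# Matches jar names like hudi-gcp-bundle-0.14.0-rc1.jar or hudi-spark3-bundle_2.12-0.12.3.jar
HUDI_JAR = re.compile(r"(hudi-[a-z0-9_.-]+?)-(\d[\w.-]*)\.jar$")

def find_hudi_versions(classpath_entries):
    """Group Hudi jars found on the classpath by artifact name -> set of versions."""
    versions = defaultdict(set)
    for entry in classpath_entries:
        m = HUDI_JAR.search(entry)
        if m:
            versions[m.group(1)].add(m.group(2))
    return dict(versions)

# Hypothetical classpath: the submitted bundle plus an older jar shipped with the cluster.
entries = [
    "/usr/lib/spark/jars/hudi-spark3-bundle_2.12-0.12.3.jar",
    "/tmp/hudi-gcp-bundle-0.14.0-rc1.jar",
    "/usr/lib/spark/jars/scala-library-2.12.17.jar",
]
for artifact, vers in find_hudi_versions(entries).items():
    print(artifact, sorted(vers))
```

With more than one Hudi artifact reported, whichever jar the driver's class loader hits first supplies `ConfigProperty`, and an older copy lacks `markAdvanced()`.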

ad1happy2go commented 10 months ago

@thanhnd20 Can you please share the mvn command you used to build the jar? Also share the complete command/code you are running. From the error, it looks like the GCP bundle jar may be conflicting with another Hudi bundle.

thanhnd20 commented 10 months ago

@ad1happy2go The mvn command I used to build the bundle jar:

```
mvn clean package -am -pl packaging/hudi-gcp-bundle -DskipTests
```

The complete spark-submit command:

```
spark-submit \
  --master yarn \
  --packages com.google.cloud:google-cloud-bigquery:2.10.4 \
  --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
  gs://<my_gs_folder>/hudi-gcp-bundle-0.14.0-rc1.jar \
  --project-id <my_project_id> \
  --dataset-name <my_datasetname> \
  --dataset-location asia-southeast1 \
  --source-uri gs://<my_gs_folder>/hudi_trips_cow/asia=* \
  --source-uri-prefix gs://<my_gs_folder>/hudi_trips_cow/ \
  --use-bq-manifest-file true
```

If you need any more information, please let me know.

ad1happy2go commented 10 months ago

That looks okay to me. cc @the-other-tim-brown in case you have any ideas.

Also check whether any other Hudi version is present on the cluster (one may come by default with Dataproc).

the-other-tim-brown commented 10 months ago

markAdvanced is a recent change, so my suspicion is that another jar with an older Hudi version is included on the Spark cluster.
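One way to test this hypothesis is to check which jars on the cluster actually ship the `ConfigProperty` class, since a jar is just a zip archive. A hedged sketch using Python's `zipfile`; to keep the snippet runnable it builds a synthetic in-memory "jar" rather than reading a real one (on a real cluster you would pass paths such as the jars under `$SPARK_HOME/jars` instead):

```python
import io
import zipfile

CLASS_ENTRY = "org/apache/hudi/common/config/ConfigProperty.class"

def jar_contains(jar_file, entry=CLASS_ENTRY):
    """Return True if the jar (a path or file-like object) contains the given entry."""
    with zipfile.ZipFile(jar_file) as jar:
        return entry in jar.namelist()

# Synthetic jar that ships the class, purely for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr(CLASS_ENTRY, b"\xca\xfe\xba\xbe")  # fake class-file bytes

print(jar_contains(buf))  # the synthetic jar contains ConfigProperty
```

If more than one jar on the driver classpath contains this entry, the class loader's pick decides which `ConfigProperty` is used, and an older pick would lack `markAdvanced()`.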

nfarah86 commented 9 months ago

@thanhnd20 can you reply on the ticket about whether you are using other Hudi versions on your classpath? cc @the-other-tim-brown @ad1happy2go

following up on the slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1695734405920159?thread_ts=1694417492.831359&cid=C4D716NPQ

ad1happy2go commented 9 months ago

@thanhnd20 was able to fix this error by upgrading the Hudi version on the Spark cluster. Thanks for all the support from everyone.

rsnexgt commented 4 months ago

@ad1happy2go Could you please share which Hudi version you upgraded to in order to fix the issue?