GoogleCloudDataproc / initialization-actions

Runs on all nodes of your cluster before the cluster starts - lets you customize your cluster
https://cloud.google.com/dataproc/init-actions
Apache License 2.0

`ClassNotFoundException` when trying to submit a Spark job in an HA Dataproc cluster #941

Open sshetty007 opened 2 years ago

sshetty007 commented 2 years ago

Steps to reproduce:

  1. Create a Dataproc HA cluster using the command below:

    gcloud dataproc clusters create ${CLUSTER_NAME} \
    --region us-central1 --zone us-central1-a \
    --image-version 2.0-ubuntu18 \
    --num-masters 3 --master-machine-type n1-standard-4 --master-boot-disk-size 1000 \
    --num-workers 2 --worker-machine-type n1-standard-4 \
    --enable-component-gateway \
    --initialization-actions gs://goog-dataproc-initialization-actions-us-central1/oozie/oozie.sh \
    --properties=dataproc:dataproc.logging.stackdriver.job.driver.enable=true,dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true,dataproc:dataproc.logging.stackdriver.enable=true,dataproc:jobs.file-backed-output.enable=true
  2. Submit a Spark job from an Oozie workflow with the master set to yarn and the mode set to cluster; this produces an error. I used the Spark example included in oozie-examples. Ensure the following are set in job.properties:

    master=yarn
    mode=cluster
  3. Also place the workflow.xml and the application jar in a GCS bucket and reference them in job.properties and workflow.xml with the gs:// prefix, as in the sketch below.
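
     A minimal sketch of how the relevant job.properties entries might look (the bucket name and application path are hypothetical placeholders):

    oozie.use.system.libpath=true
    oozie.wf.application.path=gs://my-bucket/oozie/apps/spark-example
    master=yarn
    mode=cluster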

  4. Submit the Spark job. The job fails, and the following error is reported in the YARN log.

error:
Application application_1635196402362_0017 failed 2 times due to AM Container for appattempt_1635196402362_0017_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2021-10-26 06:40:13.217]Exception from container-launch.
Container id: container_e03_1635196402362_0017_02_000001
Exit code: 1
[2021-10-26 06:40:13.220]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:650)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:632)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 7 more

The same job works when using the following settings:

master=local[*]
mode=client

Possibly the CLASSPATH is not set correctly on the worker nodes.
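
That hypothesis fits the symptom: with master=local[*] the driver runs inside the Oozie launcher container, which inherits the node's Hadoop classpath, whereas in yarn-cluster mode the application master's classpath is assembled largely from the Oozie sharelib. A quick way to check what the Spark sharelib actually contains (a sketch; the sharelib path is the one identified in the next comment):

    # List Hadoop jars shipped in the Oozie Spark sharelib
    hadoop fs -ls /user/oozie/share/lib/spark | grep hadoop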

sshetty007 commented 2 years ago

The issue appears to be missing jars in the share lib under /user/oozie/share/lib/spark.

Copying the following jars manually allowed the job to complete successfully:

 hadoop fs -put hadoop-common-3.2.2.jar /user/oozie/share/lib/spark
 hadoop fs -put hive/woodstox-core-5.0.3.jar /user/oozie/share/lib/spark
 hadoop fs -put hive/stax-api-1.0.1.jar /user/oozie/share/lib/spark
 hadoop fs -put hive/stax2-api-3.1.4.jar /user/oozie/share/lib/spark
 hadoop fs -put hive/commons-collections4-4.1.jar /user/oozie/share/lib/spark
 hadoop fs -put hive/commons-collections-3.2.2.jar /user/oozie/share/lib/spark
 hadoop fs -put hive/commons-*.jar /user/oozie/share/lib/spark
 hadoop fs -put hive/htrace-core4-4.1.0-incubating.jar /user/oozie/share/lib/spark
 hadoop fs -put hive/hadoop*.jar /user/oozie/share/lib/spark
 hadoop fs -put /usr/lib/spark/jars/hadoop*.jar /user/oozie/share/lib/spark
 hadoop fs -put ./spark/jars/spark-hadoop-cloud_2.12-3.1.2.jar /user/oozie/share/lib/spark
 hadoop fs -put ./spark/jars/hadoop-cloud-storage-3.2.2.jar /user/oozie/share/lib/spark
 hadoop fs -put /usr/local/share/google/dataproc/lib/gcs-connector.jar /user/oozie/share/lib/spark
 hadoop fs -put /usr/lib/spark/jars/re2j-1.1.jar /user/oozie/share/lib/spark
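
After adding jars to the sharelib, the Oozie server also has to reload it before running workflows pick up the new jars. A hedged sketch (assuming Oozie's default port, 11000, on the master the command is run from):

 # Ask the Oozie server to rescan the sharelib without a restart
 oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
 # Confirm which jars the Spark sharelib now includes
 oozie admin -oozie http://localhost:11000/oozie -shareliblist spark
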
ajaymj0609 commented 2 years ago

Hi @sshetty007, I am also facing the same issue in a Dataproc cluster. Has the issue been resolved for you? If so, please guide me with your inputs.