Open sshetty007 opened 2 years ago
The issue appears to be missing jars in the share lib under /user/oozie/share/lib/spark.
Copying the following jars manually allowed the job to complete successfully:
hadoop fs -put hadoop-common-3.2.2.jar /user/oozie/share/lib/spark
hadoop fs -put hive/woodstox-core-5.0.3.jar /user/oozie/share/lib/spark
hadoop fs -put hive/stax-api-1.0.1.jar /user/oozie/share/lib/spark
hadoop fs -put hive/stax2-api-3.1.4.jar /user/oozie/share/lib/spark
hadoop fs -put hive/hive/commons-collections4-4.1.jar /user/oozie/share/lib/spark
hadoop fs -put hive/commons-collections4-4.1.jar /user/oozie/share/lib/spark
hadoop fs -put hive/hive/commons-collections-3.2.2.jar /user/oozie/share/lib/spark
hadoop fs -put hive/commons-collections-3.2.2.jar /user/oozie/share/lib/spark
hadoop fs -put hive/commons-*.jar /user/oozie/share/lib/spark
hadoop fs -put hive/htrace-core4-4.1.0-incubating.jar /user/oozie/share/lib/spark
hadoop fs -put hive/hadoop*.jar /user/oozie/share/lib/spark
hadoop fs -put /user/oozie/share/lib/spark
hadoop fs -put /usr/lib/spark/jars/hadoop*.jar /user/oozie/share/lib/spark
hadoop fs -put ./spark/jars/spark-hadoop-cloud_2.12-3.1.2.jar /user/oozie/share/lib/spark
hadoop fs -put ./spark/jars/hadoop-cloud-storage-3.2.2.jar /user/oozie/share/lib/spark
hadoop fs -put /usr/local/share/google/dataproc/lib/gcs-connector.jar /user/oozie/share/lib/spark
hadoop fs -put /usr/lib/spark/jars/re2j-1.1.jar /user/oozie/share/lib/spark
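The individual uploads above could also be scripted. The following is a minimal sketch, not the commands actually run in this thread: upload_jars and HADOOP_CMD are hypothetical names introduced for this example, the target directory is the one used above, and it assumes that overwriting existing jars with -put -f is acceptable on the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: upload a list of local jars to the Oozie share lib directory.
# SHARELIB_DIR is the path used in this thread; upload_jars and HADOOP_CMD
# are hypothetical names chosen for this example.
SHARELIB_DIR="/user/oozie/share/lib/spark"
HADOOP_CMD="${HADOOP_CMD:-hadoop}"   # set to "echo hadoop" for a dry run

upload_jars() {
  for jar in "$@"; do
    if [ -f "$jar" ]; then
      # -f overwrites a jar that is already present in the target directory
      $HADOOP_CMD fs -put -f "$jar" "$SHARELIB_DIR"
    else
      echo "skipping, not found locally: $jar" >&2
    fi
  done
}

# Example call (paths taken from the commands above):
# upload_jars /usr/lib/spark/jars/re2j-1.1.jar /usr/lib/spark/jars/hadoop*.jar
```

Oozie may cache the share lib listing, so running oozie admin -sharelibupdate (or restarting Oozie) afterwards is probably needed before the new jars are picked up. That step is an assumption and was not confirmed in this thread.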
Hi @sshetty007, I am also facing the same issue on a Dataproc cluster. Has the issue been resolved for you? If so, please share how you fixed it.
Steps to repro.
Create a Dataproc HA cluster using the command below
Submit a Spark job from an Oozie workflow with the master as yarn and the mode as cluster; this produces an error. I used the Spark example included in the oozie-examples. Ensure that the following are being used in the job.properties
Also place the workflow.xml and jar in a GCS bucket and refer to them in the job.properties and workflow.xml with the gs:// prefix.
Submit the Spark job. The job fails with the following error reported in the YARN log.
The same works when using the following settings
Possibly the CLASSPATH is not set correctly on the worker nodes.
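One way to probe this hypothesis is to check whether the jars the Spark action needs are actually present in the share lib directory. The sketch below is illustrative only: missing_jars is a hypothetical helper, the jar names in the example call are taken from this thread rather than from a known-complete list, and it assumes hadoop fs -ls -C (paths only) is available in the installed Hadoop version.

```shell
#!/usr/bin/env bash
# Sketch only: report which expected jars are absent from a directory listing.
# missing_jars is a hypothetical helper; it reads one path per line on stdin
# and prints every jar name given as an argument that is not in the listing.
missing_jars() {
  # Reduce each listed path to its file name.
  names=$(sed 's|.*/||')
  for jar in "$@"; do
    # -x: whole-line match, -F: fixed string, so dots are not wildcards
    printf '%s\n' "$names" | grep -qxF -- "$jar" || echo "$jar"
  done
}

# Example call, assuming `hadoop fs -ls -C` prints one path per line:
# hadoop fs -ls -C /user/oozie/share/lib/spark \
#   | missing_jars hadoop-common-3.2.2.jar gcs-connector.jar re2j-1.1.jar
```

Any name it prints is a jar that the launcher would not find in that directory, which would be consistent with a classpath problem on the worker nodes.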