azkaban / azkaban-plugins

Plugins for Azkaban.
https://azkaban.github.io
Apache License 2.0

azkaban-plugins does not recognize the directory of HDFS-version-azkaban-3.90.0 #301

Open datagic opened 3 years ago

datagic commented 3 years ago

Hello, I use the Azkaban Spark plugin to run Spark on YARN jobs. I configured the Spark plugin and provided all the environment information and dependencies, but the job still reports an error. Azkaban can't find my HDFS directory correctly: it prefixes my HDFS directory with my Azkaban installation directory. I hope to get your help. Thank you very much!

The error report is as follows:

java.lang.Exception: Job set up failed: execution jar is suppose to be in this folder, but the folder doesn't exist: /apps/azkaban/exec-server/bin/executions/10/hdfs:///spark-jar/test
    at azkaban.jobExecutor.ProcessJob.handleError(ProcessJob.java:434)
    at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:208)
    at azkaban.jobtype.AbstractHadoopJavaProcessJob.run(AbstractHadoopJavaProcessJob.java:50)
    at azkaban.jobtype.HadoopSparkJob.run(HadoopSparkJob.java:307)
    at azkaban.execapp.JobRunner.runJob(JobRunner.java:830)
    at azkaban.execapp.JobRunner.doRun(JobRunner.java:607)
    at azkaban.execapp.JobRunner.run(JobRunner.java:568)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: execution jar is suppose to be in this folder, but the folder doesn't exist: /apps/azkaban/exec-server/bin/executions/10/hdfs:///spark-jar/test
    at azkaban.jobtype.HadoopJobUtils.resolveExecutionJarName(HadoopJobUtils.java:319)
    at azkaban.jobtype.HadoopSparkJob.executionJarHelper(HadoopSparkJob.java:219)
    at azkaban.jobtype.HadoopSparkJob.testableGetMainArguments(HadoopSparkJob.java:193)
    at azkaban.jobtype.HadoopSparkJob.getMainArguments(HadoopSparkJob.java:406)
    at azkaban.jobExecutor.JavaProcessJob.createCommandLine(JavaProcessJob.java:73)
    at azkaban.jobExecutor.JavaProcessJob.getCommandList(JavaProcessJob.java:62)
    at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:206)
    ... 10 more
22-10-2020 15:51:17 CST ray_spark_test ERROR - Job run failed!
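
To illustrate how a path like the one in the error message can arise, here is a minimal Java sketch. It is not the azkaban-plugins source: the class `ExecutionJarPathSketch` and the methods `localDirFor` and `hasScheme` are hypothetical names, and the string-joining behavior is an assumption inferred only from the path printed in the exception. It shows that treating the `execution-jar` value as a path relative to the job's working directory yields exactly the reported folder name, and that a simple scheme check would tell an `hdfs:` URI apart from a relative path.

```java
// Illustrative sketch only: NOT the azkaban-plugins implementation.
// All names here are hypothetical; the joining logic is an assumption
// inferred from the folder name shown in the error message.
final class ExecutionJarPathSketch {

    /** Joins the working directory and the configured jar value as plain strings,
     *  then returns everything before the last '/' as the folder expected to hold the jar. */
    static String localDirFor(String workingDir, String executionJar) {
        String joined = workingDir + "/" + executionJar;
        return joined.substring(0, joined.lastIndexOf('/'));
    }

    /** Returns true when the value starts with a URI scheme such as "hdfs:" or "file:". */
    static boolean hasScheme(String value) {
        return value.matches("^[A-Za-z][A-Za-z0-9+.-]*:.*");
    }

    public static void main(String[] args) {
        // Values taken from the error report and job file in this issue.
        String workingDir = "/apps/azkaban/exec-server/bin/executions/10";
        String executionJar = "hdfs:///spark-jar/test/layer_spark_20201019105827.jar";

        // The joined folder is a local path with the HDFS URI embedded in it.
        System.out.println(localDirFor(workingDir, executionJar));
        // A scheme check distinguishes the URI from a working-directory-relative path.
        System.out.println(hasScheme(executionJar));
    }
}
```

With the values from this issue, `localDirFor` returns the same folder string that appears in the exception, which is consistent with the HDFS URI being handled as a local relative path.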

datagic commented 3 years ago

https://github.com/azkaban/azkaban-plugins/issues/267#issuecomment-374497146 describes a problem similar to mine. I have also configured the parameter mentioned there, but it has no effect.

datagic commented 3 years ago

My job configuration:

conf.spark.yarn.archive=hdfs:///hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz
dependencies=
driver-library-path=/usr/local/lib
num-executors=20
conf.spark.history.fs.logDirectory=hdfs:///spark2-history/
conf.spark.driver.memory=4g
execution-jar=hdfs:///spark-jar/test/layer_spark_20201019105827.jar
conf.spark.executor.cores=4
conf.spark.yarn.jars=/usr/hdp/current/spark2-client/jars/*.jar
conf.spark.eventLog.enabled=false
type=spark
conf.spark.files=file:///usr/hdp/current/spark2-client/conf/hive-site.xml
conf.spark.dynamicAllocation.executorIdleTimeout=60
conf.spark.eventLog.dir=hdfs:///logs/spark-event-logs
class=com.app.datacenter.main.MainSparkTest
executor-memory=2G
conf.spark.executor.memory=12G
conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78
conf.spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78
params=$date()
master=yarn
verbose=false
deploy-mode=cluster
retries=0
conf.spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5
name=yarn_test
conf.spark.dynamicAllocation.schedulerBacklogTimeout=1
queue=product