eBay / griffin

Model driven data quality service
https://ebay.github.io/griffin/

Jobs fail when using yarn-cluster #51

Open WangYongNingDA opened 6 years ago

WangYongNingDA commented 6 years ago

When I submit Spark jobs in yarn-cluster mode, I get the following error log:

Application application_1512628890181_92576 failed 2 times due to AM Container for appattempt_1512628890181_92576_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://xy180-wecloud-198:8088/proxy/application_1512628890181_92576/Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://wecloud-cluster/user/pgxl/.sparkStaging/application_1512628890181_92576/com.databricks_spark-avro_2.10-2.0.1.jar
java.io.FileNotFoundException: File does not exist: hdfs://wecloud-cluster/user/pgxl/.sparkStaging/application_1512628890181_92576/com.databricks_spark-avro_2.10-2.0.1.jar

I don't know why the parameter in the config file doesn't work. When I switch to yarn-client mode, I get the following error log instead:

Warning: Skip remote jar hdfs://wecloud-cluster/project/pgxl/griffin/griffin-measure.jar.
Warning: Skip remote jar hdfs://wecloud-cluster/project/pgxl/griffin/datanucleus-api-jdo-3.2.6.jar.
Warning: Skip remote jar hdfs://wecloud-cluster/project/pgxl/griffin/datanucleus-core-3.2.10.jar.
Warning: Skip remote jar hdfs://wecloud-cluster/project/pgxl/griffin/datanucleus-rdbms-3.2.9.jar.
java.lang.ClassNotFoundException: org.apache.griffin.measure.Application
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

It seems the parameter sparkJob.spark.jars.packages works, but in yarn-client mode I can't use jars stored on HDFS. I can't find the source code that processes the spark jars config. Can you give me some suggestions? Thank you very much.

bhlx3lyx7 commented 6 years ago

For the first question, I've replied to you here; it's an issue with the config file. For the second question, first try submitting the Spark job in yarn-client mode with a local griffin-measure.jar. In yarn-client mode you also don't need to set the datanucleus jars manually, since they already exist in your local Spark lib directory.
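
As a rough illustration of the second suggestion, the sketch below assembles a yarn-client spark-submit command that points at a local copy of griffin-measure.jar. It only prints the command so it can be reviewed before running. The local file names (./griffin-measure.jar, ./env.json, ./config.json), the order of the trailing config arguments, and the helper name build_submit_cmd are assumptions made for illustration. The --packages coordinate is inferred from the jar name in the yarn-cluster log above.

```shell
#!/usr/bin/env bash
# Sketch only: local paths and the trailing config arguments are hypothetical.

# Print a spark-submit command for yarn-client mode using a LOCAL application jar.
build_submit_cmd() {
  local local_jar="$1"   # local copy of griffin-measure.jar (assumed path)
  local env_json="$2"    # env config file (assumed path)
  local dq_json="$3"     # measure config file (assumed path)
  # No --jars for datanucleus here: per the reply above, those jars are
  # already in the local Spark lib directory in yarn-client mode.
  printf '%s ' spark-submit \
    --class org.apache.griffin.measure.Application \
    --master yarn-client \
    --packages com.databricks:spark-avro_2.10:2.0.1 \
    "$local_jar" "$env_json" "$dq_json"
  printf '\n'
}

# Step 1 (manual, run once): copy the jar from HDFS to the local disk, e.g.
#   hdfs dfs -get hdfs://wecloud-cluster/project/pgxl/griffin/griffin-measure.jar ./griffin-measure.jar
# Step 2: print the command, inspect it, then run it by hand.
build_submit_cmd ./griffin-measure.jar ./env.json ./config.json
```

Using a local path for the application jar avoids the "Skip remote jar" warnings shown above, which are followed by the ClassNotFoundException because the main class is never put on the driver's classpath.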