Open lcx517 opened 4 years ago
Hi, I'm running the SONA example and the job FAILED with the stdout log below. Please help.
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for TERM
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for HUP
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for INT
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing view acls to: deepthought
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing modify acls to: deepthought
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing view acls groups to:
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing modify acls groups to:
2019-12-26 14:09:19 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(deepthought); groups with view permissions: Set(); users with modify permissions: Set(deepthought); groups with modify permissions: Set()
2019-12-26 14:09:20 INFO UserGroupInformation:964 - Login successful for user deepthought using keytab file deepthought.keytab-4169bc48-f895-42c2-9dde-091feb49f3c5
2019-12-26 14:09:20 INFO ApplicationMaster:54 - Preparing Local resources
2019-12-26 14:09:22 WARN Client:677 - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2019-12-26 14:09:28 INFO ApplicationMaster:54 - ApplicationAttemptId: appattempt_1576380960005_2467808_000001
2019-12-26 14:09:28 INFO AMCredentialRenewer:54 - Scheduling login from keytab in 64776907 millis.
2019-12-26 14:09:28 INFO ApplicationMaster:54 - Starting the user application in a separate Thread
2019-12-26 14:09:28 ERROR ApplicationMaster:91 - Uncaught exception: java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:715)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:491)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
2019-12-26 14:09:28 INFO ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples)
2019-12-26 14:09:28 INFO ShutdownHookManager:54 - Shutdown hook called
My SONA-example script:
source ./spark-on-angel-env.sh
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
$SPARK_HOME/bin/spark-submit \
    --master yarn-cluster \
    --driver-java-options "-Djava.library.path=/usr/lib/hadoop/lib/native" \
    --keytab /home/deepthought/deepthought.keytab \
    --principal deepthought \
    --queue longyuan.p0 \
    --conf spark.ps.jars=$SONA_ANGEL_JARS \
    --conf spark.ps.instances=10 \
    --conf spark.ps.cores=2 \
    --conf spark.ps.memory=6g \
    --jars $SONA_SPARK_JARS\
    --name "LR-spark-on-angel" \
    --files /data/angel/sona-0.1.0-bin/jsons/logreg.json \
    --driver-memory 10g \
    --num-executors 10 \
    --executor-cores 2 \
    --executor-memory 4g \
    --class org.apache.spark.angel.examples.JsonRunnerExamples \
    ./../lib/angelml-${SONA_VERSION}.jar \
    data:viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin/data/angel/a9a/a9a_123d_train.libsvm \
    modelPath:viewfs://hadoop-bd/user/deepthought/test/output \
    jsonFile:./lr.json \
    lr:0.1
And my spark-on-angel-env.sh:
export JAVA_HOME=/usr
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.6
export SONA_HOME=/data/angel/sona-0.1.0-bin
export SONA_HDFS_HOME=viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin
export SONA_VERSION=0.1.0
export ANGEL_VERSION=3.0.1
export ANGEL_UTILS_VERSION=0.1.1
export ANGEL_MLCORE_VERSION=0.1.2
...<not changed default content below>...
The class name has already changed, but the docs are outdated!
You need to change "org.apache.spark.angel.examples.JsonRunnerExamples" to "com.tencent.angel.sona.examples.JsonRunnerExamples".
Good luck~
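If it helps, one way to confirm which package a class actually lives in before resubmitting is to list the jar's entries and look for the class file. This is only a minimal sketch: `class_in_listing` is a hypothetical helper name (not part of SONA or Spark), and it assumes a tool such as `unzip -l` or `jar tf` is available to print the jar listing.

```shell
# Hypothetical helper: reads a jar listing on stdin and succeeds (exit 0)
# only if the given fully qualified class name appears in it.
class_in_listing() {
  # Inside a jar, a class is stored under its package path with '/'
  # instead of '.', plus a ".class" suffix.
  entry="$(printf '%s' "$1" | tr '.' '/').class"
  # -F matches the entry as a fixed string; -q suppresses output.
  grep -qF -- "$entry"
}

# Example usage (assumes the jar path from the submit script above):
#   unzip -l ./../lib/angelml-${SONA_VERSION}.jar \
#     | class_in_listing com.tencent.angel.sona.examples.JsonRunnerExamples \
#     && echo "class found"
```

If the check passes for the `com.tencent.angel.sona.examples` name and fails for the `org.apache.spark.angel.examples` one, the `--class` argument in the submit script is the thing to change.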