Angel-ML / sona

Spark On Angel, arming Spark with a powerful Parameter Server, which enables Spark to train very big models
Apache License 2.0

deepfm fail #55

Open XingweiChen opened 4 years ago

XingweiChen commented 4 years ago
[Screenshot 2019-10-18 11 26 53] [Screenshot 2019-10-18 11 27 27]

spark-submit --master yarn-cluster --conf spark.ps.jars=hdfs:///user/brook/sona-0.1.0-bin/lib/fastutil-7.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/htrace-core-2.05.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/sizeof-0.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/kryo-shaded-4.0.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/minlog-1.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/memory-0.8.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/commons-pool-1.6.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netty-all-4.1.17.Final.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/hll-1.6.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jniloader-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/native_system-java-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/arpack_combined_all-0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-1.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jettison-1.4.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/json4s-native_2.11-3.2.11.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-format-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-mlcore-0.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-core-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-mllib-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-psf-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-math-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-graph-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-0.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angelml-0.1.0.jar,hdfs:///user/brook/angel-2.1.0-bin/lib/scala-library-2.11.8.jar \
  --conf spark.ps.instances=2 --conf spark.ps.cores=3 --conf spark.ps.memory=5g \
  --jars hdfs:///user/brook/sona-0.1.0-bin/lib/fastutil-7.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/htrace-core-2.05.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/sizeof-0.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/kryo-shaded-4.0.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/minlog-1.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/memory-0.8.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/commons-pool-1.6.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netty-all-4.1.17.Final.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/hll-1.6.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jniloader-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/native_system-java-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/arpack_combined_all-0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-1.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jettison-1.4.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/json4s-native_2.11-3.2.11.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-format-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-mlcore-0.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-core-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-mllib-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-psf-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-math-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-graph-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-0.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angelml-0.1.0.jar,hdfs:///user/brook/angel-2.1.0-bin/lib/scala-library-2.11.8.jar \
  --files ./deepfm.json --driver-memory 2g --num-executors 2 --executor-cores 3 --executor-memory 5g \
  --class com.tencent.angel.sona.examples.JsonRunnerExamples \
  ../lib/angelml-0.1.0.jar \
  jsonFile:./deepfm.json \
  dataFormat:libsvm \
  data:a9a_123d_train.libsvm \
  modelPath:model_dfm \
  predictPath:pred_dfm \
  actionType:train \
  numBatch:500 \
  maxIter:2 \
  lr:4.0 \
  numField:39

This is my submit command. Both Wide & Deep and DeepFM fail with this error. I look forward to your help.
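
The same jar list appears twice in the command above, once for spark.ps.jars and once for --jars, so a typo in one copy is easy to miss. A minimal sketch of building the list once in the shell follows. The `join_jars` helper and the `SONA_LIB` and `SONA_JARS` variable names are hypothetical and not part of SONA; the directory and jar names are copied from the command above, and only three jars are listed to keep the example short.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SONA): prefix every jar name with a
# directory and join the results with commas.
join_jars() {
  local prefix="$1"
  shift
  local out=""
  local jar
  for jar in "$@"; do
    # Add a comma separator only when out already holds an entry.
    out="${out:+${out},}${prefix}/${jar}"
  done
  printf '%s\n' "$out"
}

# Directory taken from the submit command above.
SONA_LIB="hdfs:///user/brook/sona-0.1.0-bin/lib"

# Shortened jar list for illustration; the real command lists many more.
SONA_JARS="$(join_jars "$SONA_LIB" fastutil-7.1.0.jar angel-ps-core-3.0.1.jar angelml-0.1.0.jar)"
echo "$SONA_JARS"
```

The resulting value could then be passed as `--conf spark.ps.jars="$SONA_JARS"` and `--jars "$SONA_JARS"`. The scala-library jar in the original command lives under a different directory, so it would have to be appended separately.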

zero222 commented 4 years ago

I also have this problem.

krisjin commented 4 years ago

Please post the log.