cloudml / zen

Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.
Apache License 2.0

(LDA)Example/LDADriver/ Job aborted due to stage failure: java.lang.ArrayIndexOutOfBoundsException: -6 #50

Open ylqfp opened 8 years ago

ylqfp commented 8 years ago

Example/LDADriver

Job aborted due to stage failure: Task 9 in stage 28.1 failed 4 times, most recent failure: Lost task 9.3 in stage 28.1 (TID 355, cloud1014121118.wd.nm.ss.nop.ted): java.lang.ArrayIndexOutOfBoundsException: -6
    at org.apache.spark.graphx2.impl.EdgePartition.dstIds(EdgePartition.scala:114)
    at org.apache.spark.graphx2.impl.EdgePartition$$anon$1.next(EdgePartition.scala:341)
    at org.apache.spark.graphx2.impl.EdgePartition$$anon$1.next(EdgePartition.scala:333)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.graphx2.impl.EdgePartition$$anon$1.foreach(EdgePartition.scala:333)
    at org.apache.spark.graphx2.impl.RoutingTablePartition$.edgePartitionToMsgs(RoutingTablePartition.scala:58)
    at org.apache.spark.graphx2.VertexRDD$$anonfun$4$$anonfun$apply$2.apply(VertexRDD.scala:359)
    at org.apache.spark.graphx2.VertexRDD$$anonfun$4$$anonfun$apply$2.apply(VertexRDD.scala:359)
    at scala.Function$$anonfun$tupled$1.apply(Function.scala:77)
    at scala.Function$$anonfun$tupled$1.apply(Function.scala:76)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

ylqfp commented 8 years ago

My dataset is libsvm style:

0 0:3 1:2 2:10 ...
1 0:1 1:1 2:3 ...
.....

Is this a bug, or is something wrong with my dataset?

Thanks!
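One way to rule out the dataset is to scan it for tokens that do not match the expected shape before submitting the job. The sketch below is only an assumption about what "well formed" means here: each line is a label followed by `index:count` pairs, with non-negative integer indices and counts. `check_libsvm` is a hypothetical helper, not part of zen or Spark, and a clean result does not prove the input is what LDADriver expects.

```shell
#!/bin/sh
# check_libsvm FILE
# Hypothetical helper: prints every token that is not "<index>:<count>"
# with non-negative integer index and count (an assumed format).
# Returns 0 if no bad token was found, 1 otherwise.
check_libsvm() {
  awk '
    {
      # The first field is the label or document ID; only check the rest.
      for (i = 2; i <= NF; i++) {
        if ($i !~ /^[0-9]+:[0-9]+$/) {
          printf "line %d: bad token \"%s\"\n", NR, $i
          bad++
        }
      }
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

Calling `check_libsvm` on a local sample of the input (for example, a few thousand lines copied out of HDFS) lists the offending line numbers, which can then be inspected by hand.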

ylqfp commented 8 years ago

Following advice I found via Google, I increased the memory, but the job still failed. @witgo

witgo commented 8 years ago

ping @bhoppi

ylqfp commented 8 years ago

@bhoppi @hucheng Help!

bhoppi commented 8 years ago

I can't get any useful information from your log. Can you dig through your Spark log for more detail? And please share your command-line parameters.

ylqfp commented 8 years ago

spark-submit --master yarn-client --class com.github.cloudml.zen.examples.ml.LDADriver \
  --jars ml/target/zen-ml_2.10-0.3-SNAPSHOT.jar \
  --executor-memory 6G --driver-memory 6G --num-executors 200 --executor-cores 1 \
  --conf spark.driver.maxResultSize=6G \
  --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError" \
  --conf spark.yarn.am.memory=2g \
  --conf spark.yarn.am.extraJavaOptions="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled" \
  --conf spark.storage.memoryFraction=0.1 \
  --conf spark.yarn.executor.memoryOverhead=6666 \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.executor.extraJavaOptions="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -XX:MaxDirectMemorySize=2048m -Xmn100m -XX:MaxTenuringThreshold=1 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=10 -XX:+UseCompressedOops -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log" \
  examples/target/zen-examples_2.10-0.3-SNAPSHOT.jar \
  -numTopics=1000 \
  -alpha=0.1 \
  -beta=0.01 \
  -alphaAS=0.01 \
  -totalIter=50 \
  -numPartitions=20 \
  -useKryo=true \
  -ignoredocid=true \
  /user/distml/ldatest/input4 \
  /user/distml/ldatest/output2

ylqfp commented 8 years ago

[Uploading log.txt…]()

ylqfp commented 8 years ago

The log.txt is a little big, so I attached it in the previous post. Tell me if you cannot see the file.

bhoppi commented 8 years ago

Sorry, I can't read the log file.

ylqfp commented 8 years ago

gclog.txt log.txt

ylqfp commented 8 years ago

Upload done... @bhoppi

bhoppi commented 8 years ago

@ylqfp Can you upload the container log? I still can't find the cause from the master log.

ylqfp commented 8 years ago

Dear Bhoppi, sorry for the late response! I used yarn logs -applicationID to get the container log, but got nothing back. Could you please tell me where to find the container log? Thanks!
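
For reference, the YARN CLI documents this option as `-applicationId`, and `yarn logs` returns aggregated logs only when log aggregation is enabled on the cluster (`yarn.log-aggregation-enable`) and the application has finished. Otherwise the container logs stay on each NodeManager under the directories configured in `yarn.nodemanager.log-dirs`. A minimal sketch of the fetch step follows; `fetch_container_logs` and the default output file name are hypothetical, and the application ID has to come from the YARN ResourceManager UI or the driver output of the failed run.

```shell
#!/bin/sh
# fetch_container_logs APP_ID [OUT_FILE]
# Hypothetical wrapper around "yarn logs -applicationId".
# Writes the aggregated container logs to OUT_FILE (default: containers.log).
fetch_container_logs() {
  app_id=$1
  out=${2:-containers.log}
  # Loose shape check only: application_<clusterTimestamp>_<sequence>.
  case "$app_id" in
    application_[0-9]*_[0-9]*) ;;
    *)
      echo "expected an ID like application_<timestamp>_<sequence>, got: $app_id" >&2
      return 2
      ;;
  esac
  yarn logs -applicationId "$app_id" > "$out"
}
```

Searching the resulting file for `ArrayIndexOutOfBoundsException` should show the executor-side context around the failing task, if the logs were aggregated.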