(LDA)Example/LDADriver/ Job aborted due to stage failure: java.lang.ArrayIndexOutOfBoundsException: -6

ylqfp commented 8 years ago

Example/LDADriver

Job aborted due to stage failure: Task 9 in stage 28.1 failed 4 times, most recent failure: Lost task 9.3 in stage 28.1 (TID 355, cloud1014121118.wd.nm.ss.nop.ted): java.lang.ArrayIndexOutOfBoundsException: -6
    at org.apache.spark.graphx2.impl.EdgePartition.dstIds(EdgePartition.scala:114)
    at org.apache.spark.graphx2.impl.EdgePartition$$anon$1.next(EdgePartition.scala:341)
    at org.apache.spark.graphx2.impl.EdgePartition$$anon$1.next(EdgePartition.scala:333)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.graphx2.impl.EdgePartition$$anon$1.foreach(EdgePartition.scala:333)
    at org.apache.spark.graphx2.impl.RoutingTablePartition$.edgePartitionToMsgs(RoutingTablePartition.scala:58)
    at org.apache.spark.graphx2.VertexRDD$$anonfun$4$$anonfun$apply$2.apply(VertexRDD.scala:359)
    at org.apache.spark.graphx2.VertexRDD$$anonfun$4$$anonfun$apply$2.apply(VertexRDD.scala:359)
    at scala.Function$$anonfun$tupled$1.apply(Function.scala:77)
    at scala.Function$$anonfun$tupled$1.apply(Function.scala:76)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

ylqfp commented 8 years ago

My dataset is in libsvm style:
0 0:3 1:2 2:10 ...
1 0:1 1:1 2:3 ...
.....

Is this a bug, or is something wrong with my dataset?

Thanks!
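One way to rule out the dataset is to scan the input for lines that do not match the shape shown above. The following is only a minimal sketch: it assumes every line is `<docId> <termId>:<count> ...` with non-negative integers, which is inferred from the sample in this post and not confirmed against what LDADriver actually accepts. The names `check_line` and `check_lines` are hypothetical helpers, not part of Zen.

```python
import re

# Assumed shape (inferred from the sample in this thread, not from Zen's parser):
#   <docId> <termId>:<count> <termId>:<count> ...
INT = re.compile(r"[0-9]+")
PAIR = re.compile(r"([0-9]+):([0-9]+)")


def check_line(line):
    """Return a list of problems found in one line; an empty list means it looks fine."""
    tokens = line.split()
    if not tokens:
        return ["empty line"]
    problems = []
    if INT.fullmatch(tokens[0]) is None:
        problems.append("doc id is not a non-negative integer: %r" % tokens[0])
    for tok in tokens[1:]:
        # Negative ids, floats, and missing colons all fail this pattern.
        if PAIR.fullmatch(tok) is None:
            problems.append("malformed termId:count pair: %r" % tok)
    return problems


def check_lines(lines, max_reports=20):
    """Yield (line_number, problems) for the first max_reports bad lines."""
    reported = 0
    for lineno, line in enumerate(lines, start=1):
        problems = check_line(line)
        if problems:
            yield lineno, problems
            reported += 1
            if reported >= max_reports:
                return
```

To use it, copy one input part file locally and pass the open file to `check_lines`, printing each `(line_number, problems)` pair. A clean scan does not prove the data is fine for Zen; it only shows the file matches the assumed shape.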

ylqfp commented 8 years ago

Following advice I found via Google, I increased the memory, but the job still failed. @witgo

witgo commented 8 years ago

ping @bhoppi

ylqfp commented 8 years ago

@bhoppi @hucheng Help!

bhoppi commented 8 years ago

I can't get useful info from your log. Can you dig through your Spark log for more detail? And please share your command-line parameters.

ylqfp commented 8 years ago

spark-submit --master yarn-client --class com.github.cloudml.zen.examples.ml.LDADriver \
  --jars ml/target/zen-ml_2.10-0.3-SNAPSHOT.jar \
  --executor-memory 6G --driver-memory 6G --num-executors 200 --executor-cores 1 \
  --conf spark.driver.maxResultSize=6G \
  --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError" \
  --conf spark.yarn.am.memory=2g \
  --conf spark.yarn.am.extraJavaOptions="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled" \
  --conf spark.storage.memoryFraction=0.1 \
  --conf spark.yarn.executor.memoryOverhead=6666 \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.executor.extraJavaOptions="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -XX:MaxDirectMemorySize=2048m -Xmn100m -XX:MaxTenuringThreshold=1 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=10 -XX:+UseCompressedOops -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log" \
  examples/target/zen-examples_2.10-0.3-SNAPSHOT.jar \
  -numTopics=1000 \
  -alpha=0.1 \
  -beta=0.01 \
  -alphaAS=0.01 \
  -totalIter=50 \
  -numPartitions=20 \
  -useKryo=true \
  -ignoredocid=true \
  /user/distml/ldatest/input4 \
  /user/distml/ldatest/output2

ylqfp commented 8 years ago

[Uploading log.txt…]()

ylqfp commented 8 years ago

The log.txt is a little big, so I attached it in my previous post. Tell me if you cannot see the file.

bhoppi commented 8 years ago

Sorry, I can't read the log file.

ylqfp commented 8 years ago

gclog.txt log.txt

ylqfp commented 8 years ago

Upload done... @bhoppi

bhoppi commented 8 years ago

@ylqfp Can you upload the container log? I still can't find the cause from the master log.

ylqfp commented 8 years ago

Dear Bhoppi, sorry for the late response! I used yarn logs -applicationID to get the container log, but got nothing. Could you please tell me where to find the container log? Thanks!

cloudml / zen

(LDA)Example/LDADriver/ Job aborted due to stage failure: java.lang.ArrayIndexOutOfBoundsException: -6 #50