IBMSparkGPU / GPUEnabler

Provides GPU awareness to Spark, Contact: @kmadhugit and @kiszk
Apache License 2.0

Some tests fail #28

Open TPolzer opened 8 years ago

TPolzer commented 8 years ago

I have run the test suite multiple times on a Tesla K20m GPU with CUDA 7.5, and the results look non-deterministic.

The number of failures was always between 3 and 9 (except for two runs where com.ibm.gpuenabler.TestJavaCUDASuite failed and the rest were skipped). Here is a particularly bad run, with 9 failed tests:

$ mvn package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] gpu-enabler-parent
[INFO] mavenized-jcuda
[INFO] gpu-enabler_2.10
[INFO] GPU Enabler Examples
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building gpu-enabler-parent 1.0.0
[INFO] ------------------------------------------------------------------------
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building mavenized-jcuda 0.1.2
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ mavenized-jcuda ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/hpc/iwi2/iwi2001h/GPUEnabler/mavenized-jcuda/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ mavenized-jcuda ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ mavenized-jcuda ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/hpc/iwi2/iwi2001h/GPUEnabler/mavenized-jcuda/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:testCompile (default-testCompile) @ mavenized-jcuda ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ mavenized-jcuda ---
[INFO] No tests to run.
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ mavenized-jcuda ---
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building gpu-enabler_2.10 1.0.0
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:add-source (eclipse-add-source) @ gpu-enabler_2.10 ---
[INFO] Add Test Source directory: /home/hpc/iwi2/iwi2001h/GPUEnabler/gpu-enabler/src/scala
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ gpu-enabler_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/hpc/iwi2/iwi2001h/GPUEnabler/gpu-enabler/src/main/resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ gpu-enabler_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
[INFO] Using incremental compilation
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ gpu-enabler_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] >>> scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) > generate-sources @ gpu-enabler_2.10 >>>
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:add-source (eclipse-add-source) @ gpu-enabler_2.10 ---
[INFO] 
[INFO] <<< scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) < generate-sources @ gpu-enabler_2.10 <<<
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) @ gpu-enabler_2.10 ---
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=64m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=1024m; support was removed in 8.0
model contains 16 documentable templates
[INFO] Building jar: /home/hpc/iwi2/iwi2001h/GPUEnabler/gpu-enabler/target/gpu-enabler_2.10-1.0.0-javadoc.jar
[INFO] 
[INFO] --- maven-scala-plugin:2.15.2:compile (default) @ gpu-enabler_2.10 ---
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.java,**/*.scala,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ gpu-enabler_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:testCompile (scala-test-compile-first) @ gpu-enabler_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
[INFO] Using incremental compilation
[INFO] Compiling 1 Java source to /home/hpc/iwi2/iwi2001h/GPUEnabler/gpu-enabler/target/scala-2.10/test-classes...
[WARNING] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[WARNING] 1 warning
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ gpu-enabler_2.10 ---
[INFO] Compiling 1 source file to /home/hpc/iwi2/iwi2001h/GPUEnabler/gpu-enabler/target/scala-2.10/test-classes
[INFO] 
[INFO] --- maven-dependency-plugin:2.10:build-classpath (default) @ gpu-enabler_2.10 ---
[INFO] 
[INFO] --- maven-scala-plugin:2.15.2:testCompile (default) @ gpu-enabler_2.10 ---
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.java,**/*.scala,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ gpu-enabler_2.10 ---

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.ibm.gpuenabler.TestJavaCUDASuite
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.692 sec - in com.ibm.gpuenabler.TestJavaCUDASuite

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- maven-dependency-plugin:2.10:copy-dependencies (strip-native-lib-version-exec-test) @ gpu-enabler_2.10 ---
[INFO] jcuda:libJCudaDriver:so:linux-x86_64:0.7.0a already exists in destination.
[INFO] jcuda:libJCudaRuntime:so:linux-x86_64:0.7.0a already exists in destination.
[INFO] 
[INFO] --- scalatest-maven-plugin:1.0:test (test) @ gpu-enabler_2.10 ---
Discovery starting.
Discovery completed in 377 milliseconds.
Run starting. Expected test count is: 26
CUDAFunctionSuite:
- Ensure CUDA kernel is serializable
- Run count()
- Run identity CUDA kernel on a single primitive column
- Run identity CUDA kernel on a single primitive array column
- Run identity CUDA kernel on a single primitive array in a structure
- Run add CUDA kernel with free variables on a single primitive array column
- Run vectorLength CUDA kernel on 2 col -> 1 col
- Run plusMinus CUDA kernel on 2 col -> 2 col *** FAILED ***
  0.10999999977648259 was not less than 1.0E-5 (CUDAFunctionSuite.scala:314)
- Run applyLinearFunction CUDA kernel on 1 col + 2 const arg -> 1 col
- Run blockXOR CUDA kernel on 1 col + 1 const arg -> 1 col on custom dimensions
- Run sum CUDA kernel on 1 col -> 1 col in 2 stages
- Run map on rdds - single partition *** FAILED ***
  scala.this.Predef.intArrayOps(output).sameElements[Int](scala.this.Predef.intWrapper(1).to(n).map[Int, scala.collection.immutable.IndexedSeq[Int]](((x$4: Int) => x$4.*(2)))(immutable.this.IndexedSeq.canBuildFrom[Int])) was false (CUDAFunctionSuite.scala:471)
- Run reduce on rdds - single partition
- Run map + reduce on rdds - single partition *** FAILED ***
  0 did not equal 110 (CUDAFunctionSuite.scala:543)
- Run map on rdds with 100,000 elements - multiple partition *** FAILED ***
  scala.this.Predef.intArrayOps(output).sameElements[Int](scala.this.Predef.intWrapper(1).to(n).map[Int, scala.collection.immutable.IndexedSeq[Int]](((x$5: Int) => x$5.*(2)))(immutable.this.IndexedSeq.canBuildFrom[Int])) was false (CUDAFunctionSuite.scala:569)
- Run map + reduce on rdds - multiple partitions

[Stage 0:>                                                        (0 + 40) / 64]
[Stage 0:>                                                        (1 + 40) / 64]
[Stage 0:=================>                                      (20 + 40) / 64]
[Stage 0:=====================>                                  (25 + 39) / 64]
[Stage 0:=========================>                              (29 + 35) / 64]
[Stage 0:===================================>                    (40 + 24) / 64]
[Stage 0:===================================>                    (41 + 23) / 64]
[Stage 0:======================================>                 (44 + 20) / 64]
[Stage 0:==========================================>             (48 + 16) / 64]
[Stage 0:===============================================>        (54 + 10) / 64]
[Stage 0:===================================================>     (58 + 6) / 64]

- Run map + reduce on rdds with 100,000,000 elements - multiple partitions *** FAILED ***
  -1615290332 did not equal 1974919424 (CUDAFunctionSuite.scala:647)
- Run map + map + reduce on rdds - multiple partitions *** FAILED ***
  18916 did not equal 20200 (CUDAFunctionSuite.scala:687)
- Run map + map + map + collect on rdds *** FAILED ***
  scala.this.Predef.intArrayOps(output).sameElements[Int](scala.this.Predef.intWrapper(1).to(n).map[Int, scala.collection.immutable.IndexedSeq[Int]](((x$6: Int) => x$6.*(8)))(immutable.this.IndexedSeq.canBuildFrom[Int])) was false (CUDAFunctionSuite.scala:715)
- Run map + map + map + reduce on rdds - multiple partitions *** FAILED ***
  36056 did not equal 40400 (CUDAFunctionSuite.scala:757)
- Run map on rdd with a single primitive array column
- Run map with free variables on rdd with a single primitive array column
- Run reduce on rdd with a single primitive array column
- Run map & reduce on a single primitive array in a structure
- Run logistic regression
- CUDA GPU Cache Testcase *** FAILED ***
  scala.this.Predef.intArrayOps(r2).sameElements[Int](scala.this.Predef.wrapIntArray(scala.this.Predef.intArrayOps(r1).map[Int, Array[Int]]({
    ((x: Int) => mulby2(x))
  })(scala.this.Array.canBuildFrom[Int](ClassTag.Int)))) was false (CUDAFunctionSuite.scala:1039)
Run completed in 35 seconds, 686 milliseconds.
Total number of tests run: 26
Suites: completed 2, aborted 0
Tests: succeeded 17, failed 9, canceled 0, ignored 0, pending 0
*** 9 TESTS FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] gpu-enabler-parent ................................. SUCCESS [  0.002 s]
[INFO] mavenized-jcuda .................................... SUCCESS [  4.450 s]
[INFO] gpu-enabler_2.10 ................................... FAILURE [ 59.976 s]
[INFO] GPU Enabler Examples ............................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:04 min
[INFO] Finished at: 2016-08-30T10:33:10+02:00
[INFO] Final Memory: 47M/258M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project gpu-enabler_2.10: There are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :gpu-enabler_2.10

One failure even produced a stack trace, which might be helpful:

- Run map + map + reduce on rdds - multiple partitions *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost): jcuda.CudaException: CUDA_ERROR_INVALID_VALUE
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
    at jcuda.driver.JCudaDriver.cuMemcpyDtoHAsync(JCudaDriver.java:5732)
    at com.ibm.gpuenabler.HybridIterator$$anonfun$copyGpuToCpu$1.apply(HybridIterator.scala:198)
    at com.ibm.gpuenabler.HybridIterator$$anonfun$copyGpuToCpu$1.apply(HybridIterator.scala:162)
    at scala.runtime.Tuple2Zipped$$anonfun$map$extension$1.apply(Tuple2Zipped.scala:40)
    at scala.runtime.Tuple2Zipped$$anonfun$map$extension$1.apply(Tuple2Zipped.scala:38)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.runtime.Tuple2Zipped$.map$extension(Tuple2Zipped.scala:38)
    at com.ibm.gpuenabler.HybridIterator.copyGpuToCpu(HybridIterator.scala:162)
    at com.ibm.gpuenabler.HybridIterator.freeGPUMemory(HybridIterator.scala:121)
    at com.ibm.gpuenabler.CUDAFunction.compute(CUDAFunction.scala:421)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:108)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
  ...
  Cause: jcuda.CudaException: CUDA_ERROR_INVALID_VALUE
  at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
  at jcuda.driver.JCudaDriver.cuMemcpyDtoHAsync(JCudaDriver.java:5732)
  at com.ibm.gpuenabler.HybridIterator$$anonfun$copyGpuToCpu$1.apply(HybridIterator.scala:198)
  at com.ibm.gpuenabler.HybridIterator$$anonfun$copyGpuToCpu$1.apply(HybridIterator.scala:162)
  at scala.runtime.Tuple2Zipped$$anonfun$map$extension$1.apply(Tuple2Zipped.scala:40)
  at scala.runtime.Tuple2Zipped$$anonfun$map$extension$1.apply(Tuple2Zipped.scala:38)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at scala.runtime.Tuple2Zipped$.map$extension(Tuple2Zipped.scala:38)

skonto commented 8 years ago

I see tests failing randomly with a Quadro M1000M. I repeated mvn test multiple times. Any ideas?

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.ibm.gpuenabler.TestJavaCUDASuite
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.578 sec - in com.ibm.gpuenabler.TestJavaCUDASuite

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- maven-dependency-plugin:2.10:copy-dependencies (strip-native-lib-version-exec-test) @ gpu-enabler_2.10 ---
[INFO] jcuda:libJCudaDriver:so:linux-x86_64:0.7.0a already exists in destination.
[INFO] jcuda:libJCudaRuntime:so:linux-x86_64:0.7.0a already exists in destination.
[INFO] 
[INFO] --- scalatest-maven-plugin:1.0:test (test) @ gpu-enabler_2.10 ---
Discovery starting.
Discovery completed in 194 milliseconds.
Run starting. Expected test count is: 27
CUDAFunctionSuite:
- Ensure CUDA kernel is serializable

[Stage 0:>                                                          (0 + 0) / 4]

- Run count()
- Run identity CUDA kernel on a single primitive column
- Run identity CUDA kernel on a single primitive array column
- Run identity CUDA kernel on a single primitive array in a structure
- Run add CUDA kernel with free variables on a single primitive array column
- Run vectorLength CUDA kernel on 2 col -> 1 col
- Run plusMinus CUDA kernel on 2 col -> 2 col
- Run applyLinearFunction CUDA kernel on 1 col + 2 const arg -> 1 col
- Run blockXOR CUDA kernel on 1 col + 1 const arg -> 1 col on custom dimensions
- Run sum CUDA kernel on 1 col -> 1 col in 2 stages
- Run map on rdds - single partition
- Run map on rdds - multiple partition - test empty partition
- Run reduce on rdds - single partition
- Run map + reduce on rdds - single partition
- Run map on rdds with 100,000 elements - multiple partition
- Run map + reduce on rdds - multiple partitions

[Stage 0:========>                                                 (9 + 8) / 64]
[Stage 0:==============>                                          (16 + 8) / 64]
[Stage 0:=====================>                                   (24 + 8) / 64]
[Stage 0:============================>                            (32 + 8) / 64]
[Stage 0:=================================>                       (38 + 8) / 64]
[Stage 0:=====================================>                   (42 + 8) / 64]
[Stage 0:=========================================>               (47 + 8) / 64]
[Stage 0:==========================================>              (48 + 8) / 64]
[Stage 0:=================================================>       (56 + 8) / 64]
[Stage 0:=======================================================> (62 + 2) / 64]

- Run map + reduce on rdds with 100,000,000 elements - multiple partitions *** FAILED ***
  -296974186 did not equal 1974919424 (CUDAFunctionSuite.scala:675)
- Run map + map + reduce on rdds - multiple partitions
- Run map + map + map + collect on rdds
- Run map + map + map + reduce on rdds - multiple partitions
- Run map on rdd with a single primitive array column *** FAILED ***
  scala.this.Predef.intArrayOps(outputItr.next()).toIndexedSeq.sameElements[Int](scala.this.Predef.intWrapper(0).to(n.-(1))) was false (CUDAFunctionSuite.scala:812)
- Run map with free variables on rdd with a single primitive array column
- Run reduce on rdd with a single primitive array column *** FAILED ***
  scala.this.Predef.intArrayOps(output).toIndexedSeq.sameElements[Int](scala.this.Predef.intWrapper(n).to(2.*(n).-(1)).map[Int, scala.collection.immutable.IndexedSeq[Int]](((x$8: Int) => x$8.*(2)))(immutable.this.IndexedSeq.canBuildFrom[Int])) was false (CUDAFunctionSuite.scala:885)
- Run map & reduce on a single primitive array in a structure
- Run logistic regression *** FAILED ***
  382.29565646287256 was not less than 1.0E-7 (CUDAFunctionSuite.scala:1027)
- CUDA GPU Cache Testcase
Run completed in 2 minutes, 48 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 23, failed 4, canceled 0, ignored 0, pending 0
*** 4 TESTS FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] gpu-enabler-parent ................................. SUCCESS [  0.002 s]
[INFO] mavenized-jcuda .................................... SUCCESS [  1.121 s]
[INFO] gpu-enabler_2.10 ................................... FAILURE [03:21 min]
[INFO] GPU Enabler Examples ............................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:22 min
[INFO] Finished at: 2016-09-15T13:07:34+03:00
[INFO] Final Memory: 60M/945M
[INFO] ------------------------------------------------------------------------

dschulz72 commented 8 years ago

I'm having the same issue. I've tried it on 4 separate computers. 3 have GTX 960s and one has a GTX 1060. I get a different number for the sum every time.

GPUEnablerOut.txt

josiahsams commented 8 years ago

Thanks for your interest in this project.

One issue is related to integer overflow and can be handled quickly. For the other failures, which are related to x86 GPUs, we are looking into the issue and will update this post with our findings soon.
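
To make the integer overflow point concrete, here is a minimal sketch in plain Java (no Spark or CUDA involved). It assumes, based only on the test name and the log output above, that the 100,000,000-element test doubles each value in 1..n and sums the results in a 32-bit integer; the test source itself is not shown in this thread. The class and method names are hypothetical and exist only for this illustration.

```java
public class WrappedSum {
    // Hypothetical helper: sum of 2*x for x in 1..n, accumulated in a 32-bit int.
    // This is an assumption about what the test computes, not code from the project.
    static int wrappedDoubledSum(int n) {
        int acc = 0;
        for (int x = 1; x <= n; x++) {
            acc += 2 * x; // silently wraps modulo 2^32 on overflow
        }
        return acc;
    }

    // The same sum in closed form, 2 * n(n+1)/2 = n(n+1), held in a 64-bit long.
    static long exactDoubledSum(long n) {
        return n * (n + 1);
    }

    public static void main(String[] args) {
        int n = 100_000_000;
        System.out.println(exactDoubledSum(n));       // 10000000100000000
        System.out.println(wrappedDoubledSum(n));     // 1974919424
        System.out.println((int) exactDoubledSum(n)); // 1974919424 (low 32 bits)
    }
}
```

Under that assumption, the expected value 1974919424 in the logs is itself the wrapped 32-bit result, not the true sum. Since 32-bit addition wraps modulo 2^32 regardless of the order in which partial sums are combined, a reduction over the complete data would give the same wrapped value on every run.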