IBMSparkGPU / GPUEnabler

Provides GPU awareness to Spark, Contact: @kmadhugit and @kiszk
Apache License 2.0

Tests fail on GeForce GTX 680 with CUDA_ERROR_NO_BINARY_FOR_GPU #27

Open TPolzer opened 8 years ago

TPolzer commented 8 years ago

The environment is Ubuntu 16.04 with Cuda/7.5 and one GeForce GTX 680.

$ mvn test
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] gpu-enabler-parent
[INFO] mavenized-jcuda
[INFO] gpu-enabler_2.10
[INFO] GPU Enabler Examples
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building gpu-enabler-parent 1.0.0
[INFO] ------------------------------------------------------------------------
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building mavenized-jcuda 0.1.2
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ mavenized-jcuda ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home.stand/inf2/****/GPUEnabler/mavenized-jcuda/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ mavenized-jcuda ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ mavenized-jcuda ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home.stand/inf2/****/GPUEnabler/mavenized-jcuda/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:testCompile (default-testCompile) @ mavenized-jcuda ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ mavenized-jcuda ---
[INFO] No tests to run.
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building gpu-enabler_2.10 1.0.0
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:add-source (eclipse-add-source) @ gpu-enabler_2.10 ---
[INFO] Add Test Source directory: /home.stand/inf2/****/GPUEnabler/gpu-enabler/src/scala
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ gpu-enabler_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home.stand/inf2/****/GPUEnabler/gpu-enabler/src/main/resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ gpu-enabler_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
[INFO] Using incremental compilation
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ gpu-enabler_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] >>> scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) > generate-sources @ gpu-enabler_2.10 >>>
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:add-source (eclipse-add-source) @ gpu-enabler_2.10 ---
[INFO] 
[INFO] <<< scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) < generate-sources @ gpu-enabler_2.10 <<<
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) @ gpu-enabler_2.10 ---
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=64m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=1024m; support was removed in 8.0
model contains 17 documentable templates
[INFO] Building jar: /home.stand/inf2/****/GPUEnabler/gpu-enabler/target/gpu-enabler_2.10-1.0.0-javadoc.jar
[INFO] 
[INFO] --- maven-scala-plugin:2.15.2:compile (default) @ gpu-enabler_2.10 ---
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.java,**/*.scala,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ gpu-enabler_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:testCompile (scala-test-compile-first) @ gpu-enabler_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
[INFO] Using incremental compilation
[INFO] Compiling 1 Java source to /home.stand/inf2/****/GPUEnabler/gpu-enabler/target/scala-2.10/test-classes...
[WARNING] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[WARNING] 1 warning
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ gpu-enabler_2.10 ---
[INFO] Compiling 1 source file to /home.stand/inf2/****/GPUEnabler/gpu-enabler/target/scala-2.10/test-classes
[INFO] 
[INFO] --- maven-dependency-plugin:2.10:build-classpath (default) @ gpu-enabler_2.10 ---
[INFO] 
[INFO] --- maven-scala-plugin:2.15.2:testCompile (default) @ gpu-enabler_2.10 ---
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.java,**/*.scala,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ gpu-enabler_2.10 ---

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.ibm.gpuenabler.TestJavaCUDASuite
16/08/30 08:41:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
    at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:96)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:80)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:80)
    at com.ibm.gpuenabler.CUDAFunction.compute(CUDAFunction.scala:334)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:108)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:92)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/08/30 08:41:58 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
16/08/30 08:41:59 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
    at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:96)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:80)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:80)
    at com.ibm.gpuenabler.CUDAFunction.compute(CUDAFunction.scala:334)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:108)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/08/30 08:41:59 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
16/08/30 08:41:59 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
    at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:96)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:80)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:80)
    at com.ibm.gpuenabler.CUDAFunction.compute(CUDAFunction.scala:334)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:108)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/08/30 08:41:59 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.276 sec <<< FAILURE! - in com.ibm.gpuenabler.TestJavaCUDASuite
MultiMultiSum(com.ibm.gpuenabler.TestJavaCUDASuite)  Time elapsed: 2.746 sec  <<< ERROR!
org.apache.spark.SparkException: 
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
    at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:96)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:80)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:80)
    at com.ibm.gpuenabler.CUDAFunction.compute(CUDAFunction.scala:334)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:108)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:92)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at com.ibm.gpuenabler.TestJavaCUDASuite.MultiMultiSum(TestJavaCUDASuite.java:209)
Caused by: jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU

MultiSum(com.ibm.gpuenabler.TestJavaCUDASuite)  Time elapsed: 0.209 sec  <<< ERROR!
org.apache.spark.SparkException: 
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
    at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:96)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:80)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:80)
    at com.ibm.gpuenabler.CUDAFunction.compute(CUDAFunction.scala:334)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:108)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at com.ibm.gpuenabler.TestJavaCUDASuite.MultiSum(TestJavaCUDASuite.java:141)
Caused by: jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU

MultiPart(com.ibm.gpuenabler.TestJavaCUDASuite)  Time elapsed: 0.307 sec  <<< ERROR!
org.apache.spark.SparkException: 
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:312)
    at jcuda.driver.JCudaDriver.cuModuleLoadData(JCudaDriver.java:2014)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:96)
    at com.ibm.gpuenabler.CUDAManager$$anonfun$cachedLoadModule$1.apply(CUDAManager.scala:80)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at com.ibm.gpuenabler.CUDAManager.cachedLoadModule(CUDAManager.scala:80)
    at com.ibm.gpuenabler.CUDAFunction.compute(CUDAFunction.scala:334)
    at com.ibm.gpuenabler.MapGPUPartitionsRDD.compute(CUDARDDUtils.scala:108)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at com.ibm.gpuenabler.TestJavaCUDASuite.MultiPart(TestJavaCUDASuite.java:82)
Caused by: jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU

Results :

Tests in error: 
  TestJavaCUDASuite.MultiMultiSum:209 » Spark Job aborted due to stage failure: ...
  TestJavaCUDASuite.MultiPart:82 » Spark Job aborted due to stage failure: Task ...
  TestJavaCUDASuite.MultiSum:141 » Spark Job aborted due to stage failure: Task ...

Tests run: 3, Failures: 0, Errors: 3, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] gpu-enabler-parent ................................. SUCCESS [  0.002 s]
[INFO] mavenized-jcuda .................................... SUCCESS [  0.979 s]
[INFO] gpu-enabler_2.10 ................................... FAILURE [ 15.109 s]
[INFO] GPU Enabler Examples ............................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.180 s
[INFO] Finished at: 2016-08-30T08:41:59+00:00
[INFO] Final Memory: 62M/1169M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test) on project gpu-enabler_2.10: There are test failures.
[ERROR] 
[ERROR] Please refer to /home.stand/inf2/****/GPUEnabler/gpu-enabler/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :gpu-enabler_2.10
dieselnexr commented 8 years ago

Hi @TPolzer, I ran into the same problem. Here is what worked for me:

  1. Refer to this document, read section 6.2.2.2, and run the deviceQuery command.
  2. The command prints the 'CUDA Capability Major/Minor version number' value.
  3. Mine was 2.1, while the value in GPUEnabler\examples\src\main\resources\MakeFile is 3.5 by default. It is set on line 4 (COMPUTE_CAPABILITY ?= 35).
    • For the GTX 680 it may be 30.
  4. Change that value to match your GPU. I changed it to 21 (although the value in the .ptx file is 20, not 21).
  5. Then run make again. (I hit a "function not found" error during make, so as a workaround I moved SparkExamples.cu to a backup directory.)

I hope this helps. :)

Compatibility information is at https://en.wikipedia.org/wiki/CUDA#GPUs_supported.
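
For anyone repeating these steps, here is a minimal sketch of turning the deviceQuery line into the two-digit Makefile value. The function name makefile_value and the sample string are made up for illustration, and the regular expression assumes deviceQuery prints the capability in the usual "label: major.minor" form.

```python
import re


def makefile_value(device_query_output):
    """Return the capability reported by deviceQuery as two digits, e.g. '3.0' -> '30'."""
    # Assumption: the line looks like
    #   "CUDA Capability Major/Minor version number:    3.0"
    match = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)",
        device_query_output,
    )
    if match is None:
        raise ValueError("no compute capability line found")
    major, minor = match.groups()
    return major + minor


# Hypothetical deviceQuery excerpt for a capability 3.0 card.
sample = "  CUDA Capability Major/Minor version number:    3.0\n"
print("COMPUTE_CAPABILITY ?= " + makefile_value(sample))
```

Because the Makefile line uses ?=, GNU make only applies that default when the variable is not already set, so running make COMPUTE_CAPABILITY=30 should presumably have the same effect without editing the file.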

xuxilei commented 7 years ago

@dieselnexr: Following your steps with GPUEnabler\examples\src\main\resources\MakeFile, I ran:

[root@localhost resources]# ptxas GpuEnablerExamples.ptx
ptxas fatal : SM version specified by .target is higher than default SM version assumed

Why?
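
One likely reading of this message: ptxas was run without an architecture option, so it assumed its built-in default SM version, which is lower than the one named in the file's .target directive. The sketch below builds a ptxas command line that passes the file's own target through the -arch option. The helper name ptxas_command and the sample header are assumptions for illustration, not part of GPUEnabler.

```python
import re


def ptxas_command(ptx_path, ptx_text):
    """Build a ptxas invocation whose -arch matches the .target directive in the PTX."""
    match = re.search(r"^\s*\.target\s+(sm_\d+)", ptx_text, re.MULTILINE)
    if match is None:
        raise ValueError("no .target directive found")
    # -arch (long form --gpu-name) tells ptxas which SM version to compile for.
    return ["ptxas", "-arch=" + match.group(1), ptx_path]


# Illustrative PTX header; a real file continues with declarations and kernels.
header = ".version 4.2\n.target sm_35\n.address_size 64\n"
print(" ".join(ptxas_command("GpuEnablerExamples.ptx", header)))
```

This only addresses the standalone ptxas run. It does not change which architecture the Makefile builds for.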

kiszk commented 7 years ago

Could you please post the result of head -20 GpuEnablerExamples.ptx?

xuxilei commented 7 years ago

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-19324607
// Cuda compilation tools, release 7.0, V7.0.27
// Based on LLVM 3.4svn
//

.version 4.2
.target sm_35
.address_size 64

     // .weak   cudaMalloc
.extern .func __assertfail
(
     .param .b64 __assertfail_param_0,
     .param .b64 __assertfail_param_1,
     .param .b32 __assertfail_param_2,
     .param .b64 __assertfail_param_3,
     .param .b64 __assertfail_param_4
)
;

xuxilei commented 7 years ago

But when I run ./bin/run-example GpuEnablerExample and ./bin/run-example SparkGPULR, both work fine. Why?

kiszk commented 7 years ago

We expected .target sm_30 for GTX 680 instead of .target sm_35.

Did you rebuild after updating the Makefile as this comment suggested?
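
As a rough illustration of the check implied here: PTX built for a given .target can generally be loaded only on a device whose compute capability is at least that high, which is why sm_35 PTX on a capability 3.0 card ends in CUDA_ERROR_NO_BINARY_FOR_GPU. The function name ptx_target_fits is hypothetical, and the comparison is a simplification that ignores architecture-specific targets.

```python
def ptx_target_fits(ptx_target, device_capability):
    """Return True if PTX with this .target should be loadable on the device.

    ptx_target is the value after .target, e.g. "sm_35".
    device_capability is the deviceQuery value, e.g. "3.0".
    """
    if not ptx_target.startswith("sm_"):
        raise ValueError("expected a target like sm_30")
    target = int(ptx_target[3:])
    major, minor = device_capability.split(".")
    device = int(major) * 10 + int(minor)
    # Simplified rule: the device must be at least as new as the PTX target.
    return target <= device


print(ptx_target_fits("sm_35", "3.0"))  # default build on a capability 3.0 card
print(ptx_target_fits("sm_30", "3.0"))  # after rebuilding with COMPUTE_CAPABILITY set to 30
```

The first call returns False and the second returns True, matching the expectation that the GTX 680 needs .target sm_30.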

xuxilei commented 7 years ago

My GPU is an NVIDIA GeForce GT 720.

YuxinxinChen commented 7 years ago

Did you solve the problem? I have exactly the same problem and am still stuck on it.