clMathLibraries / clSPARSE

a software library containing Sparse functions written in OpenCL
Apache License 2.0

Cannot run sample-spmv on CPU #149

Open zsszatmari opened 9 years ago

zsszatmari commented 9 years ago

Hi!

I've modified the sample to use CL_DEVICE_TYPE_CPU, for comparison/benchmarking purposes. It didn't work:

$ ./sample-spmv /Users/zsszatmari/projects/cltest/100000.mtx 
Executing sample clSPARSE SpMV (y = A*x) C++
Matrix will be read from: /Users/zsszatmari/projects/cltest/100000.mtx
Platform ID 0 : Apple

Getting devices from platform 0
Device ID 0 : Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz
Matrix: /Users/zsszatmari/projects/cltest/100000.mtx [nRow: 100000] [nCol: 100000] [nNZ: 199998]
Problem with execution SpMV algorithm. Error: -1015
Program completed successfully.

Let me know how I can help diagnose this! As a secondary question, can I expect the algorithm to perform reasonably well on a CPU, or is it only suitable for an actual GPU?

(I am on jlgreathouse's repo develop branch currently)

kknox commented 9 years ago

Hi @treasurebox, CPU devices have not been a target for either testing or performance work, so we can make no claims that it will work or that it will perform well. In theory it should work, since OpenCL abstracts the device implementation, assuming there are no problems in the runtime.

-1015 is the code for clsparseInvalidKernelExecution. It's returned from many places in the code. Can you step through and see where this return code originates? Are you building debug versions of the library?
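If stepping through in a debugger is inconvenient, one option is to wrap each status-returning call in a tracing macro so that the first failing call reports its file and line. The following is only a minimal sketch: `status_t`, `TRACE_STATUS`, `enqueue_kernel`, and `run_spmv` are hypothetical names invented for illustration and are not part of clSPARSE; the only value taken from this thread is -1015.

```cpp
#include <cstdio>

// Hypothetical stand-in for the library's status type (assumption for this sketch).
typedef int status_t;
static const status_t kSuccess = 0;
// -1015 is the value reported in this thread for clsparseInvalidKernelExecution.
static const status_t kInvalidKernelExecution = -1015;

// Evaluate a status-returning expression once. On failure, print where it was
// observed and return the status from the enclosing function.
#define TRACE_STATUS(expr)                                      \
    do {                                                        \
        status_t s_ = (expr);                                   \
        if (s_ != kSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: '%s' returned %d\n",   \
                         __FILE__, __LINE__, #expr, s_);        \
            return s_;                                          \
        }                                                       \
    } while (0)

// Hypothetical stand-in for a call that may fail inside the library.
static status_t enqueue_kernel(bool ok) {
    return ok ? kSuccess : kInvalidKernelExecution;
}

// Hypothetical driver: the second call fails when kernel_ok is false,
// and the trace line identifies that call site.
static status_t run_spmv(bool kernel_ok) {
    TRACE_STATUS(enqueue_kernel(true));
    TRACE_STATUS(enqueue_kernel(kernel_ok));
    return kSuccess;
}
```

Propagating the status this way also lets a caller return a non-zero exit code on failure, rather than printing "Program completed successfully." after an error as in the log above.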

jlgreathouse commented 9 years ago

For what it's worth, I tried to test this by changing line 126 of sample-spmv.cpp from cl_status = platform.getDevices(CL_DEVICE_TYPE_GPU, &devices); to cl_status = platform.getDevices(CL_DEVICE_TYPE_CPU, &devices);.

When running this on an AMD A10-7850K CPU using the AMD APP SDK on Linux, the program completed successfully (without Error -1015). As such, we will likely need your help in debugging this. Thank you for offering -- your help with the previous double precision issue is greatly appreciated.

As for the performance of the algorithm on a CPU, as kknox said, we have not yet done any performance analysis or optimizations for CPUs. The SpMV algorithms we have currently implemented are focused on optimizing GPU performance. (For example, the csrmv_adaptive algorithm is described in the paper "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format" from SC14 and in the forthcoming paper "Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices" at HiPC 2015.)

If you get a chance to test the performance on a CPU, I would be interested to hear the results.
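For anyone comparing CPU results, a plain serial CSR SpMV (y = A*x) can serve both as a correctness check and as a rough timing baseline. The sketch below is not clSPARSE code and is not the csrmv_adaptive algorithm; `CsrMatrix`, `spmv_reference`, and `time_spmv_ms` are hypothetical names, and the CSR layout is assumed to be the conventional one (row pointers, column indices, values).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container (assumption for this sketch).
struct CsrMatrix {
    std::size_t n_rows;
    std::size_t n_cols;
    std::vector<std::size_t> row_ptr;  // n_rows + 1 entries
    std::vector<std::size_t> col_idx;  // one entry per non-zero
    std::vector<float> values;         // one entry per non-zero
};

// Serial reference y = A * x, one row at a time.
std::vector<float> spmv_reference(const CsrMatrix& A, const std::vector<float>& x) {
    assert(A.row_ptr.size() == A.n_rows + 1);
    assert(x.size() == A.n_cols);
    std::vector<float> y(A.n_rows, 0.0f);
    for (std::size_t r = 0; r < A.n_rows; ++r) {
        float sum = 0.0f;
        for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}

// Average wall-clock time per SpMV in milliseconds over `repeats` runs.
double time_spmv_ms(const CsrMatrix& A, const std::vector<float>& x, int repeats) {
    assert(repeats > 0);
    float checksum = 0.0f;  // consumed below so the work is not optimized away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        const std::vector<float> y = spmv_reference(A, x);
        if (!y.empty()) checksum += y.front();
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile float sink = checksum;
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}
```

Comparing the output vector of the OpenCL run against `spmv_reference` (within a floating-point tolerance) would show whether a CPU device produces correct results before any timing is compared.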

zsszatmari commented 9 years ago

> -1015 is the code for clsparseInvalidKernelExecution. Its returned from many places in the code. Can you step through and see where this return code is returned from?

Hi! Sorry for disappearing. I'd love to help debug this, but I am currently a bit swamped with other duties.