Open zsszatmari opened 9 years ago
Hi @treasurebox, CPU devices have not been a target for either testing or performance work, so we can make no claims that it will work or that it will perform well. In theory it should work, since OpenCL abstracts the device implementation, assuming there are no problems in the runtime.
-1015 is the code for clsparseInvalidKernelExecution. It's returned from many places in the code. Can you step through and see where this return code is returned from? Are you building debug versions of the library?
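If stepping through in a debugger is awkward, one option is to wrap the suspect calls so that the first failing call site gets logged. The following is only a minimal sketch: `StatusTrace`, `trace_status`, `TRACE_STATUS` and `fake_kernel_launch` are hypothetical names made up for illustration and are not part of clSPARSE. It assumes that a status of 0 means success (as with CL_SUCCESS) and uses the -1015 value quoted above.

```cpp
#include <cstdio>
#include <string>

// Value quoted in this thread for clsparseInvalidKernelExecution.
constexpr int kInvalidKernelExecution = -1015;

// Hypothetical record of the first call site that returned a non-zero status.
struct StatusTrace {
    int status = 0;
    std::string file;
    int line = 0;
    bool failed() const { return status != 0; }
};

// Hypothetical helper: remembers and prints the first non-zero status,
// then passes the status through unchanged.
inline int trace_status(int status, const char* file, int line, StatusTrace& trace) {
    if (status != 0 && !trace.failed()) {
        trace.status = status;
        trace.file = file;
        trace.line = line;
        std::fprintf(stderr, "status %d first seen at %s:%d\n", status, file, line);
    }
    return status;
}

// Captures the file and line of the wrapped expression.
#define TRACE_STATUS(expr, trace) trace_status((expr), __FILE__, __LINE__, (trace))

// Stand-in for a failing library call; NOT a real clSPARSE function.
inline int fake_kernel_launch(bool ok) { return ok ? 0 : kInvalidKernelExecution; }
```

In the sample, each call that returns a status would be wrapped the same way, for example `TRACE_STATUS(fake_kernel_launch(false), trace)`, and `trace.file` / `trace.line` would then point at the first place the error surfaced.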
For what it's worth, I tried to test this by changing line 126 of sample-spmv.cpp from `cl_status = platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);` to `cl_status = platform.getDevices(CL_DEVICE_TYPE_CPU, &devices);`.
When running this on an AMD A10-7850K CPU using the AMD APP SDK on Linux, the program completed successfully (without Error -1015). As such, we will likely need your help in debugging this. Thank you for offering -- your help with the previous double precision issue is greatly appreciated.
As for the performance of the algorithm on a CPU, as kknox said, we have not yet done any performance analysis or optimizations for CPUs. The SpMV algorithms we have currently implemented are focused on optimizing GPU performance. (For example, the csrmv_adaptive algorithm is described in the paper "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format" from SC14 and in the paper "Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices", to appear at the upcoming HiPC 2015.)
If you get a chance to test the performance on a CPU, I would be interested to hear the results.
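For a CPU comparison it may help to have a plain serial baseline to check results and timings against. The following is a minimal sketch of a textbook row-by-row CSR SpMV with a simple timer. It is not the csrmv_adaptive algorithm and does not use the clSPARSE API; `CsrMatrix`, `spmv_reference` and `time_spmv` are hypothetical names chosen for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container; not the clSPARSE matrix type.
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz
    std::vector<float> values;         // size nnz
};

// Computes y = A * x serially, one row at a time.
inline std::vector<float> spmv_reference(const CsrMatrix& A, const std::vector<float>& x) {
    std::vector<float> y(A.rows, 0.0f);
    for (std::size_t r = 0; r < A.rows; ++r) {
        float sum = 0.0f;
        for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}

// Returns the average wall-clock seconds per SpMV over `repeats` runs.
inline double time_spmv(const CsrMatrix& A, const std::vector<float>& x, int repeats) {
    volatile float sink = 0.0f;  // keeps the compiler from discarding the work
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        const std::vector<float> y = spmv_reference(A, x);
        if (!y.empty()) sink = y[0];
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return repeats > 0 ? elapsed.count() / repeats : 0.0;
}
```

Running the same matrix and vector through this baseline and through the library on a CPU device would give both a correctness check and a rough reference point for the timings.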
> -1015 is the code for clsparseInvalidKernelExecution. It's returned from many places in the code. Can you step through and see where this return code is returned from?
Hi! Sorry for disappearing. I'd love to help debug this, but I am currently a bit swamped with my other duties.
Hi!
I've modified the sample to use CL_DEVICE_TYPE_CPU, for comparison/benchmarking purposes. It didn't work:
Let me know how I can help diagnose this! As a secondary question, can I expect the algorithm to work with reasonable performance on a CPU, or is it only suited to an actual GPU?
(I am on jlgreathouse's repo develop branch currently)