Closed: Balavarun5 closed this issue 3 years ago
I have only tested the kernel on K40c, K80, P100, and V100 GPUs. For other GPUs, you may need to tune the parameters according to the documentation here. You can also debug what went wrong by printing out some of the initial data.
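One common reason for output that is all zeros with no visible error is a kernel launch that failed silently: CUDA does not abort the program when a launch fails, so the output buffer can be left at its initial values. The following is a minimal sketch of how one might check for that, not this project's code. The `scale` kernel and the `CUDA_CHECK` macro are hypothetical names used only for illustration; the runtime calls (`cudaGetDeviceProperties`, `cudaGetLastError`, `cudaDeviceSynchronize`) are standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "%s:%d: CUDA error: %s\n", __FILE__, __LINE__,  \
              cudaGetErrorString(err_));                              \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// Hypothetical stand-in kernel: multiplies each element by `factor`.
__global__ void scale(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  // Print the device name and compute capability to confirm what the
  // code is actually running on.
  cudaDeviceProp prop;
  CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
  printf("Device: %s, compute capability %d.%d\n",
         prop.name, prop.major, prop.minor);

  const int n = 8;
  float host[n];
  for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i + 1);

  float* dev = nullptr;
  CUDA_CHECK(cudaMalloc(&dev, n * sizeof(float)));
  CUDA_CHECK(cudaMemcpy(dev, host, n * sizeof(float),
                        cudaMemcpyHostToDevice));

  scale<<<1, 32>>>(dev, n, 2.0f);
  // Launch errors (bad configuration, no kernel image for this GPU)
  // are reported here, not by the launch statement itself.
  CUDA_CHECK(cudaGetLastError());
  // Errors raised while the kernel runs are reported on synchronization.
  CUDA_CHECK(cudaDeviceSynchronize());

  CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(float),
                        cudaMemcpyDeviceToHost));
  for (int i = 0; i < n; ++i) printf("%g ", host[i]);
  printf("\n");

  CUDA_CHECK(cudaFree(dev));
  return 0;
}
```

If the check after the launch reports an error such as a missing kernel image, the binary was probably not built for the GPU's architecture. Assuming a device with compute capability 5.0, compiling with `nvcc -arch=sm_50` is one thing to try.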
Works on a workstation GPU, closing the issue.
When I run the code on the GPU of my local machine, it executes without any errors, but kernel-out.txt contains all zeros and does not match the output from the CPU method. Are there any parameters that need to be changed to get the code working as expected? The GPU on my machine is a GeForce GTX 950M (compute capability 5.0), and the CUDA version is 10.1.