OpenCL kernel outputs zeros when using GPU but not CPU

ghr55 commented 6 years ago

For the Random Projection puzzle, I've used OpenCL for part of the computation. The code produces correct results when run on the CPU device but always fails on the GPU device (every output value is zero).

I believe the cause is that a non-zero kernel parameter of type double always reads as zero inside the kernel function, even when the value passed is explicitly set to a non-zero constant. For example, when I test the program with double p=100.0; and pass p to the kernel using kernel.setArg(0, p);, p is 0 inside the kernel on the GPU but remains 100 on the CPU. Does anyone know why the value of p becomes zero on the GPU?

Again, just to be clear, the OpenCL code works correctly when the CPU device is selected but not when the GPU device is.

kgiray commented 6 years ago

Have you tried casting the double to float? That might help to diagnose the issue.

ghr55 commented 6 years ago

I have tried that and unfortunately it didn't solve the problem. I also tried casting the double to a ulong, which did allow the value of p to pass correctly, meaning the value was no longer incorrectly set to zero. However, the output of the puzzle when using ulong is only correct for small scale values. I think this has something to do with ulong being smaller than a double. So, overall, casting the double to other types didn't fix the issue but did help me confirm that it is indeed that parameter that is causing the problem.

Also, all this work is being done on a Mac. I don't know if it is an OS-specific problem, but it could be.

m8pple commented 6 years ago

I don't recognise the specific problem.

I'm not sure what you mean by casting to a ulong. Do you mean doing this:

kernel.setArg(0, (unsigned long long)p);

or this:

kernel.setArg(0, *(unsigned long long *)&p);

?

The first one will lose precision (and requires a different kernel), but the second one I would expect to work.
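The difference between the two casts above can be sketched in plain C++ without any OpenCL calls. This is a minimal illustration, not the coursework code: the helper names value_cast, bits_of and double_from_bits are hypothetical, and it assumes a 64-bit IEEE 754 double. It uses std::memcpy rather than the pointer cast, since that copies the same 8 bytes without type punning through a pointer.

```cpp
#include <cstdint>
#include <cstring>

// Value conversion, like (unsigned long long)p: truncates toward zero,
// so any fractional part is lost and the kernel must take an integer.
std::uint64_t value_cast(double p) {
    return static_cast<std::uint64_t>(p);
}

// Bit reinterpretation, like *(unsigned long long *)&p: the 8 bytes of
// the double are copied unchanged into an integer of the same size.
std::uint64_t bits_of(double p) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    return bits;
}

// Inverse of bits_of: recovers the original double from its bit pattern.
double double_from_bits(std::uint64_t bits) {
    double p;
    std::memcpy(&p, &bits, sizeof p);
    return p;
}
```

Under these assumptions, value_cast(100.75) gives 100, while double_from_bits(bits_of(100.75)) gives back 100.75. A kernel that receives the bits as a ulong would need to reinterpret them (for example with OpenCL C's as_double, on a device that supports doubles) rather than convert them.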

You could also create an 8-byte buffer which contains the value, then pass it over as a buffer. As long as you only read it once at the start of the kernel and have a loop in the kernel then the overhead should not be too high.

Also, you might want to question whether it has to be a double, as double-precision arithmetic and int <-> double conversions are expensive. What do you know about rHi in this line?

ghr55 commented 6 years ago

Sorry, I should've explained better. When I said casting to ulong, I actually meant I was changing the kernel function definition from:

__kernel void kernel_x(double p, ...

to:

__kernel void kernel_x(ulong p, ...

With that change, the puzzle worked for small scale values but not for large ones. I later also tried casting in the way you showed above (i.e. when setting the argument value), but that didn't work either. Since then, I've changed the code so that I don't use a double to begin with (as you suggested), and that has eliminated the problem.

I still don't understand the bug I had, so if anyone thinks they know what the issue was, please let me know for future reference.

m8pple commented 6 years ago

Possibly this was related to #51?

ghr55 commented 6 years ago

I don't believe it was, because I had already tried adding the extension pragma #pragma OPENCL EXTENSION cl_khr_fp64 : enable and my code still didn't work.
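For future reference, the pragma only enables double support where the device offers it, so one thing that may be worth ruling out is whether the GPU reports cl_khr_fp64 at all. Below is a minimal sketch of that check, assuming the extension list is available as a space-separated string. has_extension is a hypothetical helper, not part of OpenCL.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole space-separated token in
// `extensions`, the layout used by the CL_DEVICE_EXTENSIONS string.
// Matching whole tokens avoids false hits on longer extension names.
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

With the C++ wrapper, the string would typically come from device.getInfo<CL_DEVICE_EXTENSIONS>(), so the call would look like has_extension(device.getInfo<CL_DEVICE_EXTENSIONS>(), "cl_khr_fp64"). If that returns false for the GPU, a kernel taking a double argument may not be usable on it.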

HPCE / hpce-2017-cw5

OpenCL kernel outputs zeros when using GPU but not CPU #38