HPCE / hpce-2017-cw5


GPU vs CPU double-precision in gaussian_blur #46

Closed: malharjajoo closed this issue 6 years ago

malharjajoo commented 6 years ago

Hi,

I am aware that there have been a few previous issues on this topic, but they don't seem to reach a conclusion. I have tried the OpenCL compilation flags mentioned in #35, but they don't seem to improve the results.

I have a sequential implementation (that works perfectly) and thought a similar approach might work on the GPU, but the reference (CPU) and GPU results differ by a lot (differences well above 2).

Am I missing something really obvious?

guigzzz commented 6 years ago

We were having issues that we thought were down to precision, since the output was generally right. It turned out I just had a very cheeky bug in my kernel, the consequence being that some pixels in the output weren't computed correctly. Now that the bug is fixed, everything works fine. I saw that you posted on the other issue. The reason -cl-fp32-correctly-rounded-divide-sqrt seemed to fix our problems was that the bug I mentioned involved a float division: making that division more accurate made the bug less likely to show up. All of the operations in Gaussian Blur are double-precision ops, so that flag shouldn't change anything (note that it is about fp32, whereas double is fp64). You most probably have a bug in your code, as did I :)
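The actual bug isn't described in this thread, so what follows is only a hypothetical Python illustration of one general way a slightly inaccurate division can break some pixels: if an index or coordinate is computed by dividing and then truncating, a quotient that lands even one ULP below an exact integer truncates to the integer below. (OpenCL allows single-precision division that is not correctly rounded unless -cl-fp32-correctly-rounded-divide-sqrt is given, which would fit "less likely" rather than "fixed".) The one-ULP error is simulated with math.nextafter (Python 3.9+); index_via_float and index_via_int are made-up names for this example.

```python
import math


def index_via_float(num: int, den: int, ulp_error: int = 0) -> int:
    """Divide in floating point, optionally nudging the quotient down by a few
    ULPs to mimic a division that is not correctly rounded, then truncate."""
    q = num / den
    for _ in range(ulp_error):
        q = math.nextafter(q, -math.inf)  # step one ULP towards -infinity
    return int(q)  # int() truncates towards zero


def index_via_int(num: int, den: int) -> int:
    """Integer floor division: exact for non-negative operands, no rounding."""
    return num // den


if __name__ == "__main__":
    print(index_via_float(6, 3))               # exact quotient 2.0 -> 2
    print(index_via_float(6, 3, ulp_error=1))  # 1.9999999999999998 -> 1
    print(index_via_int(6, 3))                 # 2
```

Where the operands are non-negative integers, doing the division in integer arithmetic (integer / in a kernel, // in Python) sidesteps the rounding question entirely.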

m8pple commented 6 years ago

Echoing what @guigzzz suggests, I would look closely at the values that each implementation produces. One suggestion is to dump the pixels before rounding from both the GPU and the CPU, along with the sum of the coefficients. You can do this with printf from inside the kernel. If you then do the same on the CPU, you can compare the two directly (e.g. visually).

For example, if you printed out:

SOMETHINGUNIQUE, x, y, pixel, coeffSum

for every pixel, you could then grep out the lines you're interested in by searching for SOMETHINGUNIQUE and convert them to CSV. You can then import them into a pivot table or MATLAB and do a direct comparison at each point. Looking at the software coeffs subtracted from the hardware coeffs might well tell you something.
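If you would rather script the grep-and-compare step than use a pivot table, something like the following Python sketch works on the "SOMETHINGUNIQUE, x, y, pixel, coeffSum" lines described above. It assumes comma-separated fields in exactly that order, that the CPU and GPU output have been captured as lines of text, and an arbitrarily chosen tolerance; parse_dump and compare are hypothetical helper names. Differences are reported as hardware (GPU) minus software (CPU), and it helps to print enough digits on both sides to round-trip a double.

```python
import math
from typing import Dict, Iterable, List, Tuple

TAG = "SOMETHINGUNIQUE"  # marker printed at the start of every dump line
Point = Tuple[int, int]


def parse_dump(lines: Iterable[str]) -> Dict[Point, Tuple[float, float]]:
    """Map (x, y) -> (pixel, coeffSum) from lines 'TAG, x, y, pixel, coeffSum'."""
    values: Dict[Point, Tuple[float, float]] = {}
    for line in lines:
        if not line.startswith(TAG):
            continue  # skip any other output, like grepping for TAG
        _, x, y, pixel, coeff_sum = (field.strip() for field in line.split(","))
        values[(int(x), int(y))] = (float(pixel), float(coeff_sum))
    return values


def compare(cpu: Dict[Point, Tuple[float, float]],
            gpu: Dict[Point, Tuple[float, float]],
            tol: float = 1e-9) -> List[Tuple[int, int, float, float]]:
    """List (x, y, pixel diff, coeffSum diff) wherever GPU and CPU disagree.

    Differences are GPU (hardware) minus CPU (software); a point present in
    only one of the dumps is reported with NaN differences."""
    mismatches = []
    for x, y in sorted(cpu.keys() | gpu.keys()):
        if (x, y) not in cpu or (x, y) not in gpu:
            mismatches.append((x, y, math.nan, math.nan))
            continue
        d_pixel = gpu[(x, y)][0] - cpu[(x, y)][0]
        d_coeff = gpu[(x, y)][1] - cpu[(x, y)][1]
        if abs(d_pixel) > tol or abs(d_coeff) > tol:
            mismatches.append((x, y, d_pixel, d_coeff))
    return mismatches


if __name__ == "__main__":
    cpu_log = ["SOMETHINGUNIQUE, 0, 0, 12.5, 9.0",
               "SOMETHINGUNIQUE, 1, 0, 40.0, 9.0",
               "unrelated output"]
    gpu_log = ["SOMETHINGUNIQUE, 0, 0, 12.5, 9.0",
               "SOMETHINGUNIQUE, 1, 0, 37.0, 8.0"]
    for row in compare(parse_dump(cpu_log), parse_dump(gpu_log)):
        print(row)  # (1, 0, -3.0, -1.0)
```

A non-zero coeffSum difference points at the weights themselves, while a pixel difference with matching coeffSum points at which input pixels were read.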

If you need to go further, you could dump:

x, y, dx, dy, coeff

though you'll want to use a relatively small image.
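A CPU-side version of that finer-grained dump can be sketched in Python as below. It assumes an unnormalised Gaussian weight and edge handling that simply skips out-of-range taps; both are placeholders to be replaced with whatever your reference implementation actually does, and tap_dump is a hypothetical name. The output has up to width * height * (2*radius + 1)^2 lines, which is why a small image is needed.

```python
import math
from typing import Iterator


def tap_dump(width: int, height: int, radius: int, sigma: float) -> Iterator[str]:
    """Yield one 'x, y, dx, dy, coeff' line per filter tap inside the image.

    The weight exp(-(dx^2 + dy^2) / (2 sigma^2)) is an assumed, unnormalised
    Gaussian, not necessarily the formula used by the coursework."""
    for y in range(height):
        for x in range(width):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if not (0 <= x + dx < width and 0 <= y + dy < height):
                        continue  # tap falls outside the image
                    coeff = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                    # repr() prints the shortest string that round-trips the double
                    yield f"{x}, {y}, {dx}, {dy}, {coeff!r}"


if __name__ == "__main__":
    for line in tap_dump(width=2, height=2, radius=1, sigma=1.0):
        print(line)
```

Summing the coeff column per (x, y) should reproduce the coeffSum from the coarser dump, which is a quick consistency check before comparing against the GPU.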

malharjajoo commented 6 years ago

Hi,

Thank you everyone. I managed to fix it for now.