HPCE / hpce-2017-cw5


Gaussian Blur: GPU precision problems #35

Closed TheoFranquet closed 6 years ago

TheoFranquet commented 6 years ago

An OpenCL kernel has been run on both the CPU and the GPU of the same machine. The CPU output is consistently correct, while the GPU output mostly fails the correctness test.

This has led me to believe that there might be a precision issue on the GPU, as the output is only slightly wrong. The problem is that we already use double precision for everything.

Anyone have any ideas?

EDIT: It would seem that adding the -cl-fp32-correctly-rounded-divide-sqrt flag to the kernel compiler options fixes this problem. This seems a bit weird, as the fp32 in the flag would seem to indicate that it only applies to single-precision floats... I found the above flag from here

m8pple commented 6 years ago

I'm quite impressed that approximate divisions have that much effect, as the comparison criterion is quite generous. However, it tells you something about what the expensive operations are - if they went so far as to make inaccurate divisions the default, even knowing it would break code, how much must the accurate divisions cost?

guigzzz commented 6 years ago

Interestingly, without that flag, we would sometimes have differences of 3 or even 4. This still doesn't make much sense to me, as the only division that we have in the kernel is the division by 2 pi r, and I can't believe that the division would be inaccurate to the point of making the test fail...

I have also noticed that the compiler doesn't complain when #pragma OPENCL EXTENSION cl_khr_fp64 : enable isn't present at the beginning of the kernel code while double variables are used. Any idea why that might be happening? From what I understand, when using OpenCL 1.1 (I have an NVIDIA GPU, so that's all I'm using), it is necessary to enable the fp64 extension in order to use doubles.

m8pple commented 6 years ago

For the differences, it is probably interesting to printf the coeffs calculated on the GPU. The difference is more than I would expect too.
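Once the coefficients have been printed from both devices (the `%a` hex-float format avoids any decimal rounding in the printout), it can help to measure how far apart two values are in representable doubles rather than by eye. A minimal host-side sketch in C; `ordered_bits` and `ulp_distance` are hypothetical helper names, not part of the coursework code, and IEEE 754 binary64 doubles are assumed:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a double onto an integer scale that increases monotonically with the
   value. Assumes IEEE 754 binary64; +0.0 and -0.0 both map to 0. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);            /* reinterpret the bits safely */
    return (i < 0) ? (INT64_MIN - i) : i; /* flip the negative half */
}

/* Number of representable doubles between a and b (0 if they are equal).
   NaN inputs are rejected because they have no meaningful distance. */
uint64_t ulp_distance(double a, double b)
{
    assert(a == a && b == b);            /* neither input may be NaN */
    int64_t oa = ordered_bits(a);
    int64_t ob = ordered_bits(b);
    /* Subtract as unsigned so that opposite-sign inputs cannot overflow. */
    return (oa >= ob) ? (uint64_t)oa - (uint64_t)ob
                      : (uint64_t)ob - (uint64_t)oa;
}
```

A distance of 0 or 1 between the CPU and GPU coefficient would point away from the division itself; a large distance would suggest that some step is being evaluated at lower precision than expected.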

Regarding the pragma, I think this is because NVidia also supports OpenCL 1.2, which made double precision a sort of weird optional thing. It might not be there, but if it is there you don't need to enable the extension.
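One way to see which case a given device falls into is to look for `cl_khr_fp64` in the string that `clGetDeviceInfo` returns for `CL_DEVICE_EXTENSIONS`. A minimal host-side sketch in C; `has_extension` is a hypothetical helper name, and the query itself is left out so that the snippet does not depend on an OpenCL SDK:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in `exts`.
   `exts` is assumed to be the space-separated list reported for
   CL_DEVICE_EXTENSIONS. */
bool has_extension(const char *exts, const char *name)
{
    assert(exts != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0) {
        return false;
    }
    const char *p = exts;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is delimited on both sides, so that
           a longer name sharing the same prefix is not counted. */
        bool starts_token = (p == exts) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token) {
            return true;
        }
        p += len;
    }
    return false;
}
```

In the kernel source, wrapping the pragma in `#ifdef cl_khr_fp64` and `#endif` should keep the same file compiling whether or not the compiler insists on the extension being enabled.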

malharjajoo commented 6 years ago

@TheoFranquet - Are you sure that -cl-fp32-correctly-rounded-divide-sqrt is valid for OpenCL 1.2? I get an error like this:

Error in processing command line: Don't understand command line argument "-cl-fp32-correctly-rounded-divide-sqrt"!
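The flag was introduced in OpenCL 1.2, so one plausible cause (an assumption, not something confirmed in this thread) is a kernel compiler that only understands 1.1 options. A minimal sketch in C of guarding the option on the `CL_DEVICE_VERSION` string, which has the form `OpenCL <major>.<minor> <vendor info>`; `version_at_least_1_2` and `pick_build_options` are hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parses a CL_DEVICE_VERSION-style string and reports whether it names
   OpenCL 1.2 or later. Strings that do not parse count as unsupported. */
bool version_at_least_1_2(const char *device_version)
{
    assert(device_version != NULL);
    int major = 0, minor = 0;
    if (sscanf(device_version, "OpenCL %d.%d", &major, &minor) != 2) {
        return false;
    }
    return major > 1 || (major == 1 && minor >= 2);
}

/* Returns the options string to hand to clBuildProgram: the rounding flag
   when the device version allows it, otherwise an empty string. */
const char *pick_build_options(const char *device_version)
{
    return version_at_least_1_2(device_version)
               ? "-cl-fp32-correctly-rounded-divide-sqrt"
               : "";
}
```

The returned string would go in the `options` argument of `clBuildProgram`. A device may report 1.2 and still reject the flag, so checking the build log on failure remains worthwhile.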