TheoFranquet closed this issue 6 years ago.
I'm quite impressed that approximate divisions have that much effect, given how generous the comparison criterion is. However, it tells you something about which operations are expensive: if they went so far as to make inaccurate divisions the default, even knowing it would break code, how much must accurate divisions cost?
Interestingly, without that flag we would sometimes see differences of 3 or even 4. This still doesn't make much sense to me, as the only division in the kernel is the division by 2 pi r, and I can't believe that division would be inaccurate enough to make the test fail...
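One possible explanation, which nothing in the thread confirms: if the operands of that division end up as float (for example a single-precision pi constant, a float literal, or a float variable in the 2 pi r expression), the usual arithmetic conversions make the division single precision, and assigning the result to a double afterwards does not recover the lost bits. That would also be consistent with an fp32 flag making a difference. A minimal sketch in plain C (OpenCL C uses the same conversion rules, as far as I know); the function names and the value of pi are assumptions for illustration:

```c
/* Assumed value of pi; the thread does not say how the kernel defines it. */
static const double PI_D = 3.14159265358979323846;

/* Hypothetical helper: both operands of the division are float, so the
   quotient is rounded to single precision before being widened to double. */
double inv_circumference_float(float x, float r)
{
    float denom = 2.0f * (float)PI_D * r; /* rounded to float at each step */
    return x / denom;                     /* float / float, then widened */
}

/* Same computation with every operand promoted to double first. */
double inv_circumference_double(float x, float r)
{
    double denom = 2.0 * PI_D * (double)r;
    return (double)x / denom;
}
```

With x = 1 and r = 3 the two results differ by roughly single-precision rounding error (a relative difference on the order of 1e-7), far more than a double-precision division alone would introduce.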
I have also noticed that the compiler doesn't complain when #pragma OPENCL EXTENSION cl_khr_fp64 : enable isn't present at the beginning of the kernel code, even though double variables are used. Any idea why that might be happening? From what I understand, when using OpenCL 1.1 (I have an NVIDIA GPU, so that's all I'm using), it is necessary to enable the fp64 extension in order to use doubles.
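On the host side, one way to see what the device actually advertises is to look for cl_khr_fp64 in the space-separated list the device reports for CL_DEVICE_EXTENSIONS. The sketch below does only the string matching (a plain strstr would also accept longer names that merely start with cl_khr_fp64); the clGetDeviceInfo call is left out so it compiles without an OpenCL SDK, and has_extension is a hypothetical helper name. On OpenCL 1.2 devices, a non-zero CL_DEVICE_DOUBLE_FP_CONFIG is, to my understanding, the other way to detect double support.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole, space-delimited token in
   `ext_list` (the format of the CL_DEVICE_EXTENSIONS string). */
bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0)
        return false;
    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == ext_list) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
        p += 1; /* partial match only; keep searching */
    }
    return false;
}
```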
For the differences, it would probably be worth using printf to print the coeffs calculated on the GPU. The difference is larger than I would expect, too.
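When printing the coefficients, it helps to use enough digits to round-trip (%.17g is enough for any double, and %a prints the exact bits in hex) and to measure the gap between CPU and GPU values in units in the last place (ULPs). If the differences of 3 or 4 mentioned earlier are ULPs, this shows it directly. A sketch with hypothetical helper names, assuming IEEE 754 doubles whose byte order matches that of 64-bit integers, as on the usual desktop platforms:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Prints a coefficient with enough digits to reproduce it exactly. */
void print_coeff(const char *label, size_t i, double v)
{
    printf("%s[%zu] = %.17g (%a)\n", label, i, v, v);
}

/* Maps a double's bit pattern to a signed integer that increases with the
   value, so adjacent doubles differ by exactly 1 (-0.0 and +0.0 both map to 0). */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (NaNs not handled). */
uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}
```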
Regarding the pragma, I think this is because NVidia also supports OpenCL 1.2, which made double precision a sort of weird optional thing. It might not be there, but if it is there you don't need to enable the extension.
@TheoFranquet, are you sure that -cl-fp32-correctly-rounded-divide-sqrt is valid for OpenCL 1.2?
I get an error like this:
Error in processing command line: Don't understand command line argument "-cl-fp32-correctly-rounded-divide-sqrt"!
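One way to avoid passing the option where it isn't understood is to add it conditionally. The sketch below assumes the option requires OpenCL 1.2 or newer (my understanding of the spec, worth double-checking) and parses the "OpenCL <major>.<minor> <vendor info>" version string a device reports for CL_DEVICE_VERSION; fetching that string with clGetDeviceInfo is left out so the code runs without an OpenCL SDK, and parse_cl_version and choose_build_options are hypothetical names. Since the error above apparently came from a 1.2 setup, a version check may not be enough on its own; checking the CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT bit of CL_DEVICE_SINGLE_FP_CONFIG is likely the more reliable test.

```c
#include <stdio.h>

/* Parses "OpenCL <major>.<minor> ..." (the CL_DEVICE_VERSION format).
   Returns 1 on success, 0 if the string does not match. */
int parse_cl_version(const char *version, int *major, int *minor)
{
    return sscanf(version, "OpenCL %d.%d", major, minor) == 2;
}

/* Hypothetical policy: only request correctly rounded fp32 divide/sqrt
   when the device reports OpenCL 1.2 or newer. */
const char *choose_build_options(const char *device_version)
{
    int major = 0, minor = 0;

    if (parse_cl_version(device_version, &major, &minor) &&
        (major > 1 || (major == 1 && minor >= 2)))
        return "-cl-fp32-correctly-rounded-divide-sqrt";
    return "";
}
```

The returned string would then be passed as the options argument of clBuildProgram.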
I have been running an OpenCL kernel on both the CPU and the GPU of the same machine. The CPU output is consistently correct, while the GPU output mostly fails the correctness test.
Since the GPU output is only slightly wrong, I suspect a precision issue on the GPU. The problem is that we already use double precision for everything.
Anyone have any ideas?
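The thread doesn't say what the correctness test checks. For reference, a common shape for such a test is a combined absolute and relative tolerance; the sketch below is a hypothetical criterion (names and tolerances are assumptions), not the project's actual check:

```c
#include <stddef.h>

/* Hypothetical criterion: two values match if they are within abs_tol of
   each other, or within rel_tol relative to the larger magnitude. */
static int nearly_equal(double a, double b, double abs_tol, double rel_tol)
{
    double diff = a > b ? a - b : b - a;
    double ma = a < 0 ? -a : a;
    double mb = b < 0 ? -b : b;
    double scale = ma > mb ? ma : mb;
    return diff <= abs_tol || diff <= rel_tol * scale;
}

/* Counts how many GPU results disagree with the CPU reference. */
size_t count_mismatches(const double *cpu, const double *gpu, size_t n,
                        double abs_tol, double rel_tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (!nearly_equal(cpu[i], gpu[i], abs_tol, rel_tol))
            ++bad;
    return bad;
}
```

Counting mismatches, rather than stopping at the first one, makes it easier to tell a few borderline values apart from a systematic precision loss.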
EDIT: It seems that adding the -cl-fp32-correctly-rounded-divide-sqrt flag to the kernel compiler options fixes this problem. This is a bit odd, as the fp32 in the flag would seem to indicate that it only applies to single-precision floats... I found the above flag here