cdarnab / aparapi

Automatically exported from code.google.com/p/aparapi
0 stars 0 forks source link

Doubles generate compiler errors cl_amd_fp64 should be cl_khr_fp64, NaNs etc. #27

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Open BlackScholes example code
2. Change all floats to double
3. Compile and run (nvidia quadro 600)

What is the expected output? What do you see instead?

Q600 has native fp64 support, but the compiler generates some errors:

clBuildProgram failed
************************************************
:1:26: warning: unknown '#pragma OPENCL EXTENSION' - ignored
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
                         ^
:4:13: error: must specify '#pragma OPENCL EXTENSION cl_khr_fp64: enable' 
before using 'double'
   __global double *randArray;
            ^
:14:16: error: use of undeclared identifier 'NaN'
   double c2 = NaN;
               ^
:17:16: error: use of undeclared identifier 'NaN'
   double c5 = NaN;
               ^
:25:103: error: use of undeclared identifier 'NaN'
   double y = 1.0 - (((0.3989422917366028 * exp(((-X * X) / 2.0))) * t) * (0.3193815350532532 + (t * (NaN + (t * (1.781477928161621 + (t * (-1.8212559223175049 + (t * NaN)))))))));
                                                                                                      ^
:48:53: error: use of undeclared identifier 'NaN'
      double R = (0.009999999776482582 * inRand) + (NaN * (1.0 - inRand));
                                                    ^
:49:60: error: use of undeclared identifier 'NaN'
      double sigmaVal = (0.009999999776482582 * inRand) + (NaN * (1.0 - inRand));

What version of the product are you using? On what operating system?
aparapi-2011-10-13, windows XP

Please provide any additional information below.
FWIW floats work fine.

Original issue reported on code.google.com by stevecoo...@gmail.com on 6 Dec 2011 at 3:48

GoogleCodeExporter commented 8 years ago
cl_amd_fp64 is easy enough to fix:
KernelWriter.java
-
         writePragma("cl_amd_fp64", true);
+
         writePragma("cl_khr_fp64", true);

The NaNs are a bit more tricky though. Seems something strange going on with 
the ByteBuffer to Double conversion (ByteBuffer.d8()), since some floating 
point numbers convert OK, but others result in NaN.

e.g. 
"double thing = -0.356563782;"
byte buffer = [-65, -42, -47, -16, -27, -88, 50, 91] 
u8() returns -441961893
d8() returns NaN

but
"double another = 0.356;"
byte buffer = [6, 63, -42, -56, -76, 57, 88, 16]
u8() returns 4600084745787281506;
d8() returns 0.356

Original comment by stevecoo...@gmail.com on 6 Dec 2011 at 8:34

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Clearly from the #pragma name/value our AMD roots are showing. 

Hopefully we can replace the amd specific pragma with a more neutral one. This 
might just be just a string change in the pragma mapping. 

Not sure about the NaN. Possibly when the pragma is correct the OpenCL compiler 
will actually take a NaN (possibly but unlikely).  

I don't think this will be hard, but it will require an NVidia card to 
debug/test.  Unfortunately that is not me at present.

Original comment by frost.g...@gmail.com on 6 Dec 2011 at 1:22

GoogleCodeExporter commented 8 years ago
Not sure if you saw my later comment but even with pragma fixed there are some 
nans. This doesn't appear to be driver related since its in the java parsing 
code. Could you try to reproduce with an amd card ?

Original comment by stevecoo...@gmail.com on 6 Dec 2011 at 2:34

GoogleCodeExporter commented 8 years ago
Looks like we should be using 
http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#doubleToRawLongBi
ts%28double%29
Instead of 
http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#doubleToLongBits%
28double%29

To preserve NaN

Original comment by frost.g...@gmail.com on 6 Dec 2011 at 10:46

GoogleCodeExporter commented 8 years ago
Actually the code in question is converting bits to double (not the other way 
round). Not sure what the issue is with your ByteBuffer code, but I would 
suggest you just use the nio libraries, e.g. the following change fixed this 
particular issue:

double d8(int _offset) {
  return (java.nio.ByteBuffer.wrap(bytes, _offset, 8).getDouble());
}

You could probably replace your whole ByteBuffer code with the nio one.

Original comment by stevecoo...@gmail.com on 7 Dec 2011 at 1:38

GoogleCodeExporter commented 8 years ago
The changes suggested above by Steve 

KernelWriter.java
-
         writePragma("cl_amd_fp64", true);
+
         writePragma("cl_khr_fp64", true);

Have been made in the trunk SVN R#285

This will be left open to track the Nan stuff. 

Gary

Original comment by frost.g...@gmail.com on 20 Feb 2012 at 8:24

GoogleCodeExporter commented 8 years ago
Steve

I know you submitted this a while back.  I just took another look. 

We fixed the pragma previously but clearly there was still an issue in 
ByteBuffer.d8() as you indicated above.  I finally took some time and found the 
issue. 

See this from the fix for issue #50.

I found the culprit and submitted a patch for this #r467

It was as Steve pointed out in his diagnosis of #27 something weird in 
ByteBuffer.d8() which converts double constants from the classfile into the 
disassembler.

Actually the fix was in ByteBuffer.u8() (we pull a long in and convert it's 
bits to a double)

   double d8(int _offset) {
      return (Double.longBitsToDouble(u8(_offset)));
   }

The previous version was
   long u8(int _offset) {
       return (((long) u4(_offset)) << 32 | u4(_offset + 4));
   }

Sadly for values when the lower 32 bits had the most significant bit set this 
was sign extending and causing the top 32 bits to all be 1's.

Now we have 
   long u8(int _offset) {
      return ((u4(_offset)&0xffffffffL) << 32) |(u4(_offset + 4)&0xffffffffL);
   }

This stops the sign extension from taking place. 

This was my test case. 

public class NaNTest{
   public static void main(String[] args) {
      Kernel k = new Kernel(){
         @Override public void run() {
            double gauss = (1.0e-10);        
         }
      };
      k.execute(1024);
   }
}

Original comment by frost.g...@gmail.com on 17 May 2012 at 4:32