google-code-export / nativelibs4java

Automatically exported from code.google.com/p/nativelibs4java

JavaCL Endianness Problems with MacBookPro #80

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
First, have you checked out the FAQ
(http://code.google.com/p/nativelibs4java/wiki/FAQ) and Build instructions
(http://code.google.com/p/javacl/wiki/Build) ?

Have you looked at NativeLibs4Java's user group's archive ?
(http://groups.google.com/group/nativelibs4java)

What steps will reproduce the problem?
1. Run JavaCL on a MacBook Pro with an ATI Radeon HD 6700M
2. Use functions such as setArg(int, float[4])
3. More details are copied/pasted below

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

If the JVM crashes, please attach the  hs_err_pidXXX.log crash report file
written by the JVM.

For JavaCL / OpenCL4Java / ScalaCL issues, if the HardwareReport runs fine
(
http://nativelibs4java.sourceforge.net/webstart/OpenCL/HardwareReport.jnlp)
, please attach its HardwareReport.html output file to this issue.

Please provide any additional information below.

Hi (Olivier),
Thank you for your input regarding my previous problem with the wait-
for arrays.  I have successfully switched to a newer version of JavaCL
RC1 (JNA Version) now, and the waitFor() is working great.

However, I have faced a few more problems during my development.  It
seems that, while doing some tests on the latest MacBook Pro, which
has the ATI Radeon HD 6700M graphics card, everything that had worked
before on every other computer has broken completely!

After a bunch of testing, here's what I've come up with:

context.getByteOrder() returns ByteOrder.BIG_ENDIAN for this card,
which it did not do on any of the other cards I've tried.

The thing is, however, I tried some unit tests as follows:
       http://pastebin.com/VmjfW4TB

While tweaking and switching around the byte ordering of outJava,
inJava, and inIntJava, we found that the card actually expects
little-endian data instead.

This caused us quite a headache, as some features such as setArgs(int,
float[]) were assuming big endian and passing confusing values to
the graphics card.
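
To illustrate why the byte order matters here, the following is a minimal sketch in plain java.nio (it does not use JavaCL, and the class and method names are hypothetical). It packs a float into bytes with each byte order; if the host packs with one order and the device reads with the other, the device sees a completely different value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of JavaCL.
final class Float4Packing {
    /** Packs the floats into a byte array using the given byte order. */
    static byte[] pack(float[] values, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES).order(order);
        for (float v : values) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    /** Formats bytes as space-separated hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        float[] one = {1.0f}; // 1.0f has the bit pattern 0x3F800000
        System.out.println("big endian:    " + hex(pack(one, ByteOrder.BIG_ENDIAN)));    // 3F 80 00 00
        System.out.println("little endian: " + hex(pack(one, ByteOrder.LITTLE_ENDIAN))); // 00 00 80 3F
    }
}
```

The same four bytes appear in reverse order, so a float[4] argument packed as big endian and consumed by a little-endian device would not arrive as the values that were set.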

With all of those fixed, we also ran into some even stranger problems
(which we believe might be linked to the issue above).

Consider the following unit test:
       http://pastebin.com/bwQAZWvH

On all the other tested GPUs, we obtain the output

The output is
125.0, 250.0, 375.0, 500.0, 125.0, 12500.0,
The xx used for multiplication was
125.0, 125.0, 125.0, 125.0,
The constant (a) used for multiplication was
1.0, 2.0, 3.0, 4.0,

On the MacBook Pro, however, we obtained

The output is
0.0, 0.0, 0.0, 0.0, 125.0, 12500.0,
The xx used for multiplication was
125.0, 125.0, 125.0, 125.0,
The constant (a) used for multiplication was
1.0, 2.0, 3.0, 4.0,

For some reason it was zeroing out the multiplication! (We have also
tested that the multiplication works fine with 2 constant float4's
rather than the global_id.)

We investigated further, wrote the same test in C, and ran it on the
same computer (the test is modified from Apple's OpenCL "hello world"
application).
C code:
       http://pastebin.com/XQ0GJvzZ

The output was correct this time!

output: (125.00,250.00,375.00,500.00)
globalFloat: (125.00,125.00,125.00,125.00)
constants: (1.00,2.00,3.00,4.00)

We're not really sure what is going on here, but we speculate that
there is something strange going on in the transition from C to
JavaCL...

Original issue reported on code.google.com by haku...@gmail.com on 22 Jul 2011 at 2:21

GoogleCodeExporter commented 9 years ago
here is the hardware report:

[ATI Radeon HD 6750M]   [Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz]         
CL_DEVICE_ADDRESS_BITS  32  64
CL_DEVICE_AVAILABLE true    true
CL_DEVICE_COMPILER_AVAILABLE    true    true
CL_DEVICE_ENDIAN_LITTLE false   true
CL_DEVICE_ERROR_CORRECTION_SUPPORT  false   false
CL_DEVICE_EXECUTION_CAPABILITIES    Kernel    Kernel, NativeKernel
CL_DEVICE_EXTENSIONS    cl_APPLE_gl_sharing cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE 0   64
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE 0   6291456
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE None    ReadWriteCache
CL_DEVICE_GLOBAL_MEM_SIZE   536870912   6442450944
CL_DEVICE_HOST_UNIFIED_MEMORY   false   false
CL_DEVICE_IMAGE2D_MAX_HEIGHT    8192    8192
CL_DEVICE_IMAGE2D_MAX_WIDTH 8192    8192
CL_DEVICE_IMAGE3D_MAX_DEPTH 0   2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT    0   2048
CL_DEVICE_IMAGE3D_MAX_WIDTH 0   2048
CL_DEVICE_IMAGE_SUPPORT false   true
CL_DEVICE_LOCAL_MEM_SIZE    32768   16384
CL_DEVICE_LOCAL_MEM_TYPE    Local   Global
CL_DEVICE_MAX_CLOCK_FREQUENCY   150 2200
CL_DEVICE_MAX_COMPUTE_UNITS 5   8
CL_DEVICE_MAX_CONSTANT_ARGS 8   8
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE  65536   65536
CL_DEVICE_MAX_MEM_ALLOC_SIZE    134217728   1610612736
CL_DEVICE_MAX_PARAMETER_SIZE    1024    4096
CL_DEVICE_MAX_READ_IMAGE_ARGS   0   128
CL_DEVICE_MAX_SAMPLERS  128 16
CL_DEVICE_MAX_WORK_GROUP_SIZE   1024    1
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS  3   3
CL_DEVICE_MAX_WORK_ITEM_SIZES   1024, 1024, 1024    1, 1, 1
CL_DEVICE_MAX_WRITE_IMAGE_ARGS  0   8
CL_DEVICE_MEM_BASE_ADDR_ALIGN   32768   1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE  128 128
CL_DEVICE_NAME  ATI Radeon HD 6750M    Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR  n/a n/a
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE    n/a n/a
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT n/a n/a
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT   n/a n/a
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG  n/a n/a
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT n/a n/a
CL_DEVICE_OPENCL_C_VERSION  OpenCL C 1.0    OpenCL C 1.0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR   16  16
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE             0   2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT  4   4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT    4   4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG   2   2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT  8   8
CL_DEVICE_PROFILE   FULL_PROFILE    FULL_PROFILE
CL_DEVICE_PROFILING_TIMER_RESOLUTION    37  1
CL_DEVICE_QUEUE_PROPERTIES  ProfilingEnable ProfilingEnable
CL_DEVICE_SINGLE_FP_CONFIG  InfNaN, RoundToNearest    Denorm, InfNaN, RoundToNearest
CL_DEVICE_TYPE  GPU CPU
CL_DEVICE_VENDOR    AMD Intel
CL_DEVICE_VENDOR_ID 16915200    16909312
CL_DEVICE_VERSION   OpenCL 1.0  OpenCL 1.0
CL_DRIVER_VERSION   1.0 1.0
CL_PLATFORM_EXTENSIONS      
CL_PLATFORM_NAME    Apple   Apple
CL_PLATFORM_PROFILE FULL_PROFILE    FULL_PROFILE
CL_PLATFORM_VENDOR  Apple   Apple
CL_PLATFORM_VERSION OpenCL 1.0 (Dec 26 2010 12:52:21)    OpenCL 1.0 (Dec 26 2010 12:52:21)
Out of order queues support false   false
cl_khr_byte_addressable_store   false   false
cl_khr_gl_sharing   false   false
cl_nv_compiler_options  false   false
cl_nv_device_attribute_query    false   false

Original comment by haku...@gmail.com on 22 Jul 2011 at 2:32

GoogleCodeExporter commented 9 years ago

Original comment by olivier.chafik@gmail.com on 25 Jul 2011 at 8:10

GoogleCodeExporter commented 9 years ago
Hello Paul,

Thanks for your detailed report :-)

The issue appears to come from CLDevice.getKernelsDefaultByteOrder(), which 
didn't take the device's endianness into account (some old hack did, but for 
some reason I commented it out at some point...).

I've committed a change that might fix the issue (revision #2227) and uploaded 
a new 1.0-SNAPSHOT for both the JNA and BridJ versions of JavaCL (the latter is 
still being uploaded as I write this, but the JNA version is already available 
:-)).
It would be great if you could test it and let me know how it goes, as I don't 
have access to any ATI-powered computer right now (I'm on an extended 
vacation...).

Cheers
--
zOlive

Original comment by olivier.chafik@gmail.com on 25 Jul 2011 at 9:10

GoogleCodeExporter commented 9 years ago
Clarification: the 1.0-SNAPSHOT version is available through Maven or here (look 
for the "-shaded"-suffixed jar): 
http://nativelibs4java.sourceforge.net/maven/com/nativelibs4java/javacl/1.0-SNAPSHOT/

Original comment by olivier.chafik@gmail.com on 25 Jul 2011 at 9:21

GoogleCodeExporter commented 9 years ago
Hi Olivier,

I updated JavaCL to the file at the following link:
http://nativelibs4java.sourceforge.net/maven/com/nativelibs4java/javacl-jna/1.0-SNAPSHOT/
(I had to go looking for the JNA version, as that's what we were using, and it's 
a big project with dependencies on some old asm.jar which BridJ was angry about.)

After trying to get it to work on 2 computers, it seems that my compiler just 
does not want to open the archive file...

Sorry for the troubles again,
Paul

Original comment by haku...@gmail.com on 26 Jul 2011 at 7:42

GoogleCodeExporter commented 9 years ago
Hi Paul,
Sorry for the delay in re-uploading the fix (I'm on a long vacation...).
The JAR was indeed apparently corrupted (so much for on-the-go deployments from 
cybercafés :-S); I've uploaded it again.

In any case, please note that it's relatively straightforward to build the 
latest SVN version from sources : http://code.google.com/p/javacl/wiki/Build

(the files you're interested in will be in libraries/OpenCL-JNA/JavaCL/target 
after the build completes)
Please let me know if you face other issues...

Cheers
--
zOlive

Original comment by olivier.chafik@gmail.com on 2 Aug 2011 at 4:28

GoogleCodeExporter commented 9 years ago
Hi Olivier, 
I ran the same tests on the same computer as in the original report.
I ran the test from
http://pastebin.com/VmjfW4TB
with one minor change: every ByteOrder.LITTLE_ENDIAN is replaced with 
OpenCLSingletonState.getContext().getByteOrder(), since that is more appropriate.  
The output I get is as follows:

Using ATI Radeon HD 6750M
The max memory of this device is: 134217728
x should be 1 and is 4
y should be 2 and is 3
z should be 3 and is 2
w should be 4 and is 1
return value for if integer was 1 0

Here are the results for the second test, from
http://pastebin.com/bwQAZWvH
(note that in this case I'm not using NIOUtils, just JavaCL's standard 
defaults, and there isn't actually any input either):

Using ATI Radeon HD 6750M
The max memory of this device is: 134217728
The output is
0.0, 0.0, 0.0, 0.0, 8.9776E-41, 7.370973E-39, 
The xx used for multiplication was
8.9776E-41, 8.9776E-41, 8.9776E-41, 8.9776E-41, 
The constant (a) used for multiplication was
4.6006E-41, 9.0E-44, 2.3049E-41, 4.6007E-41, 

If you'd like any more information, please feel free to ask.
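
One way to read the odd values above: they match what you get when the four bytes of each expected float are reversed, which suggests that host and device still disagreed about byte order in this run. The sketch below (plain Java, hypothetical class name, not taken from the pastebin tests) reverses the bytes of the expected values so they can be compared against the printed ones.

```java
// Hypothetical check for illustration only; not part of JavaCL or the original tests.
final class ByteSwapCheck {
    /** Reinterprets a float after reversing the order of its four bytes. */
    static float swapBytes(float value) {
        return Float.intBitsToFloat(Integer.reverseBytes(Float.floatToIntBits(value)));
    }

    public static void main(String[] args) {
        // Values the test was expected to print.
        float[] expected = {1.0f, 2.0f, 3.0f, 4.0f, 125.0f, 12500.0f};
        for (float v : expected) {
            System.out.println(v + " -> " + swapBytes(v));
        }
        // The product of two byte-swapped (tiny, subnormal) values underflows in float arithmetic.
        System.out.println(swapBytes(1.0f) * swapBytes(125.0f));
    }
}
```

For example, 1.0f is 0x3F800000; reversed, it becomes 0x0000803F, a subnormal float equal to the 4.6006E-41 reported for the first constant. Multiplying two such subnormals underflows to 0.0, which is consistent with the zeros in the first four outputs.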

Original comment by haku...@gmail.com on 4 Aug 2011 at 3:33

GoogleCodeExporter commented 9 years ago
So, I'm not sure what's changed since Aug 4 (I don't think we've updated the 
javacl jar since then), but this problem appears to be resolved...:

[System.out] - The output is
[System.out] - 125.0, 250.0, 375.0, 500.0, 125.0, 12500.0,
[System.out] - The xx used for multiplication was
[System.out] - 125.0, 125.0, 125.0, 125.0,
[System.out] - The constant (a) used for multiplication was
[System.out] - 1.0, 2.0, 3.0, 4.0,

Additionally, our real code is now running fine.  So, I guess this appears to 
be resolved?

(I work with hakuliu).

Original comment by yode...@gmail.com on 31 Aug 2011 at 4:11

GoogleCodeExporter commented 9 years ago
[deleted comment]

GoogleCodeExporter commented 9 years ago
Hi Paul,

This is excellent news, thanks for the feedback !
I haven't deployed anything new since August 2nd, but maybe Maven got confused 
by the corrupted jar somehow and didn't update the way it should have...

Please let me know if you run into this issue again on other platforms : I'll 
reopen this ticket (and of course, feel free to open tickets for other issues 
!).

Cheers
--
zOlive
(edited message: I couldn't see hakuliu's name in the report, with those 
annoying abridged emails :-))

Original comment by olivier.chafik@gmail.com on 31 Aug 2011 at 11:15