Open freemo opened 7 years ago
@CC007 Glad you sorted it out. BTW gcc/mingw on windows is responsible for me having to drink many many beers ;)
On Fri, Jan 6, 2017 at 10:34 AM, CC007 notifications@github.com wrote:
@grfrost This is what made it run without error: http://stackoverflow.com/a/6405064, so I probably don't have a 64-bit libstdc++ dynamic library installed.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Syncleus/aparapi/issues/38#issuecomment-270971132, or mute the thread https://github.com/notifications/unsubscribe-auth/AEKiN0Jcke4YXm5Rjmaf5iJ1ocL-j4ePks5rPokvgaJpZM4LYPxt .
@grfrost Also, your kernel.zip contains Mac specific code (libdispatch and its use of blocks), so I can't run that.
FYI, as of 1.3.4 the Aparapi native library is compiled with Microsoft Visual Studio instead of GCC or MinGW, as in previous versions.
I think you are referring to Jeffrey's code.
I was enviously looking at those blocks ;)
My code should have been straight C++.
Gary
@grfrost Ah, I see now that it was @savaskoc who posted the code. Btw, there is a libdispatch library for Windows, but the block notation doesn't seem to be supported by MinGW GCC.
Isn't this a clear case of undefined behaviour? As far as I know, OpenCL C signed integer arithmetic overflow is only defined for atomic operations, so `start + id` is undefined if `id` is a positive int and `start` is `Integer.MAX_VALUE`.
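For comparison, the same expression is well defined on the Java side, where `int` addition wraps silently in two's complement. This is a minimal sketch of that CPU-side behaviour only; the class name and the `add` helper are made up for illustration, and it says nothing about what a given OpenCL device does.

```java
public class JavaWrapAround {
    // Plain Java int addition: overflow wraps around silently, no exception is thrown.
    static int add(int start, int id) {
        return start + id;
    }

    public static void main(String[] args) {
        int start = Integer.MAX_VALUE; // the problematic start value from the comment
        int id = 1;                    // stand-in for any positive work-item id
        // Wraps to the most negative int.
        System.out.println(add(start, id) == Integer.MIN_VALUE); // prints "true"
    }
}
```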
Addition is atomic, but multiplication is indeed not atomic in the OpenCL specification, for either 32-bit or 64-bit numbers: https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/atomicFunctions.html
I was referring to https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/atomic_fetch_key.html, which guarantees defined overflow for atomic_fetch_add. Aparapi is not using that function for addition, and for good reason. In general, overflow of signed integers is undefined in C, and I think OpenCL C follows C in that area.
@TPolzer "For signed integer types, arithmetic is defined to use two’s complement representation with silent wrap-around on overflow; there are no undefined results." I don't see what you mean.
This might, however, explain why multiplication can cause issues.
The problem is that it does not apply in this situation; it only applies to an explicit atomic fetch-and-add.
I have noticed the same problem: inconsistent results between GPU and CPU, and between different GPUs.
This kernel calculates the product of two big integers, 256 bits each, using the product-scanning algorithm.
It gives different results on CPU (correct) vs. GPU (Intel Iris) (wrong), and on GPU (Nvidia Tesla M40) (correct) vs. GPU (Intel Iris) (wrong).
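    public void multiplyProductScanning(final byte[] a, final byte[] b, final byte[] output) {
        int UV = 0; // column sum plus incoming carry
        int U = 0;  // carry into the next column
        int V = 0;  // low byte written to the output
        for (int k = 62; k >= 0; k--) {
            UV = 0;
            for (int i = MAX(0, k - 31); i <= MIN(k, 31); i++)
                UV += (a[i] & 0xFF) * (b[k - i] & 0xFF);
            UV = UV + U;
            U = (UV & 0xFFFFFF00) >> 8;
            V = UV & 0xFF;
            output[k + 1] = (byte) V;
        }
        output[0] = (byte) U;
    }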
Notice that UV cannot overflow: a[i]&0xFF < 256 and b[k-i]&0xFF < 256, so UV < 32 x 256 x 256.
When I declare UV as long, the results are correct on the CPU and on 4 different GPUs.
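One way to pin down which device is wrong is to check the kernel's output against a CPU reference. The sketch below is plain Java with no Aparapi involved: it re-implements the same loop and compares it with `java.math.BigInteger`, assuming big-endian 32-byte inputs and a 64-byte output as the indexing suggests. The class name, the `matchesBigInteger` helper and the use of `Math.max`/`Math.min` in place of `MAX`/`MIN` are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.util.Random;

public class ProductScanningCheck {
    // Same column-wise (product scanning) loop as the kernel, on the CPU.
    static void multiplyProductScanning(byte[] a, byte[] b, byte[] output) {
        int carry = 0;
        for (int k = 62; k >= 0; k--) {
            int uv = 0;
            for (int i = Math.max(0, k - 31); i <= Math.min(k, 31); i++) {
                uv += (a[i] & 0xFF) * (b[k - i] & 0xFF);
            }
            uv += carry;                       // stays far below Integer.MAX_VALUE
            carry = uv >>> 8;                  // high part moves to the next column
            output[k + 1] = (byte) (uv & 0xFF);
        }
        output[0] = (byte) carry;
    }

    // Compares the loop above with BigInteger for one pseudo-random input pair.
    static boolean matchesBigInteger(long seed) {
        Random rnd = new Random(seed);
        byte[] a = new byte[32];
        byte[] b = new byte[32];
        rnd.nextBytes(a);
        rnd.nextBytes(b);
        byte[] out = new byte[64];
        multiplyProductScanning(a, b, out);
        // Signum 1 makes BigInteger read the bytes as unsigned big-endian magnitudes.
        BigInteger expected = new BigInteger(1, a).multiply(new BigInteger(1, b));
        return expected.equals(new BigInteger(1, out));
    }

    public static void main(String[] args) {
        System.out.println(matchesBigInteger(1L));
    }
}
```

Feeding the same `a` and `b` to the GPU run and comparing its `output` with `expected` would show which columns diverge first.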
I'm not entirely sure this can be solved. Aparapi works by converting Java bytecode into OpenCL's C-like language and running that on the GPU. Primitive operations like basic arithmetic are governed by the GPU, and for efficiency reasons it would be infeasible to try to override that functionality in any meaningful way.
We could of course add some sort of add, subtract, divide, and multiply methods that guarantee consistency at the expense of speed. But I'm not sure we would want to address this directly in the primitive math operations, for fear of losing significant performance.
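As a rough illustration of what such helper methods might look like on the Java side, here is a sketch that widens to `long` before operating. `ConsistentMath` and its methods are hypothetical names, not part of Aparapi, and the sketch only shows the CPU-side contract; whether the translated OpenCL code would behave identically on every device is exactly the open question in this thread.

```java
public class ConsistentMath {
    // Hypothetical helper: widen both operands first so the product cannot overflow an int.
    static long multiply(int a, int b) {
        return (long) a * (long) b;
    }

    // Hypothetical helper: widen both operands first so the sum cannot overflow an int.
    static long add(int a, int b) {
        return (long) a + (long) b;
    }

    public static void main(String[] args) {
        System.out.println(multiply(Integer.MAX_VALUE, 100)); // prints 214748364700
        System.out.println(add(Integer.MAX_VALUE, 1));        // prints 2147483648
    }
}
```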
The following code produces different results when run on the GPU vs the CPU.
The output from the above code snippet is:
expected: 214748364700 result: 4294967196
I tested this on my MacBook Pro, but others noticed the problem as well on other, unspecified platforms. Also, changing the calculate function so that 100 is a long rather than an int, with
return (long) tc * 100l;
(notice the letter l at the end of the 100), produces exactly the same incorrect result as above.
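The two numbers reported above are consistent with the product being cut down to 32 bits somewhere along the way: 214748364700 is 50 x 2^32 - 100, and 4294967196 is 2^32 - 100. The sketch below reproduces both values in plain Java, assuming `tc` is `Integer.MAX_VALUE` (inferred from the expected value, since the original snippet is not shown here); the class and method names are made up for illustration, and this does not claim to be what the GPU actually executes.

```java
public class TruncationDemo {
    // What the Java expression means: the cast binds tighter than '*',
    // so this is a 64-bit multiplication.
    static long widened(int tc) {
        return (long) tc * 100L;
    }

    // A 32-bit multiplication whose result is then read as an unsigned 32-bit value.
    static long truncated(int tc) {
        return (tc * 100) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int tc = Integer.MAX_VALUE; // assumed input
        System.out.println("expected: " + widened(tc));   // expected: 214748364700
        System.out.println("result:   " + truncated(tc)); // result:   4294967196
    }
}
```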