[Bounty $50] Inconsistent results between GPU and CPU when integers overflow.

freemo commented 7 years ago

The following code produces different results when run on the GPU vs the CPU.

import com.aparapi.*;

public class Main {
    public static void main(String[] args) {
        int num = 1;

        final long[] result = new long[num];
        final int start = Integer.MAX_VALUE;

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                final int id = getGlobalId();
                result[id] = calculate(start + id);
            }
        };
        kernel.execute(num);

        System.out.println( "expected: " +  calculate(start) + " result: " + result[0]);
    }

    public static long calculate(int tc) {
        return (long) tc * 100;
    }
}

The output from the above code snippet is:

expected: 214748364700 result: 4294967196

I tested this on my MacBook Pro, but others have noticed the problem on other, unspecified platforms as well. Also, changing the calculate function so that 100 is a long rather than an int, with return (long) tc * 100l; (note the letter l at the end of 100), produces exactly the same incorrect result as above.
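For reference, the reported GPU value is exactly the true product reduced modulo 2^32: 214748364700 - 49 x 4294967296 = 4294967196, which is the 32-bit pattern of -100 read as an unsigned number. The plain-Java sketch below (no Aparapi involved) reproduces both numbers. The truncatedTo32Bits method is only a hypothetical model of what the GPU appears to have computed, not a description of the OpenCL that Aparapi generates. Note also that with num = 1 the only global id is 0, so start + id does not itself overflow in this example; the discrepancy comes from the multiplication.

```java
public class OverflowDemo {
    // What the JVM computes for calculate(tc): tc is widened to long before the
    // multiplication, so 2147483647 * 100 is exact.
    static long exact(int tc) {
        return (long) tc * 100;
    }

    // Hypothetical model of the reported GPU value: the same product done in
    // 32-bit arithmetic (Java int multiplication wraps modulo 2^32) and then
    // zero-extended to 64 bits instead of sign-extended.
    static long truncatedTo32Bits(int tc) {
        int wrapped = tc * 100;
        return Integer.toUnsignedLong(wrapped);
    }

    public static void main(String[] args) {
        int start = Integer.MAX_VALUE;
        System.out.println("exact:     " + exact(start));             // 214748364700
        System.out.println("truncated: " + truncatedTo32Bits(start)); // 4294967196
        System.out.println("as int:    " + (start * 100));            // -100
    }
}
```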

grfrost commented 7 years ago

@CC007 Glad you sorted it out. BTW gcc/mingw on windows is responsible for me having to drink many many beers ;)

On Fri, Jan 6, 2017 at 10:34 AM, CC007 notifications@github.com wrote:

@grfrost https://github.com/grfrost This is what made it run without error http://stackoverflow.com/a/6405064, so probably I don't have a 64bit libstdC++ dynamic library installed.


CC007 commented 7 years ago

@grfrost Also, your kernel.zip contains Mac-specific code (libdispatch and its use of blocks), so I can't run that.

freemo commented 7 years ago

FYI, as of 1.3.4 the Aparapi native library was compiled with Microsoft Visual Studio instead of GCC or MinGW, as in previous versions.


grfrost commented 7 years ago

I think you are referring to Jeffrey's code.

I was enviously looking at those blocks ;)

My code should have been straight C++

Gary


CC007 commented 7 years ago

@grfrost Ah, I see now that it was @savaskoc who posted the code. By the way, there is a libdispatch library for Windows, but the block notation doesn't seem to be supported by MinGW GCC.

TPolzer commented 7 years ago

Isn't this a clear case of undefined behaviour? As far as I know, signed integer overflow in OpenCL C is only defined for atomic operations, so start + id is undefined if id is a positive int and start is Integer.MAX_VALUE.

CC007 commented 7 years ago

Addition is atomic, but multiplication is indeed not atomic in the OpenCL specification, for either 32-bit or 64-bit numbers: https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/atomicFunctions.html

TPolzer commented 7 years ago

I was referring to https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/atomic_fetch_key.html, which guarantees defined overflow for atomic_fetch_add. Aparapi does not use that function for addition, and for good reason. In general, overflow of signed integers is undefined in C, and I think OpenCL C follows C in that area.

CC007 commented 7 years ago

@TPolzer "For signed integer types, arithmetic is defined to use two’s complement representation with silent wrap-around on overflow; there are no undefined results." I don't see what you mean.

This might, however, explain why multiplication can cause issues.

TPolzer commented 7 years ago

The problem is that the guarantee does not apply in this situation; it applies only to the explicitly atomic fetch-and-add.
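Java itself defines int overflow as two's-complement wrap-around, so the CPU path is always deterministic; the question above is only whether the device is bound by the same rule. One way to sidestep that question is to make sure the kernel never evaluates an overflowing signed addition in the first place. The sketch below is a hypothetical host-side check in plain Java (indexSumFits is an illustrative name, not an Aparapi API) that uses Math.addExact to verify that start + id fits in an int for every id in [0, range) before the kernel is launched.

```java
public class RangeGuard {
    // Hypothetical host-side check (plain Java, not an Aparapi API): returns true
    // if start + id fits in an int for every id in [0, range). The largest index
    // sum is start + (range - 1), so that is the only one that needs checking.
    static boolean indexSumFits(int start, int range) {
        if (range <= 0) {
            return true; // no work-items, nothing can overflow
        }
        try {
            Math.addExact(start, range - 1); // throws ArithmeticException on int overflow
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(indexSumFits(Integer.MAX_VALUE, 1)); // true: only id 0
        System.out.println(indexSumFits(Integer.MAX_VALUE, 2)); // false: id 1 overflows
        System.out.println(indexSumFits(0, 1_000_000));         // true
    }
}
```

With num = 1, as in the original report, the check passes, which is consistent with the overflow in that example being in the multiplication rather than in start + id.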

nejeoui commented 7 years ago

I have noticed the same problem: inconsistent results between GPU and CPU, and between different GPUs.

This kernel calculates the product of two 256-bit big integers using the product-scanning algorithm.

It gives different results on the CPU (correct) and on an Intel Iris GPU (wrong). It also gives different results on an Nvidia Tesla M40 GPU (correct) and on the Intel Iris GPU (wrong).

public void multiplyProductScanning(final byte[] a, final byte[] b, final byte[] output) {
    int UV = 0;
    int U = 0;
    int V = 0;

    for (int k = 62; k >= 0; k--) {
        UV = 0;
        for (int i = MAX(0, k - 31); i <= MIN(k, 31); i++)
            UV += (a[i] & 0xFF) * (b[k - i] & 0xFF);
        UV = UV + U;
        U = (UV & 0xFFFFFF00) >> 8;
        V = UV & 0xFF;
        output[k + 1] = (byte) V;
    }
    output[0] = (byte) U;
}

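Notice that UV cannot overflow: a[i] & 0xFF < 256 and b[k-i] & 0xFF < 256, and at most 32 products are summed per column, so UV < 32 x 256 x 256 = 2,097,152 (even after adding the carry U), far below Integer.MAX_VALUE.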

When I declare UV as long, the results are correct on the CPU and on 4 different GPUs.
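Since UV provably stays small, a useful first step is to confirm that the kernel's logic is correct on the CPU, so that any remaining mismatch can be attributed to the device or its compiler. The sketch below is a plain-Java port of the method above, checked against java.math.BigInteger. It assumes a and b are 32-byte big-endian unsigned magnitudes and output holds 64 bytes (which the indices output[k + 1] and output[0] imply). Math.max and Math.min stand in for the MAX and MIN helpers, whose definitions are not shown, and toFixed is a hypothetical helper added only for the comparison.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class ProductScanning {
    // Plain-Java port of multiplyProductScanning: column-wise (product scanning)
    // multiplication, least significant column (k = 62) first, big-endian bytes.
    static byte[] multiply(byte[] a, byte[] b) {
        byte[] output = new byte[64];
        int carry = 0; // U in the original
        for (int k = 62; k >= 0; k--) {
            int uv = 0;
            for (int i = Math.max(0, k - 31); i <= Math.min(k, 31); i++) {
                uv += (a[i] & 0xFF) * (b[k - i] & 0xFF);
            }
            uv += carry;
            carry = uv >>> 8;                   // same as (UV & 0xFFFFFF00) >> 8 for UV >= 0
            output[k + 1] = (byte) (uv & 0xFF); // V in the original
        }
        output[0] = (byte) carry;
        return output;
    }

    // Hypothetical helper: left-pads a non-negative BigInteger into a fixed-width
    // big-endian array, dropping the extra leading 0x00 sign byte if present.
    static byte[] toFixed(BigInteger x, int len) {
        byte[] raw = x.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // Worst case: both operands are 2^256 - 1, so every byte is 0xFF.
        BigInteger max = BigInteger.ONE.shiftLeft(256).subtract(BigInteger.ONE);
        byte[] a = toFixed(max, 32);
        byte[] expected = toFixed(max.multiply(max), 64);
        System.out.println(Arrays.equals(multiply(a, a), expected));
    }
}
```

If this reference agrees with BigInteger for the inputs that fail on a given GPU, the int version of the algorithm itself is not at fault on that input.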

freemo commented 7 years ago

I'm not entirely sure this can be solved. Aparapi works by converting Java bytecode into OpenCL's C-like language and running that on the GPU. Primitive operations such as basic arithmetic are governed by the GPU, and for efficiency reasons it would be infeasible to try to override that behaviour in any meaningful way.

We could, of course, add some sort of add, subtract, divide, and multiply methods that guarantee consistency at the expense of speed. But I'm not sure we would want to address this directly in the primitive math operations, for fear of losing significant performance.


Syncleus / aparapi

[Bounty $50] Inconsistent results between GPU and CPU when integers overflow. #38