yuchen-w opened this issue 9 years ago
Interestingly, I've also noticed that step_world_v3 runs significantly faster (>10x) on my CPU than the original step_world function. Is this also supposed to happen?
Regarding the differences: what sort of differences are they, and how big? If they are of the order of 10^-7 or so, it could be down to differences in the ordering of the floating-point instructions. A way to check is to put in test-cases with exactly representable inputs and constants (e.g. `make_world 10 0.125 | step_world 0.125 100`; think small binary powers), and check that the output is still exact.
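The reasoning behind exactly representable values can be sketched in a few lines of Python (illustrative only; the coursework itself is shell-driven C++/OpenCL). 0.125 is a small binary power, so arithmetic on it incurs no rounding, whereas a value like 0.1 is rounded at every operation and therefore becomes sensitive to instruction ordering:

```python
# 0.125 = 2^-3 has an exact binary representation; 0.1 does not.
print((0.125).hex())  # 0x1.0000000000000p-3 (exact)
print((0.1).hex())    # 0x1.999999999999ap-4 (already rounded)

# Summing a non-representable value accumulates rounding error,
# so any reordering of the additions can change the result:
print(sum([0.1] * 10) == 1.0)    # False
# Sums of small binary powers stay exact under any ordering:
print(sum([0.125] * 8) == 1.0)   # True
```

With inputs like these, the OpenCL and reference outputs should match bit-for-bit, not just to within a tolerance.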
Regarding the second: it is not guaranteed, but yes, hopefully the software OpenCL provider is faster than the original software. The Intel provider will hopefully be doing some SIMD optimisations, as well as using multiple threads, which could result in a 10x speed-up. It is sometimes possible that the software OpenCL provider is faster than a GPU, especially if the kernel has not been tuned for GPU-friendly operation.
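As a rough illustration of why this kind of kernel parallelises so well, here is a hypothetical four-neighbour diffusion update in plain Python (NOT the coursework's exact step_world kernel; the weighting and boundary handling differ). Every output cell reads only the previous state, so a software OpenCL provider can vectorise along rows with SIMD and split rows across threads:

```python
def step_world_sketch(world, alpha, dt):
    # Hypothetical four-neighbour diffusion step: each output cell is
    # computed from the *previous* state only, so all cells are
    # independent -- the structure SIMD and multi-threading exploit.
    h, w = len(world), len(world[0])
    out = [row[:] for row in world]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (world[y - 1][x] + world[y + 1][x]
                   + world[y][x - 1] + world[y][x + 1]
                   - 4.0 * world[y][x])
            out[y][x] = world[y][x] + alpha * dt * lap
    return out
```

A uniform field is a useful sanity check: the neighbour sum cancels, so the state should be unchanged after a step.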
Just checked it with `make_world 10 0.125 | step_world 0.125 100`.
I'm getting for the original function:
```
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 0.00000000
0.00000000 0.78210109 0.78014606 0.77879262 0.77604353 0.76561636 0.72710687 0.60089213 0.57189912 0.00000000
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.26333833 0.25718218 0.00000000
0.00000000 0.00000019 0.00000149 0.00001212 0.00007913 0.00027144 0.00000000 0.09489445 0.09444368 0.00000000
0.00000000 0.00000004 0.00000017 0.00000000 0.00027144 0.00137941 0.00738860 0.02939555 0.03399836 0.00000000
0.00000000 0.00000001 0.00000002 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
```
my step_world_v5 would give:
```
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 0.00000000
0.00000000 0.77876627 0.77679271 0.77544880 0.77275264 0.76247615 0.72428840 0.59851813 0.56974041 0.00000000
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.26088086 0.25479814 0.00000000
0.00000000 0.00000017 0.00000137 0.00001131 0.00007459 0.00025802 0.00000000 0.09336054 0.09290666 0.00000000
0.00000000 0.00000003 0.00000015 0.00000000 0.00025802 0.00132408 0.00714971 0.02868212 0.03315769 0.00000000
0.00000000 0.00000000 0.00000002 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
```
I seem to be off by one on the red and blue colour channels:
My Mac lets me choose between two devices:

```
Found 2 devices
Device 0 : Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
Device 1 : Iris
```
When I run it on device 0, my tests pass as expected. On device 1 there are more than 10^-7 units of error.
Is this similar to what you have, @yuchen-w ?
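For what it's worth, the 10^-7 check can be automated with a small comparison script. This is a sketch under the assumption that the worlds are dumped as whitespace-separated text like the outputs pasted above; `compare_worlds` is a made-up helper, not part of the coursework:

```python
def compare_worlds(text_a, text_b, tol=1e-7):
    # Parse two whitespace-separated world dumps and report the largest
    # absolute per-cell difference, plus whether it is within tolerance.
    a = [float(v) for v in text_a.split()]
    b = [float(v) for v in text_b.split()]
    if len(a) != len(b):
        raise ValueError("world sizes differ")
    worst = max(abs(x - y) for x, y in zip(a, b))
    return worst, worst <= tol
```

You could feed it the captured stdout of the reference `step_world` and an OpenCL version run on the same input, once per device.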
@darioml Yes, that is quite similar to what I had for my step_world_v3.
Although once I'd progressed past v3, the error started manifesting on the CPU too. Both the CPU and the GPU would return the same result, though.
I ran the same functions as @yuchen-w:

```
./make_world 10 0.125 | ./step_world_v5_kernel 0.125 100 | ./render_world dump2.bmp
```

and

```
./make_world 10 0.125 | ./step_world 0.125 100 | ./render_world dump1.bmp
```

The differences between the two bitmaps are shown below; some of the pixels in my blue channel are also off by one:
```
res(:,:,1) =
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

res(:,:,2) =
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

res(:,:,3) =
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0
0 0 0 0 0 0 0 1 1 0
0 1 0 1 0 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
```
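The per-channel diff above can be reproduced without MATLAB. A minimal sketch, assuming the two bitmaps have already been decoded into nested `[row][col][channel]` lists (`channel_diff` is a hypothetical helper, and the BMP decoding itself is left out):

```python
def channel_diff(img_a, img_b, channel):
    # Absolute per-pixel difference for one colour channel of two
    # equally-sized images given as [row][col][channel] nested lists.
    return [[abs(pa[channel] - pb[channel])
             for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]
```

Running it once per channel (0, 1, 2) gives the three matrices shown above, making off-by-one pixels easy to spot.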
If I am not mistaken, CPUs and GPUs have different precisions. Running the functions on the CPU should give pretty good precision, close to what Dr Thomas mentioned. However, I do get larger errors when running on a GPU, whether that is Intel integrated graphics or an NVIDIA device. That said, the differences in values that @yuchen-w got seem too large for that number of steps to be caused by precision errors.
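The single- vs double-precision point can be illustrated without any GPU. The sketch below rounds each intermediate result through IEEE-754 single precision via the `struct` module (Python floats are otherwise 64-bit), showing that the drift between the two grows with the step count, while typically staying far smaller than the errors reported above:

```python
import struct

def to_f32(x):
    # Round a Python (64-bit) float through IEEE-754 single precision.
    return struct.unpack('f', struct.pack('f', x))[0]

acc32, acc64 = 0.0, 0.0
for _ in range(100):
    acc32 = to_f32(acc32 + to_f32(0.01))  # single-precision accumulation
    acc64 = acc64 + 0.01                  # double-precision accumulation
drift = abs(acc32 - acc64)
print(drift)  # small but nonzero, and it grows with the number of steps
```

A pure precision mismatch between devices would look like this slow drift, not the order-10^-3 differences in the dumps above.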
I've just encountered a weird bug where the output that step_world_v3 generates is mostly, but not 100%, the same as the output generated by the original step_world. This only occurs if I select my NVIDIA card instead of my Intel CPU.
Can anyone else reproduce this in their code?