CUDA-Tutorial / CodeSamples

Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"

[08_Reductions] Computed CPU value & reduceAtomicGlobal give wrong values #1

Open · knotgrass opened this issue 1 year ago

knotgrass commented 1 year ago

When I ran this code, I noticed that the vector sums computed by the CPU method and by the reduceAtomicGlobal method differ from those of the other methods when N ≥ 16'777'217, and of course those sums are wrong. Can you help me figure out why this happens?

My console log is below. The expected value is not 42e9 because I filled every element of the vector vals with 1.0f.

Expected value: 1e+09

BFD: /lib/x86_64-linux-gnu/libutil.so.1: unknown type [0x13] section `.relr.dyn'
==== CPU Reduction ====

Computed CPU value: 1.67772e+07
==== GPU Reductions ====

       Atomic Global    1755.09ms       1.67772e+07
       Atomic Shared    1497.08ms       1e+09
       Reduce Shared    780.611ms       1e+09
      Reduce Shuffle    670.465ms       1e+09
        Reduce Final    409.837ms       1e+09
asimay commented 1 year ago


@Snosixtyboo, what I'm seeing is weird: the GPU result is always 0...

cfwen commented 12 months ago

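@knotgrass what happened here is a floating-point rounding error (it is often loosely called underflow, although strictly speaking underflow means a result too small to represent). If you add a small float to a much larger one, the result may not be accurate, because a float only has about 6-9 significant decimal digits (a 24-bit significand). For example, 1000000.0f + 0.03f equals 1000000.0f in single precision, which means the 0.03f is lost.

In the reduction example, the CPU and Atomic Global versions add every number into one single result. As that result grows, more and more of the small numbers added to it are lost. With every element equal to 1.0f, the running sum gets stuck at 2^24 = 16'777'216: above that value consecutive floats are 2 apart, so 16777216.0f + 1.0f rounds back to 16777216.0f, which is the 1.67772e+07 in your log. (Sorting the vector in ascending order before summing can make the result more accurate in general, although it cannot help when all elements are equal.) In the other four cases, partial results are summed first and then added to the final result; none of those partial results gets large enough, so no significant precision is lost. With an even larger N, the same rounding error will eventually affect those partial results too.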