Open knotgrass opened 1 year ago
@Snosixtyboo, what I'm seeing is weird: the GPU result is always 0...
@knotgrass what you're seeing is floating-point precision loss (often loosely called underflow, though strictly speaking it is rounding error, sometimes called absorption). If you add a small float to a very large float, the result may not be accurate, because a 32-bit float has a 24-bit significand, i.e. only about 6-9 significant decimal digits. For example, 1000000f + 0.03f equals 1000000f at the binary level, which means 0.03f is lost. In the reduce example, the CPU and atomic global cases add every number into a single result. As that result grows, more and more of the small numbers added to it are lost (if you sort the vector in ascending order, you may get a more accurate result). With every element equal to 1.0f, the running sum stops growing once it reaches 16777216 (2^24), because 16777217 is not representable as a float. In the other three cases, partial results are added to the final result and none of those partial results is large enough, so the precision loss doesn't happen. If you use an even larger N, it will eventually happen for those partial results too.
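To make the explanation above concrete, here is a minimal host-side sketch in C++. It is not the code from the repository: the function names `naive_sum_of_copies` and `blocked_sum_of_copies` and the block size of 1024 are assumptions chosen for illustration. The first function accumulates into a single float, like the CPU and atomic global cases. The second sums fixed-size blocks first and then adds the partial results, which is roughly what the other reduction variants do.

```cpp
#include <cstddef>

// Adds `value` to a single float accumulator `n` times.
// Mirrors accumulating every element into one result.
float naive_sum_of_copies(float value, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += value;  // once acc is large, small addends are rounded away
    }
    return acc;
}

// Sums `n` copies of `value` in blocks of `block` elements (block > 0),
// then adds the per-block partial sums. Each partial sum stays small,
// so far fewer addends are lost.
float blocked_sum_of_copies(float value, std::size_t n, std::size_t block) {
    float total = 0.0f;
    for (std::size_t start = 0; start < n; start += block) {
        std::size_t end = (start + block < n) ? start + block : n;
        float partial = 0.0f;
        for (std::size_t i = start; i < end; ++i) {
            partial += value;
        }
        total += partial;  // partials are multiples of the block size
    }
    return total;
}
```

With `value = 1.0f` and `n = 20000000`, the naive version should stall at 16777216, while the blocked version with `block = 1024` reaches 20000000. The blocked version is not immune either: with a large enough `n`, the partial sums or the final accumulation lose precision in the same way.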
When I ran this code, I realized that the sum of the vector from the CPU method and the `reduceAtomicGlobal` method differs from the other methods if `N > 16'777'217`, and of course that sum is wrong. Can you help me point out why this happens? You can see my console log. The `Expected value` is not `42e9` because I fill each element of the vector `vals` with `1.0f`.