bkloppenborg / liboi

OpenCL Interferometry Library
https://github.com/bkloppenborg/liboi/wiki
GNU Lesser General Public License v3.0

NVidia-based sum kernel produces incorrect output on CPU #42

Closed. bkloppenborg closed this issue 10 years ago

bkloppenborg commented 10 years ago

Possibly related to issue #32, the CRoutine_Sum kernel currently produces grossly incorrect values when executed on the CPU. It also appears to occasionally generate incorrect sums when executed on an ATI GPU. The CPU failure was revealed in commit ea59e3c, whereas the GPU failure shows up intermittently on my ATI R9 280X.

Since this will be the third sum kernel we've used (see commit ac60283a45e67c456164beb973e5fc21e655a264), we'll make the sum kernel an abstract class with a unified interface and let the user choose which kernel to use.
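
To make the abstract-class idea concrete, here is a rough host-side sketch of what a unified sum interface could look like. Everything in it is an assumption for illustration: `SumRoutine`, `SerialSum`, `TreeSum` and `make_sum_routine` are hypothetical names, not liboi classes, and the reductions run in plain C++ rather than in OpenCL kernels.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical unified interface (illustrative only, not liboi's API).
class SumRoutine {
public:
    virtual ~SumRoutine() = default;
    virtual std::string name() const = 0;
    virtual float sum(const std::vector<float>& data) const = 0;
};

// Straightforward left-to-right accumulation.
class SerialSum : public SumRoutine {
public:
    std::string name() const override { return "serial"; }
    float sum(const std::vector<float>& data) const override {
        float total = 0.0f;
        for (float v : data) total += v;
        return total;
    }
};

// Pairwise (tree) reduction: the access pattern a parallel kernel would use,
// executed here on the host so the sketch stays self-contained.
class TreeSum : public SumRoutine {
public:
    std::string name() const override { return "tree"; }
    float sum(const std::vector<float>& data) const override {
        std::vector<float> work(data);
        // Each pass folds the upper half of the active range onto the lower half.
        for (std::size_t n = work.size(); n > 1; n = (n + 1) / 2) {
            const std::size_t half = (n + 1) / 2;
            for (std::size_t i = 0; i + half < n; ++i) work[i] += work[i + half];
        }
        return work.empty() ? 0.0f : work[0];
    }
};

// Hypothetical factory: the caller picks an implementation by name,
// falling back to the serial version for unknown names.
std::unique_ptr<SumRoutine> make_sum_routine(const std::string& kind) {
    if (kind == "tree") return std::unique_ptr<SumRoutine>(new TreeSum());
    return std::unique_ptr<SumRoutine>(new SerialSum());
}
```

With something like this, calling code would only hold a `std::unique_ptr<SumRoutine>` and call `sum()`, so swapping in a third or fourth kernel would not touch the callers.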

bkloppenborg commented 10 years ago

The NVidia parallel sum kernel does not appear to work correctly on any CPU-based context. In commit d8119265 we implemented the AMD parallel reduction kernel. If a CPU context is selected, we should default to this slightly less efficient kernel instead.
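
The default-selection rule described above could be sketched roughly as follows. The enums and the `default_sum_kernel` function are hypothetical stand-ins, not liboi code. In a real OpenCL host program the device type would come from `clGetDeviceInfo` with `CL_DEVICE_TYPE`, which is left out here so the example compiles without OpenCL headers.

```cpp
#include <string>

// Hypothetical stand-in for the OpenCL device type of the selected context.
enum class DeviceType { CPU, GPU };

// Hypothetical labels for the two kernels discussed in this issue.
enum class SumKernel { NvidiaParallelSum, AmdParallelReduction };

// Pick the default kernel: the AMD parallel reduction on CPU contexts,
// where the NVidia kernel gives wrong results; the NVidia kernel otherwise.
SumKernel default_sum_kernel(DeviceType type) {
    return type == DeviceType::CPU ? SumKernel::AmdParallelReduction
                                   : SumKernel::NvidiaParallelSum;
}

// Human-readable name, e.g. for logging which kernel was chosen.
std::string kernel_label(SumKernel kernel) {
    return kernel == SumKernel::AmdParallelReduction ? "AMD parallel reduction"
                                                     : "NVidia parallel sum";
}
```

This only covers the default. A user override would still be needed for cases such as the intermittent GPU failures mentioned earlier in the thread.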