alpaka-group / vikunja

Vikunja is a performance-portable algorithm library that defines functions operating on ranges of elements for a variety of purposes. It supports execution on multi-core CPUs and various GPUs. Vikunja uses alpaka to implement platform-independent primitives such as reduce or transform.
https://vikunja.readthedocs.io/en/latest/
Mozilla Public License 2.0

Fix workdiv generation for thread-based cpu accelerators #1

Closed by DerWaldschrat 5 years ago

DerWaldschrat commented 5 years ago

Currently, the workdiv policy for thread-based CPU accelerators causes non-terminating behaviour. Of course, one could use the default workdiv of 1 grid, 1 block to make it terminate, but then there is no advantage over the serial CPU accelerator.

DerWaldschrat commented 5 years ago

I created a branch (feature/fix-cpu-threads) for this. It shows that all kernels reach their end, but the program never returns to the deviceReduce function. It looks like this might actually be a bug in alpaka.

DerWaldschrat commented 5 years ago

Fixed this by removing a premature return in the reduce kernel. The early return probably caused a deadlock: threads that returned early never reached the sync calls that the remaining threads in the block were waiting at. The number of sync calls in each block is now equal regardless of the kernel workload.
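
To illustrate the pattern, here is a minimal sketch in plain C++ threads. It is not the vikunja kernel and does not use alpaka. The `Barrier` class and the `blockReduce` function are hypothetical names made up for this example, with the barrier standing in for a block-level sync call. The point it shows is that every thread reaches the same number of barrier calls, even when it has no elements to process.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier; stands in for a block-level sync call.
class Barrier {
public:
    explicit Barrier(std::size_t count) : threshold_(count) {}

    void arriveAndWait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive releases everyone and resets the barrier.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != gen; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t threshold_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

// Hypothetical single-block sum reduction with numThreads "block threads".
int blockReduce(const std::vector<int>& data, std::size_t numThreads) {
    assert(numThreads > 0);
    std::vector<int> shared(numThreads, 0);
    Barrier barrier(numThreads);

    auto kernel = [&](std::size_t tid) {
        // A thread with no elements (tid >= data.size()) must NOT return
        // here: the other threads would wait at the barrier forever.
        int partial = 0;
        for (std::size_t i = tid; i < data.size(); i += numThreads) {
            partial += data[i];
        }
        shared[tid] = partial;
        barrier.arriveAndWait();

        // Tree reduction. The loop bounds depend only on numThreads, so
        // every thread hits the barrier the same number of times.
        for (std::size_t active = numThreads; active > 1; active = (active + 1) / 2) {
            const std::size_t half = (active + 1) / 2;
            if (tid < active / 2) {
                shared[tid] += shared[tid + half];
            }
            barrier.arriveAndWait();
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t tid = 0; tid < numThreads; ++tid) {
        threads.emplace_back(kernel, tid);
    }
    for (auto& t : threads) {
        t.join();
    }
    return shared[0];
}
```

Calling `blockReduce({1, 2, 3}, 8)` runs with five idle threads. With an early `return` for those threads the call would hang, whereas here they contribute a partial sum of 0 and still take part in every barrier.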