CalculateMultipoles issues one kernel per tree level, in sequence, so it runs in parallel even on hardware that only supports par_unseq.
On hardware that can run par in parallel, a strategy similar to the Concurrent Octree's, with a per-node counter, could fuse all of these kernels into one.
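
The fused variant could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's implementation: the flat `Tree` layout, the name `calculate_multipoles_fused`, and the reduction to a 1D monopole (total mass and centre of mass) are all hypothetical, and leaf values are assumed to be filled in beforehand. Every leaf walks towards the root and atomically increments its parent's counter; only the thread that arrives last reduces the children and continues upwards. The atomic counter is the reason this needs par: synchronizing atomic operations are not permitted under par_unseq.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical flat tree layout, chosen for this sketch only.
struct Tree {
    std::vector<int> parent;       // parent index, -1 for the root
    std::vector<int> first_child;  // index of the first child; children are contiguous
    std::vector<int> child_count;  // 0 for leaves
    std::vector<double> mass;      // monopole: total mass (leaves pre-filled)
    std::vector<double> com;       // monopole: centre of mass, 1D for brevity
};

// Single-kernel bottom-up pass using one arrival counter per node.
void calculate_multipoles_fused(Tree& t) {
    const std::size_t n = t.parent.size();
    assert(t.first_child.size() == n && t.child_count.size() == n);
    assert(t.mass.size() == n && t.com.size() == n);

    // One counter per node: how many children have finished so far.
    std::vector<std::atomic<int>> arrived(n);
    for (auto& a : arrived) a.store(0, std::memory_order_relaxed);

    std::vector<int> nodes(n);
    std::iota(nodes.begin(), nodes.end(), 0);

    std::for_each(std::execution::par, nodes.begin(), nodes.end(), [&](int node) {
        if (t.child_count[node] != 0) return;  // only leaves start a walk

        for (int p = t.parent[node]; p >= 0; p = t.parent[p]) {
            // Release publishes this subtree's result; acquire makes the
            // siblings' results visible to whichever thread arrives last.
            const int prev = arrived[p].fetch_add(1, std::memory_order_acq_rel);
            if (prev + 1 != t.child_count[p]) return;  // a sibling is still pending

            // Last arrival: all children are complete, so reduce them.
            double m = 0.0, mx = 0.0;
            const int begin = t.first_child[p];
            const int end = begin + t.child_count[p];
            for (int c = begin; c < end; ++c) {
                m += t.mass[c];
                mx += t.mass[c] * t.com[c];
            }
            t.mass[p] = m;
            t.com[p] = (m > 0.0) ? mx / m : 0.0;
        }
    });
}
```

No thread ever waits in this sketch: a thread that is not the last to arrive at a node simply stops, so the work per node is done exactly once and the children are always summed in the same order.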