Is your feature request related to a problem? Please describe.
There needs to be an analog to DotMP.Parallel.ParallelForReduction in the GPU API.
Describe the solution you'd like.
We should implement a GPU-based tree reduction, which finishes in O(log n) parallel steps given enough threads. Nvidia has a good slide deck on the technique. The reduction itself can be implemented behind the scenes, but we still need to decide how to handle scalars on the GPU: currently the only means of data transfer is the GPU.Buffer object, which only supports arrays.
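To make the idea concrete, here is a rough CPU-side sketch of the tree reduction pattern in Python. It is not DotMP code and not a GPU kernel: `tree_reduce`, `op`, and `identity` are hypothetical names chosen for illustration, and the inner loop stands in for threads that would run concurrently on the device, with a barrier between strides.

```python
def tree_reduce(values, op, identity):
    """Sequentially simulate a parallel tree reduction.

    Returns (result, steps), where steps is the number of parallel
    rounds a GPU would need, i.e. ceil(log2(n)) for n > 1.
    """
    data = list(values)
    n = len(data)
    if n == 0:
        # Nothing to combine: fall back to the identity element.
        return identity, 0

    steps = 0
    stride = 1
    while stride < n:
        # Assumption: on a GPU, each iteration of this inner loop would be
        # handled by its own thread, and all of them would run at once.
        # A barrier would separate one stride from the next.
        for i in range(0, n - stride, 2 * stride):
            data[i] = op(data[i], data[i + stride])
        stride *= 2
        steps += 1

    # After the last round the full result sits in element 0.
    return data[0], steps


if __name__ == "__main__":
    total, rounds = tree_reduce(range(1, 9), lambda a, b: a + b, 0)
    print(total, rounds)  # 36 3
```

Note that the final value ends up in element 0 of an array, so a first version could plausibly return the result through a one-element buffer until scalar transfer is sorted out. The operator also needs to be associative for the pairing order not to matter.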
Additional context.
We might first need a separate PR that handles transferring scalars to and from the GPU.