computablee / DotMP

A collection of powerful abstractions for parallel programming in .NET with an OpenMP-like API.
https://computablee.github.io/DotMP/
GNU Lesser General Public License v2.1

[PERFORMANCE] Optimize `Collapse(4)` and higher. #108

Closed computablee closed 10 months ago

computablee commented 10 months ago

Is your feature request related to a problem? Please describe.
PR #107 drastically optimizes collapsed for loops of dimension 2 and 3. Loops collapsed over 4 or more dimensions remain unoptimized. Optimizing them should be an easy and large performance gain.

Describe the solution you'd like
Implement manual iteration in the `ForAction.ComputeIndicesN` method, similar to `ComputeIndices2` and `ComputeIndices3`. Some benchmarks highlighting the performance gains would be excellent.
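To make "manual iteration" concrete, here is a minimal sketch of an odometer-style increment over N dimensions. It is not DotMP's actual code: the class `CollapseSketch`, the method `Advance`, and the `starts`/`ends` arrays are hypothetical names chosen for illustration, and the real `ForAction` internals may be laid out quite differently.

```csharp
using System;

// Hypothetical helper for illustration only; not part of DotMP.
public static class CollapseSketch
{
    // Advances `indices` in place to the next point of the collapsed
    // iteration space, where dimension d covers [starts[d], ends[d]).
    // The last dimension varies fastest, matching nested for-loop order.
    // Returns false once every dimension has wrapped (space exhausted).
    public static bool Advance(int[] indices, int[] starts, int[] ends)
    {
        for (int d = indices.Length - 1; d >= 0; d--)
        {
            // Common case: bump one index and stop, no division involved.
            if (++indices[d] < ends[d])
                return true;

            // This dimension overflowed: reset it and carry outward.
            indices[d] = starts[d];
        }
        return false;
    }
}
```

Under this assumed design, the indices would be computed with division only once, at the start of a chunk of iterations, and every later iteration in that chunk would call something like `Advance`. Most calls touch only the innermost index, so the cost per iteration is typically a single increment and comparison.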

Describe alternatives you've considered
N/A

Additional context
Optimizing collapse(2) gave well over a 2x speedup by removing a single DivRem from each iteration. For collapse(4), four DivRems must be executed each iteration. Replacing all of them with cheap increment and assignment operations should yield substantial performance improvements.
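For reference, the per-iteration cost described here can be sketched as follows. This is an assumed shape of the unoptimized path, not the actual `ComputeIndicesN` implementation: `DivRemSketch`, `Decompose`, and the `starts`/`ends` arrays are hypothetical. Only `Math.DivRem` is a real .NET API.

```csharp
using System;

// Hypothetical baseline for illustration only; not part of DotMP.
public static class DivRemSketch
{
    // Recovers one index per dimension from a flat iteration number.
    // Dimension d covers [starts[d], ends[d]); the last dimension
    // varies fastest. One Math.DivRem is executed per dimension, so a
    // 4-dimensional collapse pays for four divisions on every call.
    public static int[] Decompose(int flat, int[] starts, int[] ends)
    {
        var indices = new int[starts.Length];
        int rest = flat;
        for (int d = starts.Length - 1; d >= 0; d--)
        {
            int extent = ends[d] - starts[d];
            // Quotient feeds the next outer dimension; the remainder
            // is this dimension's offset from its start.
            rest = Math.DivRem(rest, extent, out int offset);
            indices[d] = starts[d] + offset;
        }
        return indices;
    }
}
```

Calling `Decompose` once per iteration is what makes the divisions dominate. If it is called only for the first iteration of a chunk, with later indices derived by incrementing, the divisions are amortized over the whole chunk.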