computablee / DotMP

A collection of powerful abstractions for parallel programming in .NET with an OpenMP-like API.
https://computablee.github.io/DotMP/
GNU Lesser General Public License v2.1
29 stars 8 forks source link

Significant performance improvements, new scheduler #107

Closed computablee closed 10 months ago

computablee commented 10 months ago

Which issue are you addressing?

Significant performance improvements, new work-stealing scheduler.

How have you addressed the issue?

This PR implements the WorkStealingScheduler class for parallel for loops which use a work-stealing scheduler. Much of the scheduling code has undergone serious optimization, including a 40% improvement in a particular benchmark for static scheduling with chunk_size=1. Improvements were made to collapsed loops as well, incorporating division-by-multiplication. More testing is required here.

How have you tested your patch?

Unit tests have been written where necessary, and all unit tests pass.

codecov[bot] commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (e140dc3) 99.12% compared to head (ad2c447) 99.21%. Report is 7 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #107 +/- ## ========================================== + Coverage 99.12% 99.21% +0.09% ========================================== Files 12 12 Lines 1137 1271 +134 Branches 113 132 +19 ========================================== + Hits 1127 1261 +134 Misses 5 5 Partials 5 5 ``` | [Files](https://app.codecov.io/gh/computablee/DotMP/pull/107?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Phillip+Allen+Lane) | Coverage Δ | | |---|---|---| | [DotMP/Parallel.cs](https://app.codecov.io/gh/computablee/DotMP/pull/107?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Phillip+Allen+Lane#diff-RG90TVAvUGFyYWxsZWwuY3M=) | `98.97% <100.00%> (+<0.01%)` | :arrow_up: | | [DotMP/Schedule.cs](https://app.codecov.io/gh/computablee/DotMP/pull/107?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Phillip+Allen+Lane#diff-RG90TVAvU2NoZWR1bGUuY3M=) | `96.66% <100.00%> (+4.35%)` | :arrow_up: | | [DotMP/WorkShare.cs](https://app.codecov.io/gh/computablee/DotMP/pull/107?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Phillip+Allen+Lane#diff-RG90TVAvV29ya1NoYXJlLmNz) | `99.00% <100.00%> (-0.08%)` | :arrow_down: | | [DotMP/Wrappers.cs](https://app.codecov.io/gh/computablee/DotMP/pull/107?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Phillip+Allen+Lane#diff-RG90TVAvV3JhcHBlcnMuY3M=) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

computablee commented 10 months ago

After further performance testing of the updates to collapsed loops, it seems that performance may be worsened for average use cases. Avoiding merging this for now until more data can be collected.

computablee commented 10 months ago

Division-by-multiplication was removed in collapsed loops, and instead, manual iteration was implemented. This avoids any expensive operations like division, modulo, multiply, etc. The performance improvements from this are insane, well over 2x across the board for different approaches to loops.

Collapse(3) was also optimized, although remains untested. I would be very shocked if performance gains were anything less than 3x. Collapse(4) and Collapse(n) remain unoptimized, due to code complexity. There should be a writeup discussing the "yes"s and "no"s of the library as far as performance. Collapse(4) or higher is definitely a "no" for lightweight loops due to the extreme overhead of calculating indices.

Optimizing high-dimension collapsed loops shouldn't be too difficult if I get requests for it. Certainly a far easier approach than prior iterations of the collapsed chunk executor. Opening an issue and doing this later.