Closed kks32 closed 4 years ago
Thanks for this, @kks32! I've left my comments, but I don't quite understand your table. What do the numbers represent? (I thought they were how you divide your chunks, but apparently not, judging by the last row.)
So if you divide into chunks of 4, you get a 4-5% speedup, but did you try 2 or 8? Any thoughts on using "dynamic" or "auto"?
There are two things in the table: 1. OpenMP performance with different chunk sizes, and 2. the SpinMutex implementation with a given chunk size. If you let OpenMP decide, that's our baseline (742s). Dynamic will be slower. The best chunk size and number of threads must be decided by the user running the code. These results are only for the 3D hydrostatic column.
Describe the PR
Mutex locks take a significant amount of time in the OpenMP parallel versions of the code. This update tries a spinlock in place of the mutex to reduce locking overhead and wait times across threads.
Enable setting the chunk size in OpenMP schedule using
This allows finer control, without having to recompile for different chunk sizes when running different problems. If the variable is not set, the code will still run.
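The name of the variable is cut off in the description above, so the following is only a sketch of one standard OpenMP mechanism that fits it, not necessarily what this PR does. It assumes the loops use `schedule(runtime)`, in which case OpenMP reads the schedule kind and chunk size from the `OMP_SCHEDULE` environment variable. The function `sum_of_squares` is a hypothetical stand-in for a real particle or node loop.

```cpp
#include <vector>

// Hypothetical example loop; not taken from the PR.
// With schedule(runtime), the schedule kind and chunk size are taken from
// the OMP_SCHEDULE environment variable when the program runs, so changing
// the chunk size needs no recompilation. If OMP_SCHEDULE is unset, the
// OpenMP implementation falls back to its own default schedule.
double sum_of_squares(const std::vector<double>& values) {
  double total = 0.0;
  const long n = static_cast<long>(values.size());
#pragma omp parallel for schedule(runtime) reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    total += values[i] * values[i];
  }
  return total;
}
```

Under this assumption, a run with chunks of 4 would be launched with `OMP_SCHEDULE="static,4"` set in the environment, and a different chunk size only needs a different value for that variable.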
Additional context
Mutex locking is a computational bottleneck, which is why a spinlock is tried in its place. We cannot use std::atomic directly, so the spinlock is a good workaround for the lack of std::atomic support for Eigen and Vector containers. Speed improves by about 4-5% for the 3D hydrostatic column.
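For readers unfamiliar with the technique, here is a minimal sketch of a spinlock built on `std::atomic_flag`. It is an illustration under stated assumptions, not the PR's actual SpinMutex: the class body and the `scatter_add` helper are hypothetical. The class provides `lock()`, `try_lock()`, and `unlock()`, so it works with `std::lock_guard` in the same way as `std::mutex`.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

// Hypothetical minimal spinlock; the PR's SpinMutex may differ in detail.
class SpinMutex {
 public:
  void lock() noexcept {
    // Busy-wait until test_and_set reports the flag was previously clear.
    while (flag_.test_and_set(std::memory_order_acquire)) {
    }
  }
  bool try_lock() noexcept {
    // Succeeds only if the flag was clear before this call.
    return !flag_.test_and_set(std::memory_order_acquire);
  }
  void unlock() noexcept { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical usage: threads add contributions into shared bins, with the
// spinlock guarding each update because the container is not atomic.
std::vector<double> scatter_add(int n_items, int n_bins) {
  std::vector<double> bins(n_bins, 0.0);
  SpinMutex mutex;
#pragma omp parallel for
  for (int i = 0; i < n_items; ++i) {
    std::lock_guard<SpinMutex> guard(mutex);
    bins[i % n_bins] += 1.0;
  }
  return bins;
}
```

A spinlock avoids putting the waiting thread to sleep, which tends to pay off when the critical section is very short, as in a single accumulation. With long critical sections or more threads than cores, busy-waiting can cost more than a regular mutex.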