Closed: sprkrd closed this issue 11 months ago
Hi @sprkrd and thanks for opening the issue and PR! This has already been suggested in #96 and I have already implemented it in v4.0.0, which will most likely be released within a week, along with many other improvements and new features (the header file itself is basically done, but I still have to write the tests and documentation for the new features).
New feature
I couldn't help but notice that the workload is not balanced equally in the push_loop method. If the number of iterations is not a multiple of the number of blocks, the last block receives all the excess iterations (i.e. #iterations % #blocks extra iterations). Of course, it is not always possible to give every block the same number of iterations, but I believe it would be far better to give one extra iteration to each of the first #iterations % #blocks blocks, instead of giving up to (#blocks-1) extra iterations to the last block (which, depending on how much work one iteration is, could mean that one thread performs a lot of extra work).
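For illustration only, the proposed distribution could be sketched roughly as follows. This is a minimal standalone sketch, not the library's actual implementation: `balanced_blocks` is a hypothetical helper name, and the half-open `[start, end)` block representation is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the library's API).
// Splits the half-open range [first, last) into num_blocks contiguous
// blocks whose sizes differ by at most one iteration.
std::vector<std::pair<std::size_t, std::size_t>>
balanced_blocks(std::size_t first, std::size_t last, std::size_t num_blocks)
{
    assert(num_blocks > 0 && first <= last);
    const std::size_t total = last - first;
    const std::size_t base = total / num_blocks;  // iterations every block receives
    const std::size_t extra = total % num_blocks; // number of blocks that receive one more

    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    blocks.reserve(num_blocks);
    std::size_t start = first;
    for (std::size_t i = 0; i < num_blocks; ++i)
    {
        // The first `extra` blocks absorb the remainder, one iteration each.
        const std::size_t size = base + (i < extra ? 1 : 0);
        blocks.emplace_back(start, start + size);
        start += size;
    }
    return blocks;
}
```

With 10 iterations and 4 blocks, this yields block sizes 3, 3, 2, 2, whereas assigning the whole remainder to the last block yields 2, 2, 2, 4. Note that if there are fewer iterations than blocks, some blocks in this sketch come out empty, so a real implementation would probably want to cap the number of blocks first.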
Code example
The API would remain the same, so no code example is provided.
Additional information
If there's no technical reason for the current behavior, I volunteer to submit a pull request with the necessary changes to make this happen.