OpenMP provides a collapse clause for parallel-for loops as follows:
    #pragma omp for collapse(2)
    for (int i = 0; i < X; i++) {
        for (int j = 0; j < Y; j++) {
            work(i, j);
        }
    }
This lets two directly nested for loops be parallelized together as a single iteration space. Without collapse(2), the parallel scheduler has only X iterations to distribute; with collapse(2), it has X*Y iterations to distribute.
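To make the X*Y iteration space concrete, here is a minimal C sketch of what collapsing two loops amounts to: walk one flattened index and recover (i, j) with a division and a remainder. The names `index2`, `unflatten2`, and `run_chunk` are hypothetical and are not part of OpenMP or DotMP. The sketch assumes a rectangular iteration space starting at 0, and it runs a chunk serially rather than on a real thread team.

```c
/* Hypothetical type: a pair of loop indices recovered from one flattened
   iteration number. */
typedef struct {
    int i;
    int j;
} index2;

/* Map flattened iteration k in [0, X*Y) back to (i, j), with j varying
   fastest. Assumes Y > 0 and a rectangular iteration space. */
index2 unflatten2(long k, int Y)
{
    index2 p;
    p.i = (int)(k / Y);
    p.j = (int)(k % Y);
    return p;
}

/* Run one chunk [begin, end) of the flattened space. A scheduler would hand
   each thread one or more such chunks; this sketch runs the chunk serially. */
void run_chunk(long begin, long end, int Y, void (*work)(int, int))
{
    for (long k = begin; k < end; k++) {
        index2 p = unflatten2(k, Y);
        work(p.i, p.j);
    }
}
```

With X = 4 and Y = 3 there are 12 flattened iterations, so two threads could each take half: `run_chunk(0, 6, 3, work)` and `run_chunk(6, 12, 3, work)`. Without the collapse, the same two threads could only split the 4 outer iterations.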
The way I see this working in DotMP would be something like this:
The main problem is managing the scheduler. It is already quite a mess because of reductions: it has to keep track of multiple omp_fn delegates and decide which one to call based on a reduction flag. Without a refactor, this problem will only get worse, so this issue asks for a lot:
- Refactor the core scheduler to more elegantly handle actions with different parameters.
- Implement Parallel.ForCollapse and integrate with the core scheduler elegantly.
- Implement Parallel.ForReductionCollapse and integrate with the core scheduler elegantly.
- Implement a Parallel.ParallelForCollapse wrapper.
- Implement a Parallel.ParallelForReductionCollapse wrapper.
- Edit the GEMM example to showcase this feature.
This can be done in pieces, of course. Any work towards this is helpful, though the core scheduler should absolutely be refactored before adding new features.
I would also like to hear opinions on how to elegantly support higher collapse orders, such as collapse(3) or collapse(4).
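As one possible starting point for that discussion, the index arithmetic itself generalizes to any collapse order. The C sketch below decomposes a flattened index into n loop indices given each loop's trip count. The names `total_iterations` and `unflatten` are hypothetical, and the sketch assumes rectangular loops that start at 0; it says nothing about how the DotMP scheduler or its delegates should be structured.

```c
/* Hypothetical helper: number of iterations in the collapsed space, i.e. the
   product of the per-loop trip counts (extents[0] is the outermost loop). */
long total_iterations(const int *extents, int n)
{
    long total = 1;
    for (int d = 0; d < n; d++) {
        total *= extents[d];
    }
    return total;
}

/* Hypothetical helper: decompose flattened index k into n loop indices,
   written to idx[0..n-1]. The innermost loop varies fastest. Assumes every
   extent is positive and 0 <= k < total_iterations(extents, n). */
void unflatten(long k, const int *extents, int n, int *idx)
{
    for (int d = n - 1; d >= 0; d--) {
        idx[d] = (int)(k % extents[d]);
        k /= extents[d];
    }
}
```

For a collapse(3) over trip counts 2, 3, and 4, the scheduler would see 24 iterations, and flattened iteration 17 maps back to indices (1, 1, 1) because 1*12 + 1*4 + 1 = 17. One open design question is whether a single array-based entry point like this is preferable to separate fixed-arity overloads.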