The speedup from parallelisation in the current form of 21cmFAST drops significantly after ~4 cores. It would be nice to look into possible improvements so that we can scale better with larger boxes, or with some of the slower modes of running the model.
Some improvements could include:
Scheduling: The default scheduling allocates large chunks of the box to each thread. This may cause a load imbalance, since regions of similar density will be allocated to the same thread. Either reducing the chunk size or using dynamic scheduling for the more intensive parallel regions may help with this.
Parallel region structure: This will be more difficult to test (and will likely require a better profiling setup) but it's possible that splitting up some of the larger parallel loops into chunks of similar computation will help the balancing.
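To make the scheduling suggestion concrete, here is a minimal sketch of a loop using dynamic scheduling with a small chunk size. It is not taken from 21cmFAST: `cell_work`, `process_box_dynamic`, and the density-dependent cost model are hypothetical stand-ins for a per-cell calculation that is more expensive in dense regions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cell workload: the iteration count grows with density,
   standing in for a calculation that is slower in dense regions. */
static double cell_work(double density)
{
    int n_iter = 1 + (int)(density * 100.0);
    double acc = 0.0;
    for (int k = 0; k < n_iter; k++)
        acc += density / (1.0 + k);
    return acc;
}

/* Hypothetical driver: instead of one large contiguous block per thread,
   iterations are handed out on demand in blocks of `chunk`, so a thread
   that lands on cheap cells can pick up more work. */
void process_box_dynamic(const double *density, double *result,
                         size_t n_cells, int chunk)
{
    assert(chunk > 0);
    #pragma omp parallel for schedule(dynamic, chunk)
    for (long i = 0; i < (long)n_cells; i++)
        result[i] = cell_work(density[i]);
}
```

Whether this helps would need to be profiled: dynamic scheduling adds overhead per chunk, so a very small chunk size can cost more than the imbalance it removes. `schedule(static, chunk)` is a cheaper alternative that interleaves chunks across threads without runtime hand-out.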
A more minor point concerns how the OpenMP directives are written. In the previously existing parts of the code, every variable used in a parallel region is explicitly declared as shared or private in the directive, whereas my additions rely on the default scoping, where all variables are assumed shared unless declared inside the parallel region. I wrote them like this because I believe it is more readable. However, I would appreciate some input on which style others prefer, so that we can make it uniform across the package.
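For comparison, here is a sketch of the two styles side by side on a toy loop. The function names and the computation are hypothetical, not code from the package. The `default(none)` clause in the first version is an optional addition that is assumed here rather than taken from the existing code; it makes the compiler reject any variable left unscoped.

```c
#include <stddef.h>

/* Style A: every variable is scoped explicitly in the directive. */
void scale_explicit(const float *in, float *out, size_t n, float factor)
{
    long i;
    float tmp;
    #pragma omp parallel default(none) shared(in, out, n, factor) private(i, tmp)
    {
        #pragma omp for
        for (i = 0; i < (long)n; i++) {
            tmp = in[i] * factor;
            out[i] = tmp * tmp;
        }
    }
}

/* Style B: default scoping. Variables declared outside the region are
   shared; anything declared inside the loop body is private to the thread. */
void scale_implicit(const float *in, float *out, size_t n, float factor)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        float tmp = in[i] * factor;
        out[i] = tmp * tmp;
    }
}
```

Both versions compute the same result. Style B is shorter, but it only stays correct as long as temporaries are declared inside the region; a temporary declared at function scope would silently become shared.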