compdyn / partmc

Particle-resolved stochastic atmospheric aerosol model
http://lagrange.mechse.illinois.edu/partmc/
GNU General Public License v2.0

Parallelize species #135

Open cguzman95 opened 4 years ago

cguzman95 commented 4 years ago

This issue proposes reorganizing the code to improve GPU execution (and possibly CPU execution too). For now the idea only works in theory; the best way to apply it still needs to be worked out. In any case, it would be worth considering for the C++ 2.0 implementation.

The idea is to parallelize not only the reactions but also the species loop inside the rxn_gpu_arrhenius_calc_deriv_contrib type functions. Each of these functions contains a loop that iterates over all the species present in the reaction to calculate the rate for each species. In theory, we can parallelize this loop on the GPU at no extra cost, since we have almost as many threads available as we want.

The drawback is that the data must be restructured and the functions will look quite different. An advantage is that the change can be tested first in the GPU version of the code.
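To make the restructuring concrete, here is a minimal CPU-side sketch in C++, not the actual PartMC code. It assumes a hypothetical flattened layout (`FlatReactions`, with one entry per reaction/species pair) and assumes that each reaction's rate has already been computed. All names are illustrative. The nested version mirrors the current per-reaction species loop; the flat version iterates over independent pairs, each of which could map to one GPU thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flattened layout: one entry per (reaction, species) pair.
struct FlatReactions {
  std::vector<double> rate;          // precomputed rate of each reaction
  std::vector<std::size_t> offset;   // pairs of reaction r: offset[r] .. offset[r+1]-1
  std::vector<std::size_t> species;  // species index of each pair
  std::vector<double> coeff;         // -1 for a reactant, +yield for a product
};

// Nested form: outer loop over reactions, inner loop over their species.
std::vector<double> deriv_nested(const FlatReactions& rx, std::size_t n_species) {
  std::vector<double> deriv(n_species, 0.0);
  for (std::size_t r = 0; r + 1 < rx.offset.size(); ++r)
    for (std::size_t p = rx.offset[r]; p < rx.offset[r + 1]; ++p)
      deriv[rx.species[p]] += rx.coeff[p] * rx.rate[r];
  return deriv;
}

// Flat form: a single loop over all pairs. Each iteration is independent
// apart from the accumulation into deriv, so on a GPU each pair could be
// handled by its own thread (the += would then need an atomic add).
std::vector<double> deriv_flat(const FlatReactions& rx, std::size_t n_species) {
  assert(rx.species.size() == rx.coeff.size());

  // Map every pair back to the reaction that owns it.
  std::vector<std::size_t> pair_rxn(rx.species.size());
  for (std::size_t r = 0; r + 1 < rx.offset.size(); ++r)
    for (std::size_t p = rx.offset[r]; p < rx.offset[r + 1]; ++p)
      pair_rxn[p] = r;

  std::vector<double> deriv(n_species, 0.0);
  for (std::size_t p = 0; p < pair_rxn.size(); ++p)
    deriv[rx.species[p]] += rx.coeff[p] * rx.rate[pair_rxn[p]];
  return deriv;
}
```

In this sketch the loop over pairs lives outside any reaction-specific function, which is the sense in which the species loop would move up to the GPU interface level. Whether precomputing the rates and using atomic accumulation pays off on real mechanisms is an open question that would need benchmarking.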

This optimization will allow the data to be accessed from a higher interface level (the GPU interface), since the loop will be moved from the RXN files to this interface, which makes the data easier to handle. Moreover, it will make further optimizations easier to devise, such as choosing which code path to execute depending on the input data (for example, if there are few reactions to compute, use the CPU instead of the GPU, and vice versa, or at least advise the user).
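As an illustration of the input-dependent dispatch mentioned above, a minimal sketch could look like the following. The `Backend` enum, the `choose_backend` function and the threshold parameter are all hypothetical; a sensible threshold would have to be measured on the target hardware.

```cpp
#include <cstddef>

// Hypothetical execution targets for the solver's reaction calculations.
enum class Backend { Cpu, Gpu };

// Pick the GPU only when there is enough work to amortize kernel launch
// and data transfer overhead. min_gpu_work_items is an assumed tuning
// parameter, not a value taken from the existing code.
Backend choose_backend(std::size_t n_work_items, std::size_t min_gpu_work_items) {
  return n_work_items >= min_gpu_work_items ? Backend::Gpu : Backend::Cpu;
}
```

Here `n_work_items` could be the number of reactions, or the number of reaction/species pairs if the species loop is flattened. The same check could also just print a warning to the user instead of switching backends.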

mattldawson commented 4 years ago

Hi @cguzman95 - let's discuss this after the documentation is done to see how it can fit into the overall design