Flexible selection of GPU or CPU code in run_time

compdyn / partmc

Particle-resolved stochastic atmospheric aerosol model

http://lagrange.mechse.illinois.edu/partmc/

GNU General Public License v2.0


Flexible selection of GPU or CPU code in run_time #133

Closed: cguzman95 closed this issue 4 years ago

cguzman95 commented 4 years ago

The title may sound ambitious, but the idea is simple. This explanation assumes the flag PMC_GPU=ON.

When we create a new solver on the GPU, we check how much data it holds. If the amount of data is small (the typical case when testing with few cells), we compute the derivative (deriv) and the Jacobian (Jac) on the CPU, since that is faster. If the data is large (the default case for MONARCH), they are computed on the GPU.

This should roughly ensure that the program always takes the faster option. It can be especially useful when a small mechanism and a large one are solved at the same time (as in the new unit tests): the solver with little data runs on the CPU and the other on the GPU. So even with the GPU flag ON, the program chooses the CPU when that is the faster option.

For now, it is implemented in a rough way in #129. I left the issue open so we can refine the limit for what we consider small data.

cguzman95 commented 4 years ago

Known bug related to the boolean type of a variable:

While working on this, I set a small_data variable to 0 during initialization. After that I never modify the value; I only read it. However, after initialization, printing the variable in calc_deriv shows 256!

small_data:0
 CAMP-chem initialization time:   0.40044600000000002       s
small_data:256
small_data:256
small_data:256
small_data:256

It is probably a bug related to mixing C++ (.cu) files and C (.c) files, since everything works fine when the variable is declared as an integer instead of a bool. It will probably be fixed in version 2.0, but I am noting it here so we keep it in mind.

mattldawson commented 4 years ago

Hi @cguzman95, we could discuss this with Oriol. I'm not sure that small mechanisms are realistic scenarios. We use them for testing, but there's no real need to optimize test execution to that extent; it's better for the tests to be representative of how real mechanisms are solved. Also, there is already a CMake flag, ENABLE_GPU, so a user who only planned to run a small mechanism could simply set it to false.

For the boolean bug, let's open a separate issue. I had a simple enum for boolean values that I was using in CAMP, but we should rethink this and make sure we use one consistent boolean type throughout the code.

mattldawson commented 4 years ago

I guess it also depends on how small we are talking about. If "small" means fewer than 10 reactions, I don't think anyone would ever run such a mechanism in practice, but if it means fewer than 100 reactions, that might be realistic.

cguzman95 commented 4 years ago

Based on the MONARCH_1 test measurements, by "small" I mean a state array shorter than 1500 elements (accounting for both the number of cells and n_state_var). I'm not looking at more complex tests such as cb05_big, because the multi-cell CPU results are not good yet (#116) and the GPU case is optimal for a small number of cells. The limit may change in the future as the handling of complex mechanisms improves.

Anyway, we can discuss the idea with Oriol. One advantage is that the implementation is just an if-else around the GPU calls, which does not hurt performance. It would also encourage users to leave the GPU flag ON by default.