Improve RXN vectorization: Reorder and swap loops type_reactions and n_cells

cguzman95 commented 4 years ago

The current state of chem_mod calc_deriv and calc_jac follows this loop order:

For n_cells: .... For n_rxn: Update rxn variables (rxn_env_idx, num_react...) ... end end

With the multi-cell implementation, we expect to have more cells than reactions (usually 10,000 cells in front of 200 reactions). In the reactions loop, we need to load specific rxn variables like rxn_env_idx, which can take different variables on each iteration (for example, it can be 1,1,2,1,2,... and go on, depending on the type and the order). Also, the sizes of int_data and float_data will change. This case blocks vectorization and is one of the reasons that the MONARCH_1 test runs significantly faster in front cb05.

A partial solution is to order the reaction data (doing first all the reactions of one type, then the next type and go on). However, if we have a mechanism with few reactions and some different reaction types, the fails on vectorization actually will be repeated for all the number of cells

The best solution to this, apart from order the RXN structure, is to compute first all the reactions of the same type, then the next type and go on. This means select first the reaction type and then compute all the number of cells (in other means, swap the loops n_rxn and n_cells)

This will also simplify some variables. For example, we don't need the array rxn_env_idx, since we will compute like 10,000 cell-times the same reaction using the same rxn_env_idx (correct me if I'm wrong and some reaction type can have different rxn_env_idx, or in other meanings different number of RATE_CONSTANTS). So, we only need to store a variable N_RATE_CONSTANTS for each reaction (it could be defined on each rxn file or it could be an array defined at the start in the common.h file)

This will be also in concordance with classes structure for a future C++ implementation

cguzman95 commented 4 years ago

@mattldawson What do you think?

mattldawson commented 4 years ago

@cguzman95 - I think this is a good idea for optimization, but I think there will be several incompatibilities with the current model structure that we'll have to work through before implementing something like this. One of these I think I've mentioned before is that the reason we loop on cells first is that for each cell the matrices J_rxn and J_sub_model need calculated and combined into the portion of the full Jacobian J that corresponds to that grid cell. Otherwise, J_rxn and J_sub_model will each have to be the size of the full Jacobian instead of just the size of the Jacobian for one cell.

mattldawson commented 4 years ago

But, we should talk about this and come up with a design that works and uses a structure like you're proposing if that will make things faster.

I would prefer though, if we hold of on the development until after the first model papers are done. Then we can work on implementing this optimization for your paper. We could actually think about doing this as part of the porting to c++

What do you think?

cguzman95 commented 4 years ago

What do you need for the paper version? I mention this because I have to adapt the rxn code and structures to the GPU version and I don't want to follow so much the actual structure if we will change it soon.

It's fine for you if I do only necessary changes to make GPU part works and adapt the structure after we finish the development? (I mean thinks like use the same variable names, passing the same variables in the function, and other similar things)

mattldawson commented 4 years ago

I think that would be fine - could you come up with some pseudo-code for the f() and Jac() functions (and any other functions or structures that would be affected) that describe what you're proposing?

I just want to avoid you doing a lot of work on something that will have to be changed significantly to work with all the model elements. For example, when you initially put the loops over cells in the rxn_solver.c and sub_model.c functions (for I'm guessing the same reason), in order to get it to work, I had to move these back out to the camp_solver.c functions because of the need to do rxn and sub model calculations for a single grid cell together. This was a lot of work, so I think it's important that we start with a design that includes your optimizations, but that is fully thought out as to how it works with the overall solving of the model including all of the model elements, not just a few reactions.

Another thing to remember is that optimizing a simple mechanism of just gas-phase reactions is great, but in practice the mechanism will be more complex, with some reactions taking a lot more time to calculate than others, and will include sub model calculations that will be much fewer in number but require a lot more computation than the reactions. So, as part of the design process, it might be worthwhile to take a step back and really think about realistic applications and how these will be affected by the optimizations.

mattldawson commented 4 years ago

It might be worth thinking about moving to c++ first, to make this easier. If you look at the aero_phase_solver.c aero_rep_solver.c rxn_solver.c' and sub_model.c these are all just work-arounds for not being able to call functions on a collection of objects extending some abstract class. so, if we move to c++ we can eliminate all these files and work with a much cleaner set of code that may make this design change a lot easier.

cguzman95 commented 4 years ago

I think move to C++ will complicate it, first, it would be better to make a structure similar to classes, and then pass to C++

mattldawson commented 4 years ago

ok, after an offline discussion with @cguzman95 - he will start a test branch to test some different structures for calculating the derivatives and Jacobians using GPUs. But, until we come up with a fully thought-out plan for making this work for all the model elements, not just some reactions, we won't start development on code we plan to push to chem_mod. The test branch code will probably never be merged directly into chem_mod. @cguzman95 - as long as you're ok with this, it's fine with me, but if you want your developments to make it into the final model, we need to come up with a comprehensive plan first.

cguzman95 commented 4 years ago

Yes, as we were talking, since I need to adapt the GPU structure and code, I will probably try new structures there. It will be a function like set_data_gpu that will change the structure to the best (it already exists and it makes a modification on rxn to improve only the GPU case, and also has a half-finished reordering of the reactions in rxn that could be nice too on CPU).

compdyn / partmc

Improve RXN vectorization: Reorder and swap loops type_reactions and n_cells #116