Status: Open. tjdodwell opened this issue 3 years ago.
I just had a brief look at this. Kernel fusing should not require any repetition of code, and the speedup is significant because it saves us from calculating the bond stretch, which involves an expensive square root operation, twice over for each bond.
The code repetition is actually due to the switch variables "does the model have stiffness corrections?" and "does the model have different bond types?", and combinations thereof. Two on/off switches give 2^{2} = 4 copies of the code. If we add bond stretch corrections as well as stiffness corrections, that becomes 2^{3} = 8 copies. This is unsustainable, but there was a point...
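To make the growth concrete, here is a small sketch that enumerates the kernel variants implied by the on/off switches. The flag names are illustrative labels taken from the description above, not identifiers from the actual code base.

```python
from itertools import product

# Illustrative flag names (assumptions, not the real identifiers in the repo).
FLAGS = ["stiffness_corrections", "bond_types"]


def variants(flag_names):
    """Return every on/off combination of the given feature flags."""
    return [
        dict(zip(flag_names, combo))
        for combo in product([False, True], repeat=len(flag_names))
    ]


two_flags = variants(FLAGS)
three_flags = variants(FLAGS + ["stretch_corrections"])

# Each entry corresponds to one hand-written copy of the kernel.
for variant in two_flags:
    print(variant)
print(len(two_flags), "copies with two flags,", len(three_flags), "with three")
```

Every extra optional feature doubles the number of copies that have to be kept in sync.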
The point of this code repetition (in `peridynamics.cl` and `integrators.py`) was to conserve GPU VRAM when the user is not using all of the features. The trade-off is memory used for features (such as stiffness corrections, multi-material models and plastic damage models) versus memory used for the number of nodes. For example, if we have a simple model with no stiffness corrections and a single homogeneous bond type, then we don't have to store arrays for stiffness corrections and bond types, and can run a model with a larger total number of nodes.
I think this justifies separating out the combinations of "has stiffness_corrections" and "has bond_types".
So why can't I use "if" statements to remove the code repetition? Because I need to either load the huge stiffness_correction and bond_type arrays into GPU memory or not, and I couldn't find a way of generalising the code to handle all cases, so I separated them out. Thinking about it, there is no good reason why I can't just load a small 1x1 array or a null variable in their place and then use "if" statements to remove the code repetition. So yes, this is worth another look.
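A minimal sketch of the placeholder idea, written in plain Python rather than OpenCL so it can be run anywhere. The function `bond_force`, its arguments and the simple force expression are all hypothetical and only stand in for the real kernel; the point is that one code path serves every combination, and an unused feature costs a single-element array instead of one entry per bond.

```python
# Single-element placeholders, passed when a feature is switched off.
# They are never indexed by bond, so their size does not grow with the model.
PLACEHOLDER_CORRECTIONS = [1.0]
PLACEHOLDER_TYPES = [0]


def bond_force(bond, stretch, stiffness, corrections, types,
               has_corrections, has_types):
    """Hypothetical per-bond force using one code path for all feature combinations."""
    # Only index the large arrays when the feature is enabled.
    bond_type = types[bond] if has_types else 0
    correction = corrections[bond] if has_corrections else 1.0
    return stiffness[bond_type] * correction * stretch


# Simple model: no corrections, one bond type, placeholders in place of big arrays.
simple = bond_force(2, 0.01, [100.0],
                    PLACEHOLDER_CORRECTIONS, PLACEHOLDER_TYPES,
                    has_corrections=False, has_types=False)

# Full model: per-bond corrections and per-bond types are really stored.
full = bond_force(2, 0.01, [100.0, 200.0],
                  [1.0, 1.0, 0.5], [0, 0, 1],
                  has_corrections=True, has_types=True)
print(simple, full)
```

In a real kernel the two flags would be uniform across all work-items, so the branch itself should be cheap, but that would need measuring.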
Looks like the process of kernel fusing means that there is a lot of repetition of code. Is this easy to maintain? @bb515 What are the computational savings from kernel fusing?
If it is that significant, is there a smart way of doing it, using code generation and templating of the common parts (if that makes sense)?
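One possible shape for this, as a sketch only: keep a single kernel source and let the OpenCL C preprocessor drop the optional arguments and code, selecting features with `-D` build options. The kernel below, its argument names and the macro names are hypothetical and are not taken from `peridynamics.cl`; the Python only assembles the option list and does not need a GPU to run.

```python
# Hypothetical single-source kernel; unused arguments vanish at compile time,
# so no memory is reserved for features that are switched off.
KERNEL_SOURCE = r"""
__kernel void bond_force(
    __global const float *stretch,
#ifdef HAS_STIFFNESS_CORRECTIONS
    __global const float *stiffness_corrections,
#endif
#ifdef HAS_BOND_TYPES
    __global const int *bond_types,
    __global const float *bond_stiffness,
#else
    const float bond_stiffness,
#endif
    __global float *force)
{
    const int bond = get_global_id(0);
#ifdef HAS_BOND_TYPES
    float k = bond_stiffness[bond_types[bond]];
#else
    float k = bond_stiffness;
#endif
#ifdef HAS_STIFFNESS_CORRECTIONS
    k *= stiffness_corrections[bond];
#endif
    force[bond] = k * stretch[bond];
}
"""


def build_options(has_stiffness_corrections, has_bond_types):
    """Return the -D options that select the requested kernel variant."""
    options = []
    if has_stiffness_corrections:
        options.append("-DHAS_STIFFNESS_CORRECTIONS")
    if has_bond_types:
        options.append("-DHAS_BOND_TYPES")
    return options


print(build_options(True, False))
```

With PyOpenCL, the list could then be passed as `pyopencl.Program(context, KERNEL_SOURCE).build(options=build_options(...))`, with the host code supplying only the arguments that the chosen variant declares.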