Open maedoc opened 4 years ago
These will be much more difficult because the history would then have to be accessible from the generated kernels, which means the kernels would need access to the memory that holds it. There is an alternative formulation of GPU-able functions that might be able to handle this more directly, though.
Maybe it's not the right level of abstraction, but I have had great success on GPU with fixed-step solvers (Euler-Maruyama and Heun) with lots of delays (n^2, as in the issue in the SDDE repo). The memory access to the delay buffer can be coalesced (à la SIMD) across solutions, assuming they have identical delays, and that access practically hides all the arithmetic, i.e. the computation is memory bound. This doesn't do anything about the other issues related to delays, like discontinuity propagation, but it could be a start. I will try to come up with an example in Julia.
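To make the buffer layout concrete, here is a minimal CPU-side sketch in plain Python. It is not the Julia example mentioned above and not any SimpleDiffEq API. The test equation `dx = -k * x(t - tau) dt + sigma dW`, the function name `em_delay_ensemble`, and all parameter names are assumptions chosen for illustration. The point is only the indexing: the history is a ring buffer of rows, one row per time step, with the ensemble member as the contiguous index, so every member reads the same delayed row at each step. On a GPU that row read is the access that could be coalesced across threads.

```python
import math
import random


def em_delay_ensemble(ks, tau, dt, n_steps, sigma=0.0, x0=1.0, seed=0):
    """Fixed-step Euler-Maruyama for dx = -k * x(t - tau) dt + sigma dW.

    One ensemble member per entry of `ks`; all members share the delay `tau`
    and the constant history x(t) = x0 for t <= 0 (illustrative assumptions).
    Returns the list of states after `n_steps` steps.
    """
    rng = random.Random(seed)
    n_ens = len(ks)
    n_lag = round(tau / dt)      # delay expressed in whole steps
    n_slots = n_lag + 1          # ring buffer covers t - tau .. t
    # history[slot][member]: the member index is the contiguous one
    history = [[x0] * n_ens for _ in range(n_slots)]
    sqrt_dt = math.sqrt(dt)

    for step in range(n_steps):
        now = history[step % n_slots]
        # Same row for every member because the delay is identical
        delayed = history[(step - n_lag) % n_slots]
        new_row = [
            now[j] - dt * ks[j] * delayed[j]
            + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            for j in range(n_ens)
        ]
        # The slot for step + 1 is the oldest row, which is no longer needed
        history[(step + 1) % n_slots] = new_row

    return history[n_steps % n_slots]


if __name__ == "__main__":
    # Two members with different rate parameters, no noise
    print(em_delay_ensemble([1.0, 0.0], tau=0.2, dt=0.1, n_steps=4))
```

With differing delays per member, the `delayed` row would differ per member and the single shared read above would no longer apply, which is why the identical-delay assumption matters.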
Yeah, that's something we can hack together in SimpleDiffEq and expose pretty quickly, and that would be fine. The adaptive higher-order methods have a lot more caching details in there, though.
It would be helpful to apply GPUs for ensemble studies with the delay and stochastic delay problems.