Sundials with OpenMP support

SciML / Sundials.jl

Julia interface to Sundials, including a nonlinear solver (KINSOL), ODEs (CVODE and ARKODE), and DAEs (IDA) in a SciML scientific machine learning enabled manner

https://diffeq.sciml.ai

BSD 2-Clause "Simplified" License


Sundials with OpenMP support #251

Open ViralBShah opened 4 years ago

ViralBShah commented 4 years ago

Should we build Sundials with OpenMP support? This is now working in other BB packages.

ChrisRackauckas commented 4 years ago

It's a fairly rare case for it to be useful. It's faster at 100,000 ODEs or more (according to the docs), but it only speeds up the parts outside the linear solver, since LAPACK multithreads separately. So it would really only be useful for >100,000 non-stiff ODEs. Then, to support it, we would have to support using NVectorOPENMP instead of NVectorSerial, which we use everywhere right now. That's fine, but if we go through the library and allow other NVectors, I'd want to add something more useful, like NVector_CUDA.
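To make the size argument concrete, here is a minimal C sketch of an `N_VLinearSum`-style kernel (z = a*x + b*y) that uses OpenMP's `if` clause so the loop is only spread across threads above a cutoff. `linear_sum` and `PAR_THRESHOLD` are hypothetical names, the 100,000 cutoff only echoes the figure quoted in the comment, and none of this is taken from SUNDIALS' own OpenMP vector; the real break-even point depends on the hardware and has to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff echoing the ~100,000 figure above; not a tuned value. */
#define PAR_THRESHOLD 100000L

/* z = a*x + b*y, the same operation N_VLinearSum performs.
   The OpenMP `if` clause runs the loop serially when n is small,
   because starting a thread team costs more than such a short loop. */
static void linear_sum(long n, double a, const double *x,
                       double b, const double *y, double *z) {
    assert(x != NULL && y != NULL && z != NULL);
    #pragma omp parallel for if(n >= PAR_THRESHOLD)
    for (long i = 0; i < n; i++)
        z[i] = a * x[i] + b * y[i];
}
```

Compiled with `-fopenmp` (GCC/Clang), calls with n at or above the cutoff run in parallel; without that flag the pragma is ignored and the same code runs serially, so the result is identical either way.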

ViralBShah commented 4 years ago

OK. I think that makes it clear this is not useful here. Also, I'd rather focus on the Julia side, as you have often said.

mottelet commented 2 years ago

Hi. Just a remark about OpenMP in Sundials and its potential interest. It is not true that OpenMP can only speed up the nonlinear solver part. For very large (typically a million dofs) and stiff (even purely linear) ODEs, when iterative linear solvers are the only way to go (and no matrix of any kind is used), using NVector_OpenMP allows substantial performance gains. Profiling shows that in that typical case, 2/3 of the CPU time is spent in N_VLinearSum, N_VDotProd, N_VWrmsNorm, and N_VScale, and 1/3 in the RHS computation (which can also be crunched with OpenMP). And using N_VClone instead of explicit N_Vector allocation lets you write code that is almost independent of the actual vector type (apart from the initial "seed" vector).
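A rough C illustration of both points: the profiled kernels are plain loops that OpenMP parallelizes directly (a dot product needs a `reduction` clause so each thread keeps a private partial sum), and code written only against an operations table plus a clone function never names the concrete vector type. Everything here (`Vec`, `VecOps`, `vec_new_par`, `scaled_sq_norm`) is hypothetical and only loosely modeled on the idea behind SUNDIALS' generic `N_Vector`; it is not the SUNDIALS API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical vector "class": data plus a table of operations.
   Loosely inspired by SUNDIALS' N_Vector design, NOT its real API. */
typedef struct Vec Vec;

typedef struct {
    Vec   *(*clone)(const Vec *w);                   /* same type and length */
    void   (*destroy)(Vec *v);
    double (*dot)(const Vec *x, const Vec *y);        /* sum x_i * y_i */
    void   (*scale)(double c, const Vec *x, Vec *z);  /* z = c * x */
} VecOps;

struct Vec {
    size_t        n;
    double       *data;
    const VecOps *ops;
};

/* Allocate a zero-filled vector that uses the given operations table. */
static Vec *vec_alloc(size_t n, const VecOps *ops) {
    Vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->n = n;
    v->data = calloc(n ? n : 1, sizeof *v->data);
    assert(v->data != NULL);
    v->ops = ops;
    return v;
}

static Vec *par_clone(const Vec *w) { return vec_alloc(w->n, w->ops); }

static void par_destroy(Vec *v) {
    free(v->data);
    free(v);
}

static double par_dot(const Vec *x, const Vec *y) {
    double sum = 0.0;
    long n = (long)x->n;
    /* Each thread accumulates a private partial sum; OpenMP adds them at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += x->data[i] * y->data[i];
    return sum;
}

static void par_scale(double c, const Vec *x, Vec *z) {
    long n = (long)x->n;
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        z->data[i] = c * x->data[i];
}

static const VecOps PAR_OPS = { par_clone, par_destroy, par_dot, par_scale };

/* The only place the concrete type is chosen: creating the "seed" vector. */
static Vec *vec_new_par(size_t n) { return vec_alloc(n, &PAR_OPS); }

/* Generic code: uses only x->ops, so it would work unchanged with any
   vector type that fills in the same table. Returns <c*x, c*x>. */
static double scaled_sq_norm(double c, const Vec *x) {
    Vec *tmp = x->ops->clone(x);
    x->ops->scale(c, x, tmp);
    double r = tmp->ops->dot(tmp, tmp);
    tmp->ops->destroy(tmp);
    return r;
}
```

A WRMS norm has the same shape as `par_dot` (a reduction over (x_i*w_i)^2, followed by dividing by n and taking a square root), so it would parallelize the same way.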

ChrisRackauckas commented 2 years ago

Is there a case where you wouldn't use FBDF for that? We'll definitely accept a PR to add support for the other NVector types, but since it is usually outperformed these days (and the other methods already allow multithreading, GPU, etc.), I don't think it would be a priority.