SciML / JumpProcesses.jl

Build and simulate jump equations, like Gillespie simulations and jump diffusions, with constant and state-dependent rates, and mix with differential equations and scientific machine learning (SciML)
https://docs.sciml.ai/JumpProcesses/stable/

Optimization Ideas #186

Open Vilin97 opened 3 years ago

Vilin97 commented 3 years ago

Just like there is a list of SSAs to consider implementing (https://github.com/SciML/DiffEqJump.jl/issues/32), there are some optimizations of the existing SSAs to consider implementing. This issue will be the place to write them down so as not to forget. The possible optimizations so far are:

  1. Using 32-bit floats in RSSA bracketing instead of 64-bit floats. The precision of the brackets does not matter much, and the reduced memory footprint should improve performance.
  2. Using the @simd macro in the loop where propensities are re-evaluated after a reaction is executed.

Other optimization ideas are welcome.

ChrisRackauckas commented 3 years ago

Which loop?

> Using 32-bit Floats in RSSA bracketing instead of 64-bit floats. The precision for brackets does not matter much, and using less memory will result in better performance.

The input is not necessarily 64-bit floats; the internal types match the user's input. I think in a similar sense the brackets should be type-matched, though having a multi-precision algorithm by default could be a good idea.

isaacsas commented 3 years ago

Maybe @Vilin97 means the sum in Direct? (Though on taking a look now that is a cumulative sum so I guess not applicable.)

For 1., what we discussed yesterday was that RSSA/RSSACR can get memory-heavy for large systems, since they store twice as many propensities and twice as many species populations (the bounding intervals). We could cut the memory use of the former by storing the propensity bracketing values internally in a coarser floating point type and slightly expanding the bracketing interval, which should still preserve the algorithm's exactness. In spatial systems the memory use can grow pretty quickly given the extreme number of jumps, so this could be an important optimization to enable that type of rejection approach there.
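As a rough illustration of the idea (hypothetical helper names, not existing DiffEqJump internals), the brackets could be stored in `Float32` with one-ULP outward widening, so the true `Float64` interval is always contained:

```julia
# Sketch: store RSSA propensity brackets as Float32, rounding outward so the
# true Float64 bracket is always contained and the rejection test stays exact.
# Converting to Float32 rounds to nearest, which may shrink the interval, so
# we step one ULP outward with prevfloat/nextfloat; the lower bound is clamped
# at zero since propensities are nonnegative.
bracket_lo32(lo::Float64) = max(0.0f0, prevfloat(Float32(lo)))
bracket_hi32(hi::Float64) = nextfloat(Float32(hi))
```

Since the widened interval still contains the exact propensity bounds, the accept/reject step remains correct; only the rejection probability changes marginally.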

ChrisRackauckas commented 3 years ago

I think for some of the finer performance details, it would be good to post a flame graph when proposing what to optimize. Anything that is 10% or less of the total calculation shouldn't get over-optimized.

If you're talking about https://github.com/SciML/DiffEqJump.jl/blob/master/src/aggregators/direct.jl#L85-L88, that isn't amenable to @simd because the loop iterations are not independent. I could see hoisting out https://github.com/SciML/DiffEqJump.jl/blob/master/src/aggregators/direct.jl#L75 on very large sets of reactions to multithread it, but I think Direct() should probably be optimized for the small-reaction-set case anyway, since that's what it's algorithmically good for.
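For context, the selection step in Direct-style SSAs looks roughly like the following (a simplified sketch, not the actual DiffEqJump code). The loop-carried running sum and the early exit are exactly what rule out @simd:

```julia
# Simplified sketch of Direct-style jump selection: return the first index
# where the running cumulative sum of rates exceeds the sampled threshold.
# Each iteration depends on the previous partial sum and may exit early,
# so the loop cannot be vectorized with @simd.
function select_jump(rates::Vector{Float64}, threshold::Float64)
    partial = 0.0
    for i in eachindex(rates)
        partial += rates[i]
        threshold < partial && return i
    end
    return length(rates)
end
```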

Vilin97 commented 3 years ago

> Which loop?

Every SSA has a loop that updates rates of dependent reactions. For example, here it is for DirectCR: https://github.com/SciML/DiffEqJump.jl/blob/master/src/aggregators/directcr.jl#L130.
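A self-contained sketch of that kind of update loop (hypothetical names, simplified from the linked code). The indirect indexing through the dependency graph and the opaque user rate-function calls are what make it hard to vectorize:

```julia
# Sketch: after reaction rx fires, recompute the rates of only those reactions
# that depend on it, as listed in the dependency graph.
function update_dependent_rates!(rates, ratefuncs, dep_graph, rx, u, p, t)
    for dep in dep_graph[rx]
        rates[dep] = ratefuncs[dep](u, p, t)  # arbitrary user-supplied rate
    end
    return rates
end
```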

ChrisRackauckas commented 3 years ago

That doesn't look amenable to SIMD even if it inlines the rate functions.

isaacsas commented 3 years ago

A couple more optimizations worth exploring (since they appear in parts of the code that can be time-consuming):

  1. Using VectorOfVectors from ArraysOfArrays.jl for dependency graphs instead of plain vectors of vectors. There are some papers in the literature that claim this gives a speedup due to the better memory layout. I think we can easily test this currently by just manually passing a dependency graph that is a VectorOfVectors (it looks like we only index the outer vector and then iterate through the inner one, which should work with a VectorOfVectors).
  2. To try speeding up the PriorityTable we could make the PriorityGroups immutable. In particular the PriorityTable sampling and updating overhead is non-trivial as @Vilin97's tests have shown, so any speedup there could be helpful.
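For 1., a minimal sketch of what passing such a graph might look like (assuming the `VectorOfVectors` constructor in ArraysOfArrays.jl accepts a vector of vectors, per its documentation):

```julia
using ArraysOfArrays  # assumed available

# A dependency graph stored as a VectorOfVectors: the inner vectors live
# contiguously in one flat backing array, which can improve cache locality
# compared to a Vector{Vector{Int}} when iterating dependents repeatedly.
dep_graph = VectorOfVectors([[2, 3], [1, 3], [1, 2, 4], [3]])

# The aggregators only index the outer vector and iterate the inner one:
for j in dep_graph[3]
    # update the propensity of reaction j here ...
end
```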

Vilin97 commented 3 years ago

Another optimization: when all jumps are saved, we could save the jump sequence instead of full state snapshots and reconstruct any given state from the initial condition and the executed jumps. This would reduce memory usage on large systems.
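A minimal sketch of the reconstruction step (hypothetical, not an existing DiffEqJump feature): given the initial state, each jump's net stoichiometry, and the sequence of executed jump indices, any intermediate state can be replayed on demand:

```julia
# Rebuild the state after the first n jumps by replaying net-stoichiometry
# changes (species index => copy-number change) from the initial condition.
function state_after(u0::Vector{Int}, netstoich::Vector{Vector{Pair{Int,Int}}},
                     jumps::Vector{Int}, n::Int)
    u = copy(u0)
    for k in 1:n
        for (spec, change) in netstoich[jumps[k]]
            u[spec] += change
        end
    end
    return u
end
```

The trade-off is storage of one integer per jump versus one state vector per save point, at the cost of O(n) replay time to recover a state.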