ReactionMechanismGenerator / ReactionMechanismSimulator.jl

The amazing Reaction Mechanism Simulator for simulating large chemical kinetic mechanisms
https://reactionmechanismgenerator.github.io/ReactionMechanismSimulator.jl
MIT License

Analytic Parameter Jacobians #87

Open mjohnson541 opened 4 years ago

mjohnson541 commented 4 years ago

Forward sensitivity speeds are currently held back by the parameter jacobian evaluations, which can be enormously expensive to compute for our problems (lots of parameters) without sparse differentiation. For simple problems the fastest way to do this should be by adding analytic parameter jacobians. This should be easier to program than the variable jacobians as they should be more consistent between reactors.

ChrisRackauckas commented 3 years ago

You shouldn't need to calculate the Jacobian in order to compute the forward sensitivity equations. How were you doing the call?

mjohnson541 commented 3 years ago

So I appreciate that we don't need the variable jacobians unless they're very fast, since we can instead calculate the JVP, but I thought we still needed to calculate the parameter jacobian (F) in https://diffeq.sciml.ai/stable/extras/sensitivity_math/? F is quite expensive because we typically have 100x more parameters than we have variables.

Currently we're being beaten pretty badly by RMG's internal sensitivity analysis code, which uses analytic Jy and Jp rather than JVPs.

For forward sensitivities we currently feed in jacp! calculated by ForwardDiff to the ODEFunction only. ReverseDiff/Tracker would certainly make more sense, but I'm still working on making the internals more compatible with those.
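For context, a minimal sketch of that wiring (the RHS `f!`, its parameters, and the two-species system here are hypothetical stand-ins, not RMS internals):

```julia
using OrdinaryDiffEq, ForwardDiff

# Toy in-place RHS du = f(u, p, t), standing in for the RMS rate equations
function f!(du, u, p, t)
    du[1] = -p[1] * u[1]
    du[2] = p[1] * u[1] - p[2] * u[2]
    return nothing
end

# Parameter Jacobian F = df/dp, filled by ForwardDiff as described above
function jacp!(pJ, u, p, t)
    ForwardDiff.jacobian!(pJ, (du, q) -> f!(du, u, q, t), similar(u), p)
    return nothing
end

ofun = ODEFunction(f!; paramjac = jacp!)
prob = ODEProblem(ofun, [1.0, 0.0], (0.0, 1.0), [2.0, 1.0])
```

With `paramjac` supplied, forward sensitivity methods can use this (here AD-generated) F rather than differencing the full RHS once per parameter.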

ChrisRackauckas commented 3 years ago

You don't need the full Jacobian since jacobian-vector products are the primitive of forward-mode, so you can do a single dual number f(Dual(x,v)) for J(x)*v. In fact, you can do this with numerical differencing as well, since Jv is just the directional derivative: `(f(x + eps*v) - f(x))/eps`. So forward sensitivities should never need the Jacobian, and that's a naturally sparse calculation. You don't want to use reverse-mode here since that's asymptotically worse for this case.
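For illustration, here are both Jv tricks side by side (the function `f` is a made-up stand-in for an ODE right-hand side):

```julia
using ForwardDiff

# Made-up nonlinear function standing in for the RHS
f(x) = [x[1]^2 + x[2], x[1] * x[2]]

x = [1.0, 2.0]   # point of linearization
v = [1.0, 0.0]   # direction

# J(x)*v via numerical differencing: just the directional derivative
eps_ = 1e-8
jv_fd = (f(x .+ eps_ .* v) .- f(x)) ./ eps_

# J(x)*v via a single dual-number evaluation; no full Jacobian is formed
duals = ForwardDiff.Dual.(x, v)
jv_ad = ForwardDiff.partials.(f(duals), 1)

# Reference: form the full Jacobian and multiply (what we're avoiding)
jv_ref = ForwardDiff.jacobian(f, x) * v
```

Both `jv_fd` and `jv_ad` match `jv_ref` without ever materializing J.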

But the forward sensitivity implementation in DiffEqSensitivity.jl should already be doing this by default, and the absolute fastest way to do forward mode is to just use ForwardDiff over the integrator (demonstrated in https://arxiv.org/abs/1812.01892).
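A sketch of the "ForwardDiff over the integrator" approach on a toy decay problem (the problem, solver choice, and loss here are arbitrary illustrations, not the paper's benchmark):

```julia
using OrdinaryDiffEq, ForwardDiff

# Toy decay problem; the dual numbers in p flow through the whole solve
f!(du, u, p, t) = (du[1] = -p[1] * u[1]; nothing)

function final_state(p)
    # u0's eltype must match p so the duals propagate through the state
    u0 = [one(eltype(p))]
    prob = ODEProblem(f!, u0, (0.0, 1.0), p)
    sol = solve(prob, Rosenbrock23(); abstol = 1e-10, reltol = 1e-10)
    return sol.u[end][1]
end

# d u(1) / d p, computed by differentiating through the solver itself
grad = ForwardDiff.gradient(final_state, [2.0])
```

For u' = -p*u, u(1) = exp(-p), so the gradient should come out near -exp(-2).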

mjohnson541 commented 3 years ago

I guess I don't understand how you can formulate the parameter derivatives in the forward sensitivity equations as a Jacobian-vector product? Don't they have to be computed separately by autodiff or finite differences?

ChrisRackauckas commented 3 years ago

Oooh, parameter derivatives are the parameter Jacobian. My bad, I thought you were talking about the other term. Yes, that term needs a fast Jacobian, probably sparse. An analytical Jacobian will be faster than sparse differentiation even in the latter's best case, though they should be asymptotically similar, so this would be a spot to really use an analytical Jacobian if you have one.

ChrisRackauckas commented 3 years ago

Though ForwardDiff over the solver makes this naturally sparse and fast, so I wouldn't use the continuous sensitivity form at all and go discrete.

mjohnson541 commented 3 years ago

I've been eyeing JVODE_BDF(), but I think differentiating over the solver currently isn't possible if we stick with CVODE_BDF(), and I've had trouble finding anything competitive with CVODE_BDF() on these problems.

ChrisRackauckas commented 3 years ago

TRBDF2 and KenCarp4 haven't done well?

ChrisRackauckas commented 3 years ago

> I've been eyeing JVODE_BDF(), but

We have a private repo with a few things being developed, but it won't be ready until December. That JVODE_BDF will likely be one of the things @yingboma gets to spend a lot of time on in the next few months.

mjohnson541 commented 3 years ago

Cool! The test I ran was a couple of years ago; I'll run some benchmarks on TRBDF2 and KenCarp4!

ChrisRackauckas commented 3 years ago

A couple of years ago was bad. We only bested CVODE in https://benchmarks.sciml.ai/html/StiffODE/BCR.html within the last year. That's what I'd point to for a 1,000-ODE performance case.

How many ODEs are you doing and how sparse?

mjohnson541 commented 3 years ago

Our systems are usually 100-3000 ODEs. The sparsity depends a lot on the size, it should look like the sparsity in this paper: https://www.sciencedirect.com/science/article/pii/S0010218001003522. Maybe ~90% sparse for large stuff.

mjohnson541 commented 3 years ago

CVODE_BDF is outperforming TRBDF2 and KenCarp4 by at least two orders of magnitude in most of the tests, and by three or four in some. I've tried with a ~30-species mechanism and a ~450-species mechanism. At low rtol I managed to get TRBDF2 within a factor of 7 for the ~450-species mechanism, but other than that I didn't have much success. I don't think BCR has any radicals in it? Maybe the timescale difference between the radical and the stable species decays makes these combustion systems stiffer than BCR?

ChrisRackauckas commented 3 years ago

Interesting. Can I get a runnable example to play with?

> Maybe the timescale difference between the radical and the stable species decays makes these combustion systems stiffer than BCR?

Normally it's the other way around. BDF methods are only alpha-stable beyond order 2, which makes them take quite small steps for highly stiff problems. I suspect there might be some other salient factor involved, like the choice of lu factorization, so a runnable code will be required to figure out what's going on.

mjohnson541 commented 3 years ago

This is the smaller one I was playing with. It was also the one I saw the most difference with. I mostly just played with the ConstantTPDomain runs in the notebook. Under typical conditions setting abs_tol>1e-16 can cause problems with the radical concentrations.

ethane_solver_comparison.ipynb.zip chem26.rms.zip

ChrisRackauckas commented 3 years ago

The notebook uses a Julia v1.1 kernel. Are you using Julia v1.5?

Indeed, CVODE will scale better to very low tolerances than TRBDF2 and KenCarp4. Maybe try OrdinaryDiffEq's RadauIIA5 or ODEInterfaceDiffEq's radau, though the latter doesn't

> Under typical conditions setting abs_tol>1e-16 can cause problems with the radical concentrations.

Can you describe these a bit more? Requiring a tolerance of 1e-16 is not a good idea; it'll likely introduce a bunch of numerical aberrations. If things need to stay on a manifold, then using a DAE formulation or projections will be much more efficient.

mjohnson541 commented 3 years ago

Sorry, I have a couple of different environments; one is v1.3 and one is v1.4.2. I haven't been good about making sure the default kernels get updated.

DAE solvers are quite common for these, Chemkin and RMG use DASPK to run these simulations, but based on my experiences working with them I believe it's slower than CVODE_BDF on these problems.

So I ran a test on an ignition condition for propyl methyl ether that should be sensitive to radical build-up, and perhaps even 1e-16 isn't satisfactory:

[plot] It takes 1e-18 to get within 10% of the IDT at lower tolerances.

ChrisRackauckas commented 3 years ago

Great. @yingboma this is a use case to make note of for DiscoDiffEq.

mjohnson541 commented 3 years ago

It looks like some people also had some success with exponential integrators: https://www.sciencedirect.com/science/article/pii/S0010218019300495#bib0038

ChrisRackauckas commented 3 years ago

I haven't seen exponential integrators do well on highly nonlinear problems. On semilinear problems, like PDE discretizations:

https://diffeq.sciml.ai/stable/solvers/split_ode_solve/#OrdinaryDiffEq.jl-2

they are very good. (example: https://benchmarks.sciml.ai/html/MOLPDE/allen_cahn_spectral_wpd.html). But on purely nonlinear problems?

https://diffeq.sciml.ai/stable/solvers/ode_solve/#Exponential-Runge-Kutta-Methods

I haven't seen these perform well on those problems. That said, I never tried it on extremely low tolerance, so it's all worth a try.

"Their method", i.e. phipm with happy breakdown, adaptive timestepping and adaptive Krylov subspace size, is implemented in https://github.com/SciML/ExponentialUtilities.jl and is used as the backbone for all of these methods. I won't expect much though, but might get surprised.

You might want to put together a formal benchmark a la https://benchmarks.sciml.ai/html/StiffODE/Pollution.html for this domain to get some clear answers.

YingboMa commented 3 years ago

Are there discontinuities or irregularities in the system?

mjohnson541 commented 3 years ago

It looks like this:

[plot of the trajectory near ignition]

It isn't a real discontinuity; the derivative is just locally very large, and it requires a very low abstol, ~1e-18.

ChrisRackauckas commented 3 years ago

What are the largest values in the simulation?

mjohnson541 commented 3 years ago

At the ignition delay, dPressure/dt is ~1e10 (but the typical pressure scale is 1e6), and dTemperature/dt is ~1e6 (but the temperature scale is 300-1000).

ChrisRackauckas commented 3 years ago

```julia
julia> 1e10 + 1e-7
1.0e10
```

So the lowest absolute tolerance it's even possible to solve that for is 1e-7. I'm not sure what asking for more accuracy than floating point precision means.
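For reference, the spacing of representable Float64 values at that magnitude:

```julia
# eps(x) is the gap to the next representable Float64 above x; any
# requested absolute accuracy below this gap is meaningless near 1e10
gap = eps(1.0e10)   # 2^-19, about 1.9e-6
```

Anything added to 1e10 that is smaller than this gap simply rounds away.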

mjohnson541 commented 3 years ago

Sorry, I probably shouldn't have mentioned the ignition part. The accuracy after ignition isn't important, although predicting when it happens accurately is very important. Regardless, I believe the low tolerance matters primarily early in the simulation, because it affects how fast the radical feedback loops form.

Omitting a lot of steps in the second path, it looks kind of like this:

1) Fuel + O2 -> 2 Rad
2) Rad + Fuel -> I1; I1 + O2 -> I2; I2 + O2 -> 3 Rad

The fuel and oxygen are concentrated, Rad, I1, and I2 initially have zero concentration, the first path is very slow, and the second path rapidly consumes Rad as it's produced.
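As a sketch, the scheme above written as an ODE system (the rate constants and the initial conditions in the test below are made up for illustration; this is not the actual mechanism):

```julia
# Simplified chain-branching scheme: slow initiation plus a radical
# feedback loop whose wind-up rate depends on tiny early concentrations
function branching!(du, u, p, t)
    fuel, o2, rad, i1, i2 = u
    k1, k2, k3, k4 = p
    r1 = k1 * fuel * o2   # Fuel + O2 -> 2 Rad  (very slow initiation)
    r2 = k2 * rad * fuel  # Rad + Fuel -> I1
    r3 = k3 * i1 * o2     # I1 + O2 -> I2
    r4 = k4 * i2 * o2     # I2 + O2 -> 3 Rad    (net gain of radicals)
    du[1] = -r1 - r2            # fuel
    du[2] = -r1 - r3 - r4       # O2
    du[3] = 2r1 - r2 + 3r4      # Rad
    du[4] = r2 - r3             # I1
    du[5] = r3 - r4             # I2
    return nothing
end
```

Because Rad, I1, and I2 start at exactly zero, the early growth of the loop is governed by concentrations far below any sensible reltol, which is why abstol on those species dominates the accuracy of the predicted ignition delay.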

ChrisRackauckas commented 3 years ago

Let's make work-precision diagrams and look at the convergence with tolerance. We can settle this once and for all 😄

mjohnson541 commented 3 years ago

I tried to set that up yesterday, but on the first pass I could only get CVODE_BDF and TRBDF2 to both solve and complete in a realistic amount of time on my smallest realistic example:

[work-precision plot]

For

```julia
abstols = 1.0 ./ 10.0 .^ (15:20);
reltols = 1.0 ./ 10.0 .^ (5:10);
```

Some of the ones I looked at just couldn't handle in-place evaluation, which is solvable, but the others mostly either crashed or hung for an unreasonable amount of time. I was looking at testing DAE solvers instead this morning.

ChrisRackauckas commented 3 years ago

How are you generating your reference solution? Use arbitrary-precision big numbers so that there's no precision issue, and do that at 1e-21. If you're still getting those verticals, then yes, it's saturated in Float64.

mjohnson541 commented 3 years ago

The test solution was CVODE_BDF run at abstol=1e-22 and reltol=1e-12.

mjohnson541 commented 3 years ago

It looks like I need to fix some typing things to get BigFloat to work properly.
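For anyone trying the same, the promotion usually looks something like this (toy problem standing in for the mechanism; solver choice is arbitrary, and `remake` field names follow SciML's ODEProblem layout):

```julia
using OrdinaryDiffEq

# Toy decay problem standing in for the mechanism
f!(du, u, p, t) = (du[1] = -p[1] * u[1]; nothing)
prob = ODEProblem(f!, [1.0], (0.0, 1.0), [2.0])

# Promote the state and time span to BigFloat so the reference solve
# is not limited by Float64 roundoff
prob_big = remake(prob; u0 = big.(prob.u0), tspan = big.(prob.tspan))
ref = solve(prob_big, Rodas5(); abstol = 1e-21, reltol = 1e-14)
```

The RHS must be written generically (no hard-coded Float64 buffers) for the BigFloat state to propagate, which is exactly the "typing things" that need fixing.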

mjohnson541 commented 3 years ago

It's still vertical with BigFloat at atol 1e-21 using TRBDF2:

[work-precision plot]

mjohnson541 commented 3 years ago

However this is for Constant T and P. I'll take a look at what happens for Constant Volume simulations.

ChrisRackauckas commented 3 years ago

> It's still vertical with BigFloat at atol 1e-21 using TRBDF2:

CVODE cannot use big floats, so given what I see in the plot, I am assuming that your reference was computed with big floats but the solution was with Float64s? Indeed, this is what I expected: decreasing the tolerance with CVODE does not actually decrease the error in the final solution because it's at the maximal accuracy that can be computed with Float64, at least in terms of the 2-norm. You might want to check what happens for some smaller observables of the output (using save_idxs) to see if that's the case, but this is the precise numerical demonstration of why I am saying that such low absolute tolerances don't seem to make much sense.

YingboMa commented 3 years ago

When the ignition occurs, is there a fundamental change in the chemistry such that the derivative or a higher-order derivative term becomes discontinuous? I really feel like this numerical behavior is often induced by irregularities, i.e. the smoothness is not sufficient. Variable-order, variable-step BDF (e.g. CVODE_BDF, lsoda) tends to be more robust in those kinds of circumstances.

PS: just to be absolutely clear, when I say smoothness, I mean it mathematically, not the "it looks smooth" kind of thing. Numerical solvers are constructed to truncate Taylor expansions, so naturally, the assumption that an n-th order solver uses breaks down when there are discontinuities in the first n derivatives, i.e. du/dt, d^2u/dt^2, d^3u/dt^3, d^4u/dt^4...

YingboMa commented 3 years ago

The fact that the work-precision plot looks erratic indicates that there's extremely high stiffness in the system. We know that TRBDF2 is L-stable, but it still solves the problem rather poorly. To further diagnose the issue, I suggest trying out RadauIIA5 and plotting its convergence. RadauIIA5 has classical order 5 and stage order 3, so we would expect its convergence on this problem to be between 3 and 5. If the convergence order is about 1, that indicates the model is ill-formed, and insufficient smoothness in the model specification is likely the cause.

mjohnson541 commented 3 years ago

Okay, I think I'm starting to get to the bottom of this. First, the work-precision diagram I did before was for a poorly selected time interval. For constant T and P:

```julia
abstols = 1.0 ./ 10.0 .^ (8:2:26);
reltols = 1.0 ./ 10.0 .^ (4:1:13);
```

[work-precision plot]

For Constant V:

```julia
abstols = 1.0 ./ 10.0 .^ (8:24);
reltols = 1.0 ./ 10.0 .^ (4:0.5:13);
```

[work-precision plot]

Note LSODA is unable to use the analytic jacobian. These are done with the 'superminimal' mechanism.

Second and perhaps more importantly I ran a set of test cases at different absolute tolerances:

1) Low Temperature Feedback Loop Ignition: [plots]

2) Constant Temperature and Pressure Oxidation at 10 bar: [plots]

3) Constant Temperature and Pressure Oxidation at 1 bar: [plots]

4) Superminimal High Temperature Ignition: [plots]

1-3 use the same 450 species mechanism. reltol was set at 1e-6 for these.

My general observation is that the required tolerance for consistency is very variable, even for the same mechanism. Even similar cases seem to have appreciably different necessary tolerances. It certainly seems there are cases in the regime where solvers other than CVODE_BDF and LSODA can be tested, but this still leaves the question of how to decide what absolute tolerance to simulate at for a given problem.

ChrisRackauckas commented 3 years ago

In the work-precision diagrams, are you sure changing the absolute tolerance is doing something there? I would think that the vast majority of the changes are due to the relative tolerance changes once abstol<1e-16. It would be good to try and find a way to isolate them.

> My general observation is that the required tolerance for consistency is very variable, even for the same mechanism.

That's indeed the case!

mjohnson541 commented 3 years ago

Yes, it's just due to reltol; if you keep reltol constant, there's no consistent relationship between time and error for either.

mjohnson541 commented 3 years ago

I'll get the work-precision diagram for test case 1; that will probably be more interesting.

YingboMa commented 3 years ago

> It certainly seems there are cases in the regime where solvers other than CVODE_BDF and LSODA can be tested, but this still leaves the question of how to decide what absolute tolerance to simulate at for a given problem.

Unfortunately, there's no silver bullet for determining tolerances. I think people often figure out the abstol and reltol by trial and error. Also, you could provide a vector-valued absolute tolerance; it's helpful when you know the range of some states a priori.
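A sketch of that (the species count and radical indices below are hypothetical):

```julia
# Vector-valued abstol: tight only on the radical species, loose elsewhere
nspecies = 13
radical_idxs = [3, 5, 7]          # hypothetical radical positions
abstols = fill(1e-8, nspecies)
abstols[radical_idxs] .= 1e-18
# then e.g.: solve(prob, CVODE_BDF(); abstol = abstols, reltol = 1e-6)
```

This keeps the cost of the tight tolerance confined to the handful of states that actually need it.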

BTW, I wonder if you could also compare RadauIIA5 and KenCarp4 with the BDFs.

mjohnson541 commented 3 years ago

Yeah. The chemistry that I believe causes this activates quite early and is pretty linear, so it's possible I might be able to make an educated guess from the initial Jacobian information. Even if that only safely gets me within six orders of magnitude, that could make a big difference. In a lot of cases I do know whether a species is a radical or not, and the radicals are almost certainly the species that need the tight abstol.

Here's superminimal with atol = 1e-6 -> 1e-12 and rtol = 1e-6:

[work-precision plot]

ChrisRackauckas commented 3 years ago

Looks like there's some fun optimization work that can be done here. Can't say I've done much optimizing at these extreme abstols.

It's a 21 ODE system, right? How do Rodas5 and Rodas4 perform here?

mjohnson541 commented 3 years ago

This one is 13 ODEs. Originally I was trying ethane oxidation, which was about 21 ODEs; it might be a better example with more of the feedback-loop behavior going, but at the tight tolerances everything was taking so long that I switched to this one. I'll add Rodas5 and Rodas4.

mjohnson541 commented 3 years ago

Adding in Rodas5 and Rodas4:

[work-precision plot]

ChrisRackauckas commented 3 years ago

This looks to be a fairly odd case and there's likely something extra going on here.

ChrisRackauckas commented 3 years ago

How is the Jacobian generated?

mjohnson541 commented 3 years ago

They should all have access to the analytic Jacobian; I'm not exactly sure which are using it, but I think the speedup the analytic Jacobian provides for CVODE_BDF is only ~3x.

mjohnson541 commented 3 years ago

Work-precision for case (1):

```julia
abstols = 1.0 ./ 10.0 .^ (14:20);
reltols = ones(length(abstols)) * 1e-6
```

[work-precision plot]

ChrisRackauckas commented 3 years ago

What happens if you use autodiff Jacobians? I feel like there must be a common factor for why the Julia codes are all hitting about the same timing, and the linear solver portion is probably it.