Closed visr closed 2 months ago
I guess it's possible that storage still ends up just barely negative somehow and one of the du
terms blows up when it sees a negative storage?
We reject any timesteps that lead to negative storage https://github.com/Deltares/Ribasim/issues/1406. Or perhaps that is too late for this, which is triggered during nonlinear iterations?
Hmm good point. It shouldn't be too late: if it converges with a negative storage, it should still be rejected.
Both the callback and isoutdomain
should be triggered. So in this case I guess there's really no negative storage involved.
Maybe some term that isn't smooth enough around S = 0?
Does any other water balance term jump?
Yeah perhaps we are stepping completely over a reduction factor here. The inflow is 29 m3/s and the outflow 62 m3/s. The outflow should go down to the inflow due to the reduction factors, but all I see is a brief minor dip in the outflow over the Outlet (50 m3/s), and no reduced flow to the UserDemand (12 m3/s).
The complete water balance at the start of the anomaly:
time | node_id | storage | level | inflow_rate | outflow_rate | storage_rate | balance_error |
---|---|---|---|---|---|---|---|
2023-11-22 | 3 | 5.3E+06 | 6.1E+01 | 2.9E+01 | 6.2E+01 | -3.3E+01 | 8.2E-14 |
2023-11-23 | 3 | 2.4E+06 | 4.1E+01 | 2.9E+01 | 6.2E+01 | -2.7E+01 | -5.9E+00 |
2023-11-24 | 3 | 9.5E+04 | 7.9E+00 | 2.9E+01 | 6.2E+01 | 1.1E+04 | -1.1E+04 |
2023-11-25 | 3 | 9.4E+08 | 8.2E+02 | 2.9E+01 | 6.2E+01 | 3.8E+04 | -3.8E+04 |
2023-11-26 | 3 | 4.2E+09 | 1.7E+03 | 2.9E+01 | 6.2E+01 | 6.5E+04 | -6.5E+04 |
The model looks like this:
And can be downloaded here:
The results with v2024.10.0 look much better. When in > out, the storage is going up from 0, when out > in, the storage goes down to 0. When storage is 0, out will never go above in, so they are the same.
A few more observations. QNDF, TRBDF2 and KenCarp4 all have similar issues. If you reduce the tolerances you get an unstable error. ImplicitEuler does seem to produce the correct results with backtracking. On the HWS model I couldn't directly eyeball large convergence issues with QNDF, though there are no empty Basins there. HWS with backtracking runs in 47s for QNDF and in 2m19 for ImplicitEuler.
Perhaps for now we should make backtracking opt-in for now while we explore this issue. Perhaps we can look at what is happening with backtracking more in detail, based on https://github.com/SciML/OrdinaryDiffEq.jl/pull/1792.
This made me realize #1413 needs a v1.0 label, because that helps to just error out on these issues to protect users.
@visr you can also try different linesearch algorithms for relaxation (https://github.com/JuliaNLSolvers/LineSearches.jl) but some more insight would be very helpful.
Also, we've probably discussed this before, but what about incorporating the water balance error in isoutofdomain
?
Yeah I can do that next week. I think the water balance error is important to address, #1413., Not sure if isoutofdomain
should be used for this (at least the name sounds wrong), though perhaps one of https://docs.sciml.ai/DiffEqCallbacks/stable/step_control/ makes sense.
Just to make sure I'm reading properly: the upper plots show timeseries that are totally wrong? It flies off course immediately, right?
And secondly: the same solver doesn't run into this problem when there's no relaxation?
In my understanding, the relaxation only limits the update of the Newton-Rhapson step. The check for convergence should be the same -- so I don't understand how it finds such a radically different solution.
@Huite you say limits the step and I thought so too, but I haven't seen anything in the linesearch code that indicates that the step cannot be bigger than the original Newton step
Doing some homework on the solvers:
The DiffEq docs mention:
For asymptotically large systems of ODEs (N>1000?) where f is very costly, and the complex eigenvalues are minimal (low oscillations), in that case QNDF or FBDF will be the most efficient. QNDF and FBDF will also do surprisingly well if the solution is smooth. However, this method can handle less stiffness than other methods and its Newton iterations may fail at low accuracy situations. Other choices to consider in this regime are CVODE_BDF and lsoda.
Curious whether CVODE_BDF or lsoda fail in a similar way.
@Huite you say limits the step and I thought so too, but I haven't seen anything in the linesearch code that indicates that the step cannot be bigger than the original Newton step
I think it's this in: https://github.com/JuliaNLSolvers/LineSearches.jl/blob/3259cd240144b96a5a3a309ea96dfb19181058b2/src/backtracking.jl#L22
α is the size of the update, default size is 1:
α_0::Tα = real(T)(1)
α = 1 is a standard Newton step.
Then it's updated:
α_0 = min(α_0, min(alphamax, ls.maxstep / norm(s, Inf)))
So α_0 shouldn't exceed 1.0 (unless you fiddle with it?).
Then in the actual function, it's doing interpolation via either a quadratic or a cubic and finding the zero for it. I think these lines mean it's limited:
α_1, α_2 = αinitial, αinitial
# interpolation
# ...
α_tmp = NaNMath.min(α_tmp, α_2*ρ_hi) # avoid too small reductions
α_2 = NaNMath.max(α_tmp, α_2*ρ_lo) # avoid too big reductions
Defined above:
ρ_hi::TF = 0.5
ρ_lo::TF = 0.1
So after the first iteration 0.1 <= α_2 <= 0.5.
Afterwards, it should only get smaller. (Unless something's changing defaults.)
To get a slightly better feeling for these methods: https://www.stochasticlifestyle.com/differences-between-methods-for-solving-stiff-odes/
TL;DR: I still think we should give this a try.
The outflow should go down to the inflow due to the reduction factors
@visr that is not how it is implemented now, but that is what is suggested here. Currently outflows of a basin don't depend on inflows (only on the storage), but to me it makes sense that that would be the case. Note that this adds new non-zeros to the Jacobian, which is much easier to play with with https://github.com/Deltares/Ribasim/pull/1606 in place (at least we can test with dense Jacobians).
Additionally, with the current implementation it is much more finnicky to find a steady state solution for depleted basins with in- and outflow (the solver has to find exactly that storage where the reduction factor makes the outflow match the inflow, in hindsight that's probably what you meant Martijn) than with the aforementioned suggestion.
I played around a bit with writing a custom line search algorithm, from which I learned that:
isoutofdomain
is triggered.α
of the step can get extremely small, in the order of 1e-19
.I also learned (with the help of claude.ai) that QNDF
does a predictor step based on previous results, which I can imagine can be very off with an abrupt change in dynamics due to the reduction factor
Currently outflows of a basin don't depend on inflows (only on the storage), but to me it makes sense that that would be the case. Note that this adds new non-zeros to the Jacobian, which is much easier to play with with https://github.com/Deltares/Ribasim/pull/1606 in place (at least we can test with dense Jacobians).
But the storage does depend on the inflows, and outflows depends on the storage. So in the (linearized) system of equations outflows clearly do depend on the inflows.
The solver is happy to work with negative storages as intermediate results in the non-linear solver
So the result of a Newton iteration might contain negative storages?
Does this mean it's calling the rhs function (water_balance!
) with negative storages?
How many of our du/dt
terms are robust for negative storage? Now that I think of it, it might be smart to have unit tests in place for each du/dt
term so they we know they accidentally explode. I reckon every potential outflow should return 0 for negative storage (?).
it is much more finnicky to find a steady state solution for depleted basins with in- and outflow (the solver has to find exactly that storage where the reduction factor makes the outflow match the inflow, in hindsight that's probably what you meant Martijn)
I don't follow why is is more finnicky? Isn't a basin always above depletion if there's still outflow? The outflow is distributed across different terms, each very small. The linearized problem provides enough information that the linear solver can find such a point, although it may result in negative storage. If the next rhs evaluation results in setting all outflows to 0, the next iteration should result in a small positive storage again.
Apparently the solver will dip into negative storage, re-formulate, and then converge with (wrong) positive storages -- otherwise the solution would be rejected. Shouldn't that go away it outflow becomes zero with negative storage then?
Does this mean it's calling the rhs function (
water_balance!
) with negative storages? How many of ourdu/dt
terms are robust for negative storage? Now that I think of it, it might be smart to have unit tests in place for eachdu/dt
term so they we know they accidentally explode. I reckon every potential outflow should return 0 for negative storage (?).
Might be relevant, docstring of the PositiveDomain callback (sorry for the garbled math statements):
Please note, that the system should be defined also outside the positive domain, since even with these approaches, negative variables might occur during the calculations. Moreover, one should follow Shampine's et al. advice and set the derivative xi′xi′ of a negative component xixi to max{0,fi(x,t)}max{0,fi(x,t)}, where tt denotes the current time point with state vector xx and fifi is the ii-th component of function ff in an ODE system x′=f(x,t)x′=f(x,t).
Good point, I remember reading this. The Shampine paper is this one:
https://www.sciencedirect.com/science/article/abs/pii/S0096300304009683
From the PDF, with non-garbled math:
My first suggestion now would be to add some unit tests to enforce du=0
for S <= 0
, and potentially add a sigmoid to sidestep resulting discontinuities.
Ha, we had implemented that advice but removed it when we started using isoutofdomain
, see https://github.com/Deltares/Ribasim/pull/639/files. I'll try adding that back and see the effect.
EDIT: For the test model I see negative storage, but not with a negative du, so it had no effect.
Uh I think setting the total du
to zero might not be the best solution. I'm not privy enough to the fancier solvers which track several previous iterations, but if you're using ImplicitEuler, it will base du
only on the last iteration. In that case, if u < 0
, then du
for those states will be set to zero as well, and every iteration afterwards will also set du=0
. Only in the next time step will it then reformulate and become dynamic again.
Instead, I think you should allow positive individual terms (inflow e.g. precipitation), but truncate individual negative terms. That way, if you end up below zero it can still go back in the positive.
EDIT: pretty sure this is what MODFLOW 6 does for the NPF (internodal flows) Newton formulation too.
Other suggestions:
just try a scalar relax value.
Evaluate residual and repeatedly apply a scalar value (both SWAP and MODFLOW 6 use this approach):
SWAP:
MODFLOW:
Fixed by #1761.
1697 increased the performance considerably, and passed the tests with only slightly higher tolerances.
Surprisingly @Fati-Mon observed this behavior on a Basin running dry:
We need to look into what is going on here.