Closed: visr closed this 5 months ago
We already depend on TimerOutputs but barely use it. If we hook it up properly, the results can be really helpful, as they are for Trixi.jl (https://github.com/trixi-framework/Trixi.jl):
────────────────────────────────────────────────────────────────────────────────────
            Trixi.jl                      Time                   Allocations
                                  ───────────────────────   ────────────────────────
        Tot / % measured:             595ms / 84.3%            29.3MiB / 50.9%
Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
I/O                            3    290ms   57.8%  96.7ms   6.91MiB   46.4%  2.30MiB
  ~I/O~                        3    189ms   37.6%  63.0ms    948KiB    6.2%   316KiB
  save solution                2    101ms   20.1%  50.4ms   5.84MiB   39.2%  2.92MiB
  get element variables        2    380μs    0.1%   190μs    151KiB    1.0%  75.7KiB
  save mesh                    2    400ns    0.0%   200ns     0.00B    0.0%    0.00B
analyze solution               2    188ms   37.4%  93.9ms   7.86MiB   52.7%  3.93MiB
rhs!                          91   24.3ms    4.8%   267μs    137KiB    0.9%  1.51KiB
  reset ∂u/∂t                 91   21.7ms    4.3%   239μs     0.00B    0.0%    0.00B
  volume integral             91    776μs    0.2%  8.53μs   25.6KiB    0.2%     288B
  ~rhs!~                      91    459μs    0.1%  5.04μs   9.33KiB    0.1%     105B
  interface flux              91    437μs    0.1%  4.80μs   29.9KiB    0.2%     336B
  prolong2interfaces          91    366μs    0.1%  4.03μs   27.0KiB    0.2%     304B
  surface integral            91    264μs    0.1%  2.90μs   24.2KiB    0.2%     272B
  Jacobian                    91    223μs    0.0%  2.45μs   21.3KiB    0.1%     240B
  prolong2boundaries          91   10.9μs    0.0%   120ns     0.00B    0.0%    0.00B
  prolong2mortars             91   10.3μs    0.0%   113ns     0.00B    0.0%    0.00B
  mortar flux                 91   9.50μs    0.0%   104ns     0.00B    0.0%    0.00B
  source terms                91   6.50μs    0.0%  71.4ns     0.00B    0.0%    0.00B
  boundary flux               91   6.20μs    0.0%  68.1ns     0.00B    0.0%    0.00B
calculate dt                  19   6.60μs    0.0%   347ns     0.00B    0.0%    0.00B
────────────────────────────────────────────────────────────────────────────────────
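As a rough idea of what "hooking it up" could look like, here is a minimal sketch using the `@timeit` macro and `print_timer` from TimerOutputs. The functions `rhs!` and `run_steps!` and the section labels are hypothetical stand-ins, not actual Ribasim code; only the TimerOutputs calls are real API.

```julia
using TimerOutputs

# A single timer object shared by all instrumented sections.
const to = TimerOutput()

# Hypothetical stand-in for a model right-hand side; names are illustrative only.
function rhs!(du, u)
    @timeit to "rhs!" begin
        # Nested @timeit calls show up as indented child sections in the report.
        @timeit to "reset du" fill!(du, 0.0)
        @timeit to "flows" (du .-= 0.1 .* u)
    end
    return du
end

# Hypothetical explicit time loop that calls the instrumented rhs!.
function run_steps!(u, nsteps; dt = 0.1)
    du = similar(u)
    for _ in 1:nsteps
        rhs!(du, u)
        @timeit to "update state" (u .+= dt .* du)
    end
    return u
end

run_steps!(ones(100), 91)

# Prints a table with ncalls, time and allocations per section.
print_timer(to)
```

Sections can also be queried programmatically, for example `TimerOutputs.ncalls(to["rhs!"])`, which could be handy if the timings should end up in a log file rather than on screen.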
We have fairly extensive experience with slow-running models in variable-density SEAWAT simulations, since the solute transport component MT3D does adaptive timestepping. Without exception, large flows in small cells are a problem. We diagnose those by inspecting the flow output and computing flow velocities.
The same is probably true for Ribasim: if you have output, you can color edges by computed flows. You can also compute derived data, such as average residence times (current storage / current outflow). These can easily be plotted on a map, and will likely help identify which nodes are problematic.
Flow velocities are also a valued output from an ecological perspective (as indicated by Ellis Penning).
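The residence time idea above can be sketched in a few lines. The arrays `node_id`, `storage` and `outflow` are made-up values standing in for real model output, and the units are an assumption; the only thing taken from the comment is the formula storage / outflow.

```julia
# Hypothetical per-basin output; in practice these would come from model results.
node_id = [1, 2, 3, 4]
storage = [5.0e4, 120.0, 8.0e3, 300.0]   # m³ (assumed unit)
outflow = [0.5, 2.0, 0.0, 0.01]          # m³/s (assumed unit)

# Average residence time = current storage / current outflow.
# Basins with no (or negative) outflow get Inf, so they sort last.
residence_time = [q > 0 ? s / q : Inf for (s, q) in zip(storage, outflow)]

# Shortest residence times first: these basins drain fastest relative to their size.
perm = sortperm(residence_time)
for i in perm
    println("Basin #", node_id[i], ": residence time = ", residence_time[i], " s")
end
```

The basins at the top of this list are the "large flow, small cell" candidates, and the same vector could be joined to node coordinates for plotting on a map.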
Another thought: we're currently relying on DifferentialEquations.jl machinery to do the timestepping for us.
As far as I know, it compares a higher-order and a lower-order solution to estimate whether the time step size has to be adjusted. Essentially, I expect it to compare du_lower and du_higher. If we can investigate these a bit and get access to them, we should be able to output them and interpret them spatially.
This should help us and users to identify what the problematic nodes and connections are.
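To make the idea concrete, here is a toy sketch of an embedded pair: explicit Euler as the lower-order method and Heun as the higher-order one. This is not what DifferentialEquations.jl does internally; the two-reservoir system, the function names and the tolerance scaling (`abstol + reltol * |u|`, a common convention) are all assumptions for illustration.

```julia
# Toy system: two linear reservoirs, du/dt = -k .* u. The second drains much faster.
k = [0.1, 50.0]
f(u) = -k .* u

# One step of an embedded Euler (order 1) / Heun (order 2) pair.
function embedded_step(f, u, dt)
    k1 = f(u)
    u_lower = u .+ dt .* k1                 # explicit Euler
    k2 = f(u_lower)
    u_higher = u .+ dt / 2 .* (k1 .+ k2)    # Heun
    return u_lower, u_higher
end

# Per-state difference between the two solutions, scaled by the tolerances.
# With this scaling, values above 1 would mean the step should be rejected.
function scaled_error(u, u_lower, u_higher; abstol = 1e-6, reltol = 1e-3)
    scale = abstol .+ reltol .* max.(abs.(u), abs.(u_higher))
    return abs.(u_higher .- u_lower) ./ scale
end

u = [1.0, 1.0]
u_lower, u_higher = embedded_step(f, u, 0.01)
err = scaled_error(u, u_lower, u_higher)
worst = argmax(err)
println("scaled error per state: ", err)
println("state limiting the step size: ", worst)
```

Because the error is kept per state instead of being reduced to a single norm, it can be mapped back to nodes, which is the spatial interpretation suggested above.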
I asked about this on the Julia discourse:
https://discourse.julialang.org/t/find-out-which-state-s-cause-convergence-problems/113738
Chris (the man, the myth, the legend) has responded. Something like this works (for certain algorithms):
# Rank basins by the solver's error estimate; `atmp` only exists for some algorithms.
(; cache, p) = model.integrator
(; basin) = p
if hasproperty(cache, :atmp)
    storage_error = abs.(cache.atmp.storage)
    perm = sortperm(storage_error, rev=true)
    println("Basins in descending order of being a convergence bottleneck:")
    for i in perm
        node_id = basin.node_id.values[i]
        err = storage_error[i]  # named `err` to avoid shadowing Base.error
        println("$node_id, (error = $err)")
    end
end
though this should probably be logged, and we should have a sensible threshold value for when the error is large enough to mention. This also relates to how these error values should be interpreted, which I don't know exactly how to do yet.
_Edit: they come from calculate_residuals. Maybe the aforementioned threshold can simply be reltol._
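A logged, thresholded variant of the snippet above might look like the following sketch. The helper `report_bottlenecks`, its keyword arguments and the default `threshold = 1.0` are hypothetical; the threshold is left as a parameter precisely because its appropriate value is still open. The input values are made up and stand in for `abs.(cache.atmp.storage)`.

```julia
using Logging

# Hypothetical helper: warn about the basins whose error exceeds `threshold`,
# worst first. Returns the offending node IDs so callers can also act on them.
function report_bottlenecks(node_ids, storage_error; threshold = 1.0, max_report = 5)
    perm = sortperm(storage_error; rev = true)
    flagged = [i for i in perm if storage_error[i] > threshold]
    # Limit the number of log records so large networks do not flood the log.
    for i in Iterators.take(flagged, max_report)
        @warn "Basin is a convergence bottleneck" node_id=node_ids[i] scaled_error=storage_error[i]
    end
    return node_ids[flagged]
end

# Made-up values standing in for abs.(cache.atmp.storage).
flagged = report_bottlenecks([11, 12, 13, 14], [0.02, 3.5, 0.4, 1.7])
```

Using `@warn` with key-value pairs keeps the node ID and error value as structured fields, so whichever logger is active can format or filter them.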
Related to #419: if a run doesn't converge or is slower than expected, we need good tools to investigate the issue.
One thing that will help is to look at the solver stats:
Additionally, it would be nice to easily locate the nodes in the network that cause the issues. One thing @Huite suggested was to calculate a stability criterion, such as the Basin retention time or the Courant number.
The goal of this issue is to collect some tools, some of which could be useful for the documentation.
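One possible way to write down the two criteria mentioned above, per Basin $i$. The symbols are assumptions for illustration: $S_i$ is the current storage, $Q_i^{\text{out}}$ the current total outflow, and $\Delta t$ the time step.

```math
\tau_i = \frac{S_i}{Q_i^{\text{out}}}, \qquad
C_i = \frac{Q_i^{\text{out}} \, \Delta t}{S_i} = \frac{\Delta t}{\tau_i}
```

Under this definition, the Courant-like number $C_i$ is the fraction of the Basin's storage that leaves within one step, so Basins with a short retention time $\tau_i$ are the ones expected to force small time steps.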