SciML / SciMLBase.jl

The Base interface of the SciML ecosystem
https://docs.sciml.ai/SciMLBase/stable
MIT License
130 stars 96 forks source link

Ensemblesummary plots with (mean, standarddev) is not working correctly #266

Closed nathanaelbosch closed 1 year ago

nathanaelbosch commented 1 year ago

Minimal example

function lotka_volterra!(du, u, p, t)
    x, y = u
    α, β, δ, γ = p
    du[1] = dx = α * x - β * x * y
    du[2] = dy = -δ * y + γ * x * y
end
tspan = (0.0, 10.0) 
u0 = Gaussian(
    [1.0, 1.0],
    Matrix(0.01I, 2, 2)
)
p = [1.5, 1.0, 3.0, 1.0]
base_prob = ODEProblem(lotka_volterra!, mean(u0), tspan, p)

eprob = EnsembleProblem(
  base_prob, prob_func = (prob, i, repeat) -> remake(prob, u0 = rand(u0))
)
esol = solve(eprob, Tsit5(), trajectories = 100)

plot(esol, color=[1, 2]')

plot(EnsembleSummary(esol, tspan[1]:0.01:tspan[2]), ci_type=:SEM)

The samples look as follows: solutions But the mean and standard-deviation plot has a much too small confidence interval (too small ribbons): summary

When I manually compute the means and standard deviations from componentwise_vectors_timepoint I get a more realistic output: summary

So I currently assume that the plot recipe might be the point where things break.

Chronum94 commented 1 year ago

The first one (:SEM) is plotting 1.96 * standard error of the mean, while the second plot is of the standard error. These are two different quantities. One of them is your confidence in the mean of your distribution (assumed gaussian) from the finite sampling you have at every time step, the other one is the standard deviation of the gaussian.

nathanaelbosch commented 1 year ago

Just to clarify, the point here is that the ensemble looks like in the first plot, but the summary statistics of the second plot are not accurate (the standard deviation, shown as a ribbon, should be larger). In particular, when I compute them separately with componentwise_vectors_timepoint they seem correct; so the bug is probably in the plot recipe.

nathanaelbosch commented 1 year ago

Just revisited this issue and I got it now, somehow I missed the point and thought that the goal is to show the mean and standard deviation of the data, not of the mean estimate. Thanks for clarifying!