epirecipes / sir-julia

Various implementations of the classical SIR model in Julia

SDDP.jl example #98

Open slwu89 opened 1 year ago

slwu89 commented 1 year ago

Hi @sdwfrost, I was interested to see if I could get something similar to your InfiniteOpt.jl example running using SDDP.jl. I got this far, but the solvers keep spitting out numerical issues. I'm not familiar with the paper that your InfiniteOpt example was based on; do you see anything obviously wrong with my model here?

using SDDP, JuMP, Ipopt
using Plots
using Random
using BenchmarkTools

tmax = 40.0
tspan = (0.0,tmax);

δt = 0.1
t = 0:δt:tmax;
nsteps = Int(tmax / δt);

u0 = [990,10,0]; # S,I,R
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0; # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ R, SDDP.State, initial_value = u0[3])

    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)

    # constraints on control    
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in))
    @NLconstraint(sp, I.out == I.in + ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in) - ((1-exp(-γ*δt)) * I.in))
    @NLconstraint(sp, R.out == R.in + ((1-exp(-γ*δt)) * I.in))
    @NLconstraint(sp, C.out == C.in + ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in))

    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)
sdwfrost commented 1 year ago

Hi, the only major difference is that the initial conditions are not the same - you would have to scale β to use a population size of 1000 - but this wasn't the problem. I also tried putting the upper bound in the variable declaration rather than as a constraint:

@variable(sp, 0 ≤ υ_cumulative ≤ υ_total, SDDP.State, initial_value = 0)

but that didn't fix things either - I'm not sure whether this is the correct approach in SDDP.jl anyway.

There's an additional x variable in the JSON file that is written to the working directory, but I don't know whether that's a bad thing or not. Here's my stripped down version that more closely matches the InfiniteOpt.jl one:

using SDDP, JuMP, Ipopt

tmax = 40.0
δt = 0.1
nsteps = Int(tmax / δt)

u0 = [0.99,0.01] # S,I
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    #@variable(sp, 0 ≤ υ_cumulative ≤ υ_total, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)

    # constraints on control    
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in))
    @NLconstraint(sp, I.out == I.in + ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in) - ((1-exp(-γ*δt)) * I.in))
    @NLconstraint(sp, C.out == C.in + ((1-exp((1 - υ) * β * S.in * I.in * δt)) * S.in))

    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)

I'd post this on the Julia Discourse in the Mathematical Optimization category, and if that doesn't help, try filing an issue in case it really is a corner case.

sdwfrost commented 1 year ago

I noticed some errors in the transitions: you need to use the negative of the rate in the (1-exp(-rt)) terms, and you included S.in in the force of infection where it should just be I.in. I also added a warm start to the control variable. Fixes below - I still get an error though.
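
For reference, the exponential waiting-time argument behind those terms (and why the rate has to be negated inside the exponential) gives, over a step of length $\delta t$,

$$P(\text{infection}) = 1 - e^{-(1-\upsilon)\,\beta\, I\, \delta t}, \qquad P(\text{recovery}) = 1 - e^{-\gamma\, \delta t},$$

so the expected flows per step are $S \cdot P(\text{infection})$ and $I \cdot P(\text{recovery})$.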

using SDDP, JuMP, Ipopt

tmax = 40.0
δt = 0.1
nsteps = Int(tmax / δt)

u0 = [0.99, 0.01] # S,I
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)

    # constraints on control    
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)

    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1-exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1-exp(-γ*δt)) * I.in)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)

    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)
sdwfrost commented 1 year ago

The above breaks at step 202. If you reduce the simulation time to e.g. tmax=20.0 (i.e. 200 steps), then the model can get through an iteration loop. Any idea how to get a vector of υ out? I'm not sure why there is a problem at this stage.

using SDDP, JuMP, Ipopt

tmax = 20.0
δt = 0.1
nsteps = Int(tmax / δt)

u0 = [0.99, 0.01] # S,I
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)

    # constraints on control    
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)

    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1-exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1-exp(-γ*δt)) * I.in)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)

    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)
rule = SDDP.DecisionRule(model; node = 1)
solution = SDDP.evaluate(rule; incoming_state=Dict(:C => 0.0))
sdwfrost commented 1 year ago

In the deterministic case, we can just use JuMP, as explained in this tutorial, which may help to debug whether it's a JuMP issue or an SDDP.jl issue (though I see where you're going with this :)).
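
For reference, a minimal deterministic sketch of what that could look like in plain JuMP, assuming the same discrete-time updates and a terminal objective (the variable layout and the budget constraint here are my guesses, not the tutorial's code):

using JuMP, Ipopt

tmax = 40.0
δt = 0.1
nsteps = Int(tmax / δt)

u0 = [0.99, 0.01] # S,I
β = 0.5           # infectivity rate
γ = 0.25          # recovery rate
υ_max = 0.5       # maximum intervention
υ_total = 10.0    # maximum cost

m = Model(Ipopt.Optimizer)

@variable(m, S[1:nsteps+1] ≥ 0)
@variable(m, I[1:nsteps+1] ≥ 0)
@variable(m, C[1:nsteps+1] ≥ 0)
@variable(m, 0 ≤ υ[1:nsteps] ≤ υ_max)

# initial conditions
fix(S[1], u0[1]; force = true)
fix(I[1], u0[2]; force = true)
fix(C[1], 0.0; force = true)

# total control budget
@constraint(m, δt * sum(υ) ≤ υ_total)

# discrete-time state updates, same form as in the SDDP model above
@NLconstraint(m, [t = 1:nsteps], S[t+1] == S[t] - (1 - exp(-(1 - υ[t]) * β * I[t] * δt)) * S[t])
@NLconstraint(m, [t = 1:nsteps], I[t+1] == I[t] + (1 - exp(-(1 - υ[t]) * β * I[t] * δt)) * S[t] - (1 - exp(-γ * δt)) * I[t])
@NLconstraint(m, [t = 1:nsteps], C[t+1] == C[t] + (1 - exp(-(1 - υ[t]) * β * I[t] * δt)) * S[t])

# minimize cumulative infections at the final time point
@objective(m, Min, C[nsteps+1])

optimize!(m)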

slwu89 commented 1 year ago

Yeah, that's a good idea - it might be nice in any case to have a JuMP -> deterministic SDDP -> stochastic SDDP workflow. I'll play around with the vanilla JuMP model and see if it sheds any light on what's going wrong with the SDDP implementation.

slwu89 commented 1 year ago

@sdwfrost I didn't notice your above comment. To get the state trajectory after running your last code block (to time 20), we can do this:

using Plots

sims = SDDP.simulate(model, 1, [:S,:I])
traj = [[sims[1][i][:S].in,sims[1][i][:I].in] for i in 1:200]
traj = transpose(hcat(traj...))
plot(traj)

To give something like:

[screenshot: plot of the simulated S and I trajectories]
sdwfrost commented 1 year ago

Hi @slwu89! I added a deterministic JuMP.jl example here. There were a few issues, so I had to play a bit with the number of steps and δt, but it now works well.

The analogous model in SDDP.jl doesn't work with the same parameter values and solver. It runs for a few iterations, but for some reason the algorithm in SDDP.jl tries a solution where υ is high at the beginning, and hence it reaches a point where it can't find a feasible solution before it's gone through all nsteps. (This is why it was failing around step 20 originally.) Here's what I have so far:

using SDDP, JuMP, Ipopt, Plots

tmax = 100.0
δt = 1.0
nsteps = Int(tmax / δt)

u0 = [0.99, 0.01] # S,I
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0 # maximum cost
opt=Ipopt.Optimizer

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)

    # constraints on control    
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.out ≤ υ_total)

    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1-exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1-exp(-γ*δt)) * I.in)

    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)

    @stageobjective(sp, C.out)
end

# The solver works when we convert to a JuMP model
# I don't know how to extract the values from this, however
det = SDDP.deterministic_equivalent(model, Ipopt.Optimizer)
set_silent(det)
JuMP.optimize!(det)
JuMP.termination_status(det)

# The stability report is fine
SDDP.numerical_stability_report(model)
# The solver will fail at iteration 7
SDDP.train(model; iteration_limit = 6)
sims = SDDP.simulate(model, 1, [:S,:I,:υ])
traj = [[sims[1][i][:S].in,sims[1][i][:I].in,sims[1][i][:υ]] for i in 1:nsteps]
traj = transpose(hcat(traj...))
plot(traj)
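
As an aside, one way to at least inspect the solution of the deterministic equivalent is just to list every variable by name with its optimal value and filter by eye - I'm not assuming any particular naming scheme here, only standard JuMP queries:

# print every variable of the deterministic equivalent with its optimal value;
# the names are whatever SDDP.jl assigned when it built the model
for v in JuMP.all_variables(det)
    println(JuMP.name(v), " = ", JuMP.value(v))
end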

I've tried some other NLP solvers, such as MadNLP, but they don't seem to overcome the 'early lockdown' scenario.

slwu89 commented 1 year ago

@sdwfrost I found out what the problem was. The way I went about it was by reading the files that SDDP writes out when it dies (with names of the form "subproblem_XX.mof.json") back into JuMP and then reading off the LaTeX equations to find out why the subproblem at that time point wasn't solvable.

sp_moi = JuMP.read_from_file("./subproblem_XX.mof.json")
JuMP.latex_formulation(sp_moi)

Based on this, it seems the constraint @constraint(sp, υ_cumulative.out ≤ υ_total) has to be rewritten as @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total). The reason is that the actual value of the decision variable υ wasn't being appropriately constrained over a time step.
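
In the stage subproblem the control block then reads (the same lines as in the snippets below):

# constraints on control
@constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
@constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)  # budget includes this stage's υ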

I've put what I have so far in my fork: https://github.com/slwu89/sir-julia/blob/slwu89/sddp/tutorials/sddp/sddp.jl

Note that currently I only have a single @stageobjective, the cumulative infected at the final time point. This seems to me to be a closer interpretation of the JuMP and InfiniteOpt objectives, which only consider the final time point. Including all intermediate time points as we did previously leads to worse-looking "optimal" policies, although even the "optimal" one returned in this example results in C of about 0.75 after 200 iterations of training, which is a lot worse than the real optimal policy. If you have any insights, I'd be grateful.
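
Concretely, the single objective is a stage-dependent @stageobjective along these lines (a sketch of what I mean; the fork linked above has the full model):

# zero stage cost except at the final stage, where we count cumulative infections
if t == nsteps
    @stageobjective(sp, C.out)
else
    @stageobjective(sp, 0)
end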

The strange looking policy resulting from training is below:

[screenshot: trained control policy and state trajectories]

sdwfrost commented 1 year ago

Hi @slwu89, I plugged your fix into the model, and it's nice that it works. I think the problem is with the @stageobjective - even though you're minimizing the final size, using 0 as the stage objective before the final step means that essentially random policies are chosen at the beginning, until the policy budget reaches υ_total. I think this approach will work better on a different problem with a more local outcome.

A 'wait and see' approach, where you minimize the treatment until the final step, waits too long:

if t < nsteps
    @stageobjective(sp, υ_cumulative.out)
else
    @stageobjective(sp, C.out)
end

I'll play around with more models to see if I can overcome the 'short sighted' nature of the current approach.

slwu89 commented 1 year ago

Good points @sdwfrost, I'll play around with the objective too. This is a rather interesting topic: how best to set up local (per-stage) objectives that get the policy close to the global optimum that JuMP and InfiniteOpt are able to see.

sdwfrost commented 1 year ago

The issue of sparse rewards comes up a lot in reinforcement learning, where it causes problems because the algorithm finds it hard to work out which parts of the policy contributed to the outcome. Using a different objective is like reward shaping. I really wanted to put together an example using ReinforcementLearning.jl but got stuck trying to think about how to overcome the local/global problem.

slwu89 commented 1 year ago

Well, it's an interesting problem. It may be useful to have the example even if we can't find the ideal objective function here, especially if that's still an open research question (at least with respect to minimizing infections in the SIR model)! I tried penalizing the objective function by adding a "control whiplash" term to discourage rapid changes in $\upsilon$, using either the absolute value of the change in cumulative control or its square (although I'm not sure whether using the @NLconstraint violates some assumptions required by SDDP):

@variable(sp, z)

# absolute value penalty
@constraint(sp, υ_cumulative.out - υ_cumulative.in ≤ z)
@constraint(sp, υ_cumulative.in - υ_cumulative.out ≤ z)

# quadratic penalty
@NLconstraint(sp, (υ_cumulative.out - υ_cumulative.in)^2 == z)

In general I got smoother control policies, but still far from the optimal one found by JuMP/InfiniteOpt when we can do global optimization:

[screenshot: smoothed control policy and state trajectories]
sdwfrost commented 1 year ago

You can get something closer if you try to minimize a linear combination of the cumulative cost and the cumulative infections (i.e. trying to keep costs down and infections down, with a weighting factor).

@stageobjective(sp, υ_cumulative.out + 40*C.out)

A smoother penalty may not help, as the optimal policy includes two sharp changes.

slwu89 commented 1 year ago

Good point. I was trying that penalty to promote smoothness and cut down on the many spiky changes in policy I saw in some of the other training runs, but yes, it won't handle the abrupt single-lockdown policy very well.

sdwfrost commented 1 year ago

Hi @slwu89 - do you want to commit a version of this model using the weighted sum as an objective? As you say, it's an interesting problem. I'm happy to work it up into a jmd if you're busy.

slwu89 commented 1 year ago

Sure! I'll get a PR ready for you in a day or two. There's a lot of development going on in SDDP.jl, and I get the feeling we'll be updating this tutorial as new features get added to the package - it's exciting stuff.

slwu89 commented 1 year ago

@sdwfrost I was hoping to get a stochastic example working for the PR but I don't think it's possible right now. In case things change in JuMP that make this possible (which the JuMP dev team is certainly thinking about, see https://github.com/jump-dev/MathOptInterface.jl/issues/846), I'm going to post where I got to.

Because SDDP does not directly support random variables parameterized by state variables, we need to feed an exogenous source of noise into each node of the policy graph and transform it into what we need. The most straightforward way to go about this would be to simulate an Euler-Maruyama discretization of an SDE.

At each step we first draw two normal random variates (as in https://odow.github.io/SDDP.jl/stable/guides/add_multidimensional_noise/) and then put them into the constraint matrix so that the nonlinear expressions for the diffusion terms are multiplied by those variates (following https://odow.github.io/SDDP.jl/stable/guides/add_noise_in_the_constraint_matrix/). The issue is that JuMP.set_normalized_coefficient only works for affine and quadratic expressions; perhaps that will change in the future. Anyway, I leave it here.
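
For reference, here is a plain-Julia sketch of the Euler-Maruyama step I have in mind (a hypothetical helper just to illustrate the discretization, with υ held fixed; the max(..., 0) clamps are only there to keep the square roots defined):

using Distributions, Random

function em_sir_step(S, I, C, υ, β, γ, δt, rng)
    f_inf = max((1 - υ) * β * I * S, 0.0)    # drift of the infection flow
    f_rec = max(γ * I, 0.0)                  # drift of the recovery flow
    dW_inf = rand(rng, Normal(0, sqrt(δt)))  # Brownian increments over δt
    dW_rec = rand(rng, Normal(0, sqrt(δt)))
    inf = f_inf * δt + sqrt(f_inf) * dW_inf  # Euler-Maruyama infection flow
    rec = f_rec * δt + sqrt(f_rec) * dW_rec  # Euler-Maruyama recovery flow
    return max(S - inf, 0.0), max(I + inf - rec, 0.0), C + inf
end

rng = Random.MersenneTwister(1)
traj = [(0.99, 0.01, 0.0)]  # (S, I, C)
for _ in 1:100
    push!(traj, em_sir_step(traj[end]..., 0.0, 0.5, 0.25, 0.1, rng))
end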

model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)

    # constraints on control    
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)

    # drift terms
    @NLexpression(sp, f_infection, (1 - υ) * β * I.in * S.in)
    @NLexpression(sp, f_recovery, γ * I.in)

    # diffusion terms
    @NLexpression(sp, g_infection, sqrt(f_infection))
    @NLexpression(sp, g_recovery, sqrt(f_recovery))

    # state updating rules
    @NLconstraint(sp, S_update, S.out == S.in - 1 * f_infection - 1 * g_infection)
    @NLconstraint(sp, I_update, I.out == I.in + 1 * f_infection - 1 * f_recovery + 1 * g_infection - 1 * g_recovery)
    @NLconstraint(sp, C_update, C.out == C.in + 1 * f_infection + 1 * g_infection)

    # sample the normal random variates
    Ω1 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω2 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        # drift terms
        JuMP.set_normalized_coefficient(S_update, f_infection, δt) # if this doesn't work set f_infection to a variable and not an expression
        JuMP.set_normalized_coefficient(I_update, f_infection, δt)
        JuMP.set_normalized_coefficient(I_update, f_recovery, δt)
        JuMP.set_normalized_coefficient(C_update, f_infection, δt)
        # diffusion terms
        JuMP.set_normalized_coefficient(S_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_recovery, ω.rec)
        JuMP.set_normalized_coefficient(C_update, g_infection, ω.inf)
    end

    # linear weighting of objectives
    @stageobjective(sp, υ_cumulative.out + 40*C.out)

end
sdwfrost commented 1 year ago

Thanks Sean! Here's hoping that the JuMP team add nonlinear expressions to set_normalized_coefficient.

odow commented 1 year ago

I was about to merge the PR in MathOptInterface when I noticed a linked issue with "SDDP.jl" in the title, which piqued my interest. This is cool stuff!

You can work around the set_normalized_coefficient issue by making f_infection etc. a decision variable that is constrained to equal the expression. I didn't test it, but something like this should work:

model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t

    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)

    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)

    # constraints on control    
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)

    # drift terms
    @variable(sp, f_infection)
    @NLconstraint(sp, f_infection == (1 - υ) * β * I.in * S.in)
    @variable(sp, f_recovery)
    @NLconstraint(sp, f_recovery == γ * I.in)

    # diffusion terms
    @variable(sp, g_infection)
    @NLconstraint(sp, g_infection == sqrt(f_infection))
    @variable(sp, g_recovery)
    @NLconstraint(sp, g_recovery == sqrt(f_recovery))

    # state updating rules
    @constraint(sp, S_update, S.out == S.in - 1 * f_infection - 1 * g_infection)
    @constraint(sp, I_update, I.out == I.in + 1 * f_infection - 1 * f_recovery + 1 * g_infection - 1 * g_recovery)
    @constraint(sp, C_update, C.out == C.in + 1 * f_infection + 1 * g_infection)

    # sample the normal random variates
    Ω1 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω2 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        # drift terms
        JuMP.set_normalized_coefficient(S_update, f_infection, δt)
        JuMP.set_normalized_coefficient(I_update, f_infection, δt)
        JuMP.set_normalized_coefficient(I_update, f_recovery, δt)
        JuMP.set_normalized_coefficient(C_update, f_infection, δt)
        # diffusion terms
        JuMP.set_normalized_coefficient(S_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_recovery, ω.rec)
        JuMP.set_normalized_coefficient(C_update, g_infection, ω.inf)
    end

    # linear weighting of objectives
    @stageobjective(sp, υ_cumulative.out + 40*C.out)

end
sdwfrost commented 1 year ago

Thanks @odow! Your tweak compiles, but gives an infeasible solution - @slwu89, do you want to try to debug?

odow commented 1 year ago

> but gives an infeasible solution

A core assumption of SDDP.jl is relatively complete recourse. That is, there is always a feasible solution for any choice of the incoming state variables and the random variables.

I assume that the issue is

@constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)

Use a penalized constraint like this (for some appropriate penalty cost):

@variable(sp, penalty >= 0)
@constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total + penalty)
@stageobjective(sp, υ_cumulative.out + 40*C.out + 1_000 * penalty)
sdwfrost commented 1 year ago

Thanks - this may well be a problem down the line (and I had to play a bit with the deterministic version to overcome this constraint issue), but this looks like a problem with the model itself, as I get an infeasible solution on the first stage.

odow commented 1 year ago

It's potentially the non-differentiability of @NLconstraint(sp, g_infection == sqrt(f_infection)) at f_infection = 0. Ipopt uses first- and second-order derivatives during the solve.

What if you tweak the formulation to

model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t
    @variables(sp, begin
        0 <= S, SDDP.State, (initial_value = u0[1])
        0 <= I, SDDP.State, (initial_value = u0[2])
        0 <= C, SDDP.State, (initial_value = 0)
        0 <= υ_cumulative <= υ_total, SDDP.State, (initial_value = 0)
        0 <= υ <= υ_max
        ω_inf
        ω_rec
        penalty >= 0
        infection >= 0
        recovery >= 0
    end)
    @NLconstraints(sp, begin
        infection == δt * ((1 - υ) * β * I.in * S.in) + ω_inf * sqrt((1 - υ) * β * I.in * S.in)
        recovery == δt * (γ * I.in) + ω_rec * sqrt(γ * I.in)
    end)
    @constraints(sp, begin
        υ_cumulative.out == υ_cumulative.in + δt * υ - penalty
        S.out == S.in - infection
        I.out == I.in + infection - recovery
        C.out == C.in + infection
    end)
    D = Distributions.Normal(0, sqrt(δt))
    Ω1, Ω2 = sort(rand(D, 100)), sort(rand(D, 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        JuMP.fix(ω_inf, ω.inf)
        JuMP.fix(ω_rec, ω.rec)
    end
    @stageobjective(sp, υ_cumulative.out + 40 * C.out + 100 * penalty)
end
slwu89 commented 1 year ago

Thanks for jumping in here and giving us some advice @odow! I just tried the most recent model formulation you posted; after calling SDDP.train it hangs upon starting the forward/backward iterations (I quit after ~20 minutes).

sdwfrost commented 1 year ago

@slwu89 - are you using the latest main of JuMP and SDDP? I can get @odow's first model running - it's just very slow. I'll report back later when it's gone through 10 iterations.

slwu89 commented 1 year ago

Hi @sdwfrost, yeah, I'm on Julia 1.9 and have updated Ipopt/JuMP/SDDP to the latest versions.

sdwfrost commented 1 year ago

This one is running for me (although at 80 min for the first iteration)

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variables(sp, begin
        0 <= S, SDDP.State, (initial_value = u0[1])
        0 <= I, SDDP.State, (initial_value = u0[2])
        0 <= C, SDDP.State, (initial_value = 0)
        0 <= υ_cumulative <= υ_total, SDDP.State, (initial_value = 0)
        0 <= υ <= υ_max
        ω_inf
        ω_rec
        penalty >= 0
        infection >= 0
        recovery >= 0
    end)
    @NLconstraints(sp, begin
        infection == δt * ((1 - υ) * β * I.in * S.in) + ω_inf * sqrt((1 - υ) * β * I.in * S.in)
        recovery == δt * (γ * I.in) + ω_rec * sqrt(γ * I.in)
    end)
    @constraints(sp, begin
        υ_cumulative.out == υ_cumulative.in + δt * υ - penalty
        S.out == S.in - infection
        I.out == I.in + infection - recovery
        C.out == C.in + infection
    end)
    D = Distributions.Normal(0, sqrt(δt))
    Ω1, Ω2 = sort(rand(D, 100)), sort(rand(D, 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        JuMP.fix(ω_inf, ω.inf)
        JuMP.fix(ω_rec, ω.rec)
    end
    @stageobjective(sp, υ_cumulative.out + 40 * C.out + 100 * penalty)
end
slwu89 commented 1 year ago

Thanks, that is where I am at too.

odow commented 1 year ago

> although at 80 min for the first iteration

Now that I take a closer look, the issue is that you have 100 * 100 samples for the Monte Carlo. And assuming

tmax = 100.0
δt = 1.0
nsteps = Int(tmax / δt);

then we need to solve approximately 1_000_000 nonlinear programs per iteration (100 stages × 10_000 noise realizations per stage in the backward pass)! This could be okay, except that JuMP currently rebuilds the model from scratch every time: https://github.com/jump-dev/JuMP.jl/issues/1185. Problems like this are one reason that we're in the process of rewriting JuMP's nonlinear support.

I'd simplify the approximation with something like

D = Distributions.Normal(0, sqrt(δt))
Ω = [(inf = rand(D), rec = rand(D)) for _ in 1:100]
slwu89 commented 1 year ago

Thanks for the helpful advice @odow - ha, I see now that the backward pass will generate a huge number of solves. I'll try to find a balance between fewer noise samples per stage and a time step size that gives reasonable solutions (δt = 1 was too large and the approximation broke down).