SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License
272 stars 27 forks source link

Callbacks don't work with EnsembleCPUArray() #316

Closed ctessum closed 7 months ago

ctessum commented 7 months ago

Hello!

I'm working with this tutorial, and I would like to use a callback. On my laptop I'm not able to use the tutorial with a GPU owing to #315 , so I'm trying it with EnsembleCPUArray() instead. This works fine, until I try to use a callback:

using DiffEqGPU, DifferentialEquations, StaticArrays

function lorenz2(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz2, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

cb = DiscreteCallback(
    (u, t, i) -> true, 
    (i) -> (@info "callback running"),
)

sol = solve(monteprob, GPUTsit5(), EnsembleCPUArray(),
    trajectories = 10_000, callback=cb,
    saveat = 1.0f0)

In the above example, the code runs fine and gives a result but never prints "callback running", suggesting that the callback never runs. (This is an MWE of a larger problem where the callback also does not run.)

If I use EnsembleThreads() instead, the callback works fine:

sol = solve(monteprob, GPUTsit5(), EnsembleThreads(),
    trajectories = 10_000, callback=cb,
    saveat = 1.0f0)

(In this case "callback running" is repeatedly printed to stdout.)

As some context about how I ended up here, I have something like an advection-reaction system of equations, and I'm trying to do the reaction part using ModelingToolkit and the advection part using a handwritten operator. (I do not want to rewrite the ModelingToolkit part by hand, and ModelingToolkit is not yet able to handle the advection part.) I'm trying to accomplish this by using an EnsembleProblem where there is one ensemble member for each grid cell, and then occasionally interrupting the ensemble simulation with a callback to do advection between the grid cells. This requires asynchronously passing the state of each ensemble member back and forth with the advection operator, but I've got it working in cases where the number of grid cells is <= the number of Julia threads or processes. When the number of grid cells > the number of threads, everything hangs because it turns out the ensemble members don't all run at the same time when using EnsembleThreads or EnsembleDistributed. However, my understanding of EnsembleCPUArray or EnsembleGPUArray is that they would be running all of the ensemble members at the same time, so I can stop them all at a certain time to do advection between them without everything hanging. Ideally I like to do it with EnsembleCPUArray because my larger problem has parts that don't yet run on the GPU. So that's the larger goal, the specific roadblock to that is above, but I'd also be interested in suggestions of different ways to achieve the larger goal.

Thanks!

ctessum commented 7 months ago

Using the debugger, I was able to get it to work using:

sol = solve(monteprob, GPUTsit5(), EnsembleCPUArray(),
    trajectories = 10, callback=cb, merge_callbacks = true,
    saveat = 1.0f0)

I'm not sure why merge_callbacks=true is needed in this case, though.

ctessum commented 7 months ago

But also, the initializer and finalizer don't run:

cb = DiscreteCallback(
    (u, t, i) -> true, 
    (i) -> (@info "callback running"),
    initialize = (c,u,t,i) -> (@info "initializing"),
    finalize = (c,u,t,i) -> (@info "finalizing"),
)

sol = solve(monteprob, GPUTsit5(), EnsembleCPUArray(),
    trajectories = 10, callback=cb, merge_callbacks = true,
    saveat = 1.0f0)
ChrisRackauckas commented 7 months ago

That's just GPUTsit5: if you use Tsit5 it should be fine?

ctessum commented 7 months ago

Thanks, that works! Also, somehow, the initializer and affect for the callbacks work in both cases above today, even though they didn't work with the exact same code three days ago. I'm not sure how to explain that.

However, the finalizer doesn't seem to run in any of the cases above or with Tsit5, and also anecdotally I haven't been able to get the finalizer to run in most or all of the different cases I've tried recently. Maybe I don't understand the goal of the finalizer, it's supposed to run at the end of the simulation, right?

ChrisRackauckas commented 7 months ago

The finalizers just got fixed yesterday, but I haven't released that part.

ctessum commented 7 months ago

Great, thanks!