SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License

ContinuousCallback does not work correctly with EnsembleGPUArray() #191

Closed martin-abrudsky closed 1 year ago

martin-abrudsky commented 1 year ago

Hello, I am solving a central-potential problem with EnsembleGPUArray() for multiple initial conditions. When I add the ContinuousCallback, the first 800 trajectories ignore affect!(integrator) and only the last 200 apply the cut correctly. If I solve for 10_000 trajectories, the first 8000 ignore the cut and the last 2000 apply it; the same ratio holds for 1_000_000 trajectories. The code is the following:

using CUDA
using DiffEqGPU
using NPZ
using OrdinaryDiffEq
using Plots

"""
     pot_central!(du,u,p,t)
     u=[x,dx,y,dy,# of steps]
     p=[k,m,dt]
"""
function pot_central!(du,u,p,t)
    r3 = (u[1]^2 + u[3]^2)^(3/2)
    du[1] = u[2]                          # u[2] = dx
    du[2] = -(p[1]*u[1]) / (p[2]*r3)
    du[3] = u[4]                          # u[4] = dy
    du[4] = -(p[1]*u[3]) / (p[2]*r3)
    du[5] = 1/p[3]
end

T=100.0
trajectories=1_000
dt=0.01

u_rand = convert(Array{Float64}, npzread("IO_GPU/IO_u0.npy"))
u0 = [2.0, 2.0, 1.0, 1.5, dt]
p = [1.0, 1.0, dt]            # [k,m,dt]
tspan = (0.0, T)
R2 = [4.5, 5_000.0]           # R2 = [Rmin2, Rmax2]

prob= ODEProblem{true}(pot_central!,u0,tspan,p)
prob_func= (prob,i,repeat) -> remake(prob,u0 =  [u_rand[i,:];dt].*u0 + [1.0,1.0,1.0,1.0,dt]) 
Ensemble_Problem=EnsembleProblem(prob,prob_func=prob_func,safetycopy=false) 

function condition(u,t,integrator)
        r2 = u[1]^2 + u[3]^2
        (R2[2] - r2)*(r2 - R2[1])
end

affect!(integrator) = terminate!(integrator)
gpu_cb = ContinuousCallback(condition, affect!;save_positions=(true,false),rootfind=true,interp_points=0,abstol=1e-7,reltol=0)

sol= solve(Ensemble_Problem,
                 Tsit5(),
                 EnsembleGPUArray(),
                 dt=0.01,
                 trajectories=trajectories,
                 #batch_size=10_000,
                 callback = gpu_cb,
                 adaptive=false,
                 save_everystep=false
                 )

The solutions for some trajectories are as follows:

#u=[x,dx,y,dy,# of steps]

sol[1]

retcode: Success
Interpolation: 1st order linear
t: 2-element Vector{Float64}:
   0.0
 100.0
u: 2-element Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}:
 [1.6916643844888346, 1.3952481788899702, 1.4257929742644806, 2.3254414019025376, 0.0101]
 [130.3632437338083, 1.283605087637319, 221.2362706355311, 2.193291036113921, 10000.010099999996]

sol[800]

retcode: Success
Interpolation: 1st order linear
t: 2-element Vector{Float64}:
   0.0
 100.0
u: 2-element Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}:
 [2.3286232359143177, 2.0849960853408054, 1.5481395680210261, 1.0878883702422468, 0.0101]
 [198.03489179352627, 1.9510332210035648, 102.76307791913189, 1.008911077728196, 10000.010099999996]

sol[801]

retcode: Terminated
Interpolation: 1st order linear
t: 2-element Vector{Float64}:
  0.0
 28.22912660730258
u: 2-element Vector{Vector{Float64}}:
 [1.3698092340031855, 2.018364836988968, 1.9808709009626209, 1.5912143969170325, 0.0101]
 [55.503066919039185, 1.9067772301987758, 43.81106666791687, 1.4723417477028182, 2822.922760730258]

sol[802]

retcode: Terminated
Interpolation: 1st order linear
t: 2-element Vector{Float64}:
  0.0
 30.891877286483595
u: 2-element Vector{Vector{Float64}}:
 [2.2032317939655903, 1.2169273303092267, 1.6948012447348866, 2.0018032820652945, 0.0101]
 [36.85704302055922, 1.1131346681788845, 60.34532608065554, 1.8862192286561592, 3089.1978286483586]

Thank you very much in advance.

utkarsh530 commented 1 year ago

Can you try the following:

sol= solve(Ensemble_Problem,
                 Tsit5(),
                 EnsembleGPUArray(),
                 dt=0.01,
                 trajectories=trajectories,
                 #batch_size=10_000,
                 callback = gpu_cb,
                 adaptive=false,
                 save_everystep=false,
                 merge_callbacks = true
                 )
martin-abrudsky commented 1 year ago

Hello, I tried your recommendation but it gives me this error

InvalidIRError: compiling kernel #gpu_continuous_condition_kernel(KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(condition), CuDeviceVector{Float64, 1}, CuDeviceMatrix{Float64, 1}, Float64, CuDeviceMatrix{Float64, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to convert)
Stacktrace:
 [1] setindex!
   @ ~/.julia/packages/CUDA/DfvRa/src/device/array.jl:194
 [2] macro expansion
   @ ~/.julia/packages/DiffEqGPU/CiiCq/src/DiffEqGPU.jl:63
 [3] gpu_continuous_condition_kernel
   @ ~/.julia/packages/KernelAbstractions/C8flJ/src/macros.jl:81
 [4] gpu_continuous_condition_kernel
   @ ./none:0
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] condition
   @ ~/FAMAF/Beca_CIN_Trabajo_Final/skymap/GPU_Julia/pot_central_GPU_Float64.ipynb:21
 [2] macro expansion
   @ ~/.julia/packages/DiffEqGPU/CiiCq/src/DiffEqGPU.jl:63
 [3] gpu_continuous_condition_kernel
   @ ~/.julia/packages/KernelAbstractions/C8flJ/src/macros.jl:81
 [4] gpu_continuous_condition_kernel
   @ ./none:0
Reason: unsupported dynamic function invocation (call to -)
Stacktrace:
 [1] condition
   @ ~/FAMAF/Beca_CIN_Trabajo_Final/skymap/GPU_Julia/pot_central_GPU_Float64.ipynb:21
...
    @ ~/.julia/packages/CUDA/DfvRa/src/utilities.jl:25 [inlined]
 [33] top-level scope
    @ ~/.julia/packages/CUDA/DfvRa/src/pool.jl:490 [inlined]
 [34] top-level scope
    @ ~/FAMAF/Beca_CIN_Trabajo_Final/skymap/GPU_Julia/pot_central_GPU_Float64.ipynb:0
utkarsh530 commented 1 year ago
  1. I think the issue arises because you're using global variables inside condition, which makes the condition non-GPU-compatible. The support for callbacks is limited; hence, use constants directly within the condition.

  2. However, I saw that your affect! is a terminate!, which is not supported yet #43. We're working on it.

cc @ChrisRackauckas
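For illustration, a GPU-compatible version of the condition above would inline the bounds as constants (or pass them through the parameter vector p) instead of closing over the global array R2 — a sketch, not a tested GPU kernel:

```julia
# Sketch: replace the non-constant global `R2` with `const` scalars so the
# condition compiles to a GPU kernel. Values match R2 = [4.5, 5_000.0] above.
const Rmin2 = 4.5
const Rmax2 = 5_000.0

function condition(u, t, integrator)
    r2 = u[1]^2 + u[3]^2
    (Rmax2 - r2) * (r2 - Rmin2)   # sign change when r2 leaves [Rmin2, Rmax2]
end
```

Alternatively, append Rmin2 and Rmax2 to p and read them via integrator.p, which keeps the values adjustable per problem.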

martin-abrudsky commented 1 year ago

OK. Is terminate! also unsupported by EnsembleGPUKernel()? I ask because I tried to solve this problem using that ensemble: I first tried the "Callbacks with EnsembleGPUKernel" example found in the README, but I got the following error

GPU compilation of kernel #tsit5_kernel(CUDA.CuDeviceVector{ODEProblem{SVector{1, Float32}, Tuple{Float32, Float32}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(f), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1}, CUDA.CuDeviceMatrix{SVector{1, Float32}, 1}, CUDA.CuDeviceMatrix{Float32, 1}, Float32, CallbackSet{Tuple{}, Tuple{DiscreteCallback{typeof(condition), typeof(affect!), typeof(SciMLBase.INITIALIZE_DEFAULT), typeof(SciMLBase.FINALIZE_DEFAULT)}}}, CUDA.CuDeviceVector{Float32, 1}, Int64, Nothing, Val{true}) failed
KernelError: passing and using non-bitstype argument

Argument 6 to your kernel function is of type CallbackSet{Tuple{}, Tuple{DiscreteCallback{typeof(condition), typeof(affect!), typeof(SciMLBase.INITIALIZE_DEFAULT), typeof(SciMLBase.FINALIZE_DEFAULT)}}}, which is not isbits:
  .discrete_callbacks is of type Tuple{DiscreteCallback{typeof(condition), typeof(affect!), typeof(SciMLBase.INITIALIZE_DEFAULT), typeof(SciMLBase.FINALIZE_DEFAULT)}} which is not isbits.
    .1 is of type DiscreteCallback{typeof(condition), typeof(affect!), typeof(SciMLBase.INITIALIZE_DEFAULT), typeof(SciMLBase.FINALIZE_DEFAULT)} which is not isbits.
      .save_positions is of type BitVector which is not isbits.
        .chunks is of type Vector{UInt64} which is not isbits.

I get the same error in my own code.
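The KernelError above can be understood without a GPU: CUDA kernels only accept isbits arguments, and any callback carrying a heap-allocated BitVector field (save_positions, per the error message) is not isbits. A quick illustrative check on the CPU:

```julia
using SciMLBase  # provides DiscreteCallback and terminate!

condition(u, t, integrator) = true
affect!(integrator) = terminate!(integrator)
cb = DiscreteCallback(condition, affect!)

# The callback's `save_positions::BitVector` field is heap-allocated,
# so the whole struct fails the isbits requirement for kernel arguments.
isbits(cb)   # false
```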

ChrisRackauckas commented 1 year ago

With EnsembleGPUArray, that's a won't-fix. We should just throw an error for this and tell people that they are required to use the kernel-generating methods.

utkarsh530 commented 1 year ago

Yes, I'll make a PR for it. A lot of this needs to be in the docs as well.

utkarsh530 commented 1 year ago

With the new docs released and terminate! support in EnsembleGPUKernel, can you try this again?

ChrisRackauckas commented 1 year ago

It's now documented and all, so closing. EnsembleGPUKernel is the answer, as per the docs.
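For readers landing on this issue: a minimal sketch of the EnsembleGPUKernel approach, following the pattern in the docs linked above. EnsembleGPUKernel requires an out-of-place RHS returning an SVector, Float32 state, and a GPU solver such as GPUTsit5; the exact keyword set and whether EnsembleGPUKernel takes a backend argument vary across DiffEqGPU versions, so treat this as an assumption-laden sketch, not the canonical example:

```julia
using CUDA, DiffEqGPU, OrdinaryDiffEq, StaticArrays

# Out-of-place RHS returning an SVector (isbits state required by the kernel).
f(u, p, t) = SVector{2}(u[2], -u[1])

u0 = @SVector [2.0f0, 0.0f0]
prob = ODEProblem{false}(f, u0, (0.0f0, 10.0f0))
monteprob = EnsembleProblem(prob, safetycopy = false)

# terminate! is supported here; keep save_positions isbits-friendly.
condition(u, t, integrator) = u[1]               # root at the x = 0 crossing
affect!(integrator) = terminate!(integrator)
cb = ContinuousCallback(condition, affect!; save_positions = (false, false))

sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(),
            trajectories = 1_000, callback = cb,
            merge_callbacks = true, save_everystep = false)
```

Unlike EnsembleGPUArray, which batches trajectories into one large array problem, EnsembleGPUKernel compiles one solver kernel per ensemble, so each trajectory can stop independently when its callback fires.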