SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License

Does DiffEqGPU support adaptive methods? #293

Closed TheStarAlight closed 11 months ago

TheStarAlight commented 11 months ago

Hi, I'm using DiffEqGPU in my Julia package to perform ensemble simulations (using EnsembleGPUKernel and GPUTsit5). The non-adaptive method (adaptive=false, dt=...) works well.

However, when I try to use adaptive stepping (adaptive=true, reltol=...), an error occurs...

I'm using the code from here.

Here I enclose the self-contained code, which can be run in the REPL:

using StaticArrays
using OrdinaryDiffEq
using DiffEqGPU, CUDA

# l = Lasers.Cos2Laser(2e14,800,2,1.0)
traj = function (u,p,t)
    F0 = 0.05338027007325633 # Lasers.LaserF0(l)
    ω = 0.05695419065625 # Lasers.AngFreq(l)
    N = 2   # l.cycNum
    φ = 0.0 # l.cep
    ε = 1.0 # l.ellip
    tFx, tFy, tFz = -1*(u[1]^2+u[2]^2+u[3]^2+1.0)^(-1.5) .* (u[1],u[2],u[3])
    du1 = u[4]
    du2 = u[5]
    du3 = u[6]
    du4 = tFx - F0 * cos(ω*t/(2N)) * (abs(ω*real(t))<N*π) * (cos(ω*t/(2N))*sin(ω*t+φ) + 1/N*sin(ω*t/(2N))*cos(ω*t+φ))
    du5 = tFy - F0 * cos(ω*t/(2N)) * (abs(ω*real(t))<N*π) * (cos(ω*t/(2N))*cos(ω*t+φ) - 1/N*sin(ω*t/(2N))*sin(ω*t+φ)) * ε
    du6 = tFz
    @SVector [du1,du2,du3,du4,du5,du6]
end
traj_t_final = 120.0
traj_dt = 0.001
init = [10.0  9.0  8.0  7.0;    # x
        5.0  1.0  0.0  0.0;    # y
        0.0  0.0  6.0  1.0;    # z
        0.0  1.0  0.0  1.0;    # px
        1.0  1.0 -1.0  2.0;    # py
        0.0  0.0  0.0  3.0;    # pz
        0.0 -5.0  0.0  0.0;]   # t0 (initial time, used by init_traj below)
traj_ODE_prob::ODEProblem = ODEProblem(traj, (@SVector zeros(Float64,6)), (0,traj_t_final))
init_traj = (prob,i,repeat) -> remake(prob; u0=SVector{6}([init[k,i] for k in 1:6]), tspan=(init[7,i],traj_t_final))
ensemble_prob::EnsembleProblem = EnsembleProblem(traj_ODE_prob, prob_func=init_traj, safetycopy=false)
solc = nothing
solg = nothing
# CPU non-adaptive
solc = solve(ensemble_prob, OrdinaryDiffEq.Tsit5(), EnsembleThreads(), trajectories=size(init,2), adaptive=false, dt=traj_dt, save_everystep=false)
# GPU non-adaptive
solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=false, dt=traj_dt, save_everystep=false)
# CPU adaptive
solc = solve(ensemble_prob, OrdinaryDiffEq.Tsit5(), EnsembleThreads(), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false)
# GPU adaptive
solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false)

When running the last line, the error happens :( Here's the stack trace:

julia> solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false)
ERROR: InvalidIRError: compiling MethodInstance for DiffEqGPU.gpu_ode_asolve_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::CuDeviceVector{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1}, ::GPUTsit5, ::CuDeviceMatrix{SVector{6, Float64}, 1}, ::CuDeviceMatrix{Float32, 1}, ::Float32, ::CallbackSet{Tuple{}, Tuple{}}, ::Nothing, ::Float32, ::Float64, ::Nothing, ::Val{false}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to init)
Stacktrace:
 [1] macro expansion
   @ C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\solve.jl:282
 [2] gpu_ode_asolve_kernel
   @ C:\Users\myzhu\.julia\packages\KernelAbstractions\QFxLx\src\macros.jl:90
 [3] gpu_ode_asolve_kernel
   @ .\none:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\validation.jl:149
  [2] macro expansion
    @ C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:415 [inlined]
  [3] macro expansion
    @ C:\Users\myzhu\.julia\packages\TimerOutputs\RsWnF\src\TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:414 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\utils.jl:89
  [6] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:129
  [7] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:106
  [8] compile
    @ C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:98 [inlined]
  [9] #1037
    @ C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\compilation.jl:104 [inlined]
 [10] JuliaContext(f::CUDA.var"#1037#1040"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:58
 [11] compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\compilation.jl:103
 [12] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))     
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\execution.jl:125
 [13] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\execution.jl:103
 [14] macro expansion
    @ C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:318 [inlined]
 [15] macro expansion
    @ .\lock.jl:223 [inlined]
 [16] cufunction(f::typeof(DiffEqGPU.gpu_ode_asolve_kernel), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, CuDeviceVector{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1}, GPUTsit5, CuDeviceMatrix{SVector{6, Float64}, 1}, CuDeviceMatrix{Float32, 1}, Float32, CallbackSet{Tuple{}, Tuple{}}, Nothing, Float32, Float64, Nothing, Val{false}}}; kwargs::Base.Pairs{Symbol, Union{Nothing, Bool}, Tuple{Symbol, Symbol}, NamedTuple{(:always_inline, :maxthreads), Tuple{Bool, Nothing}}})
    @ CUDA C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:313
 [17] macro expansion
    @ C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:104 [inlined]
 [18] (::KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_ode_asolve_kernel)})(::CuArray{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1, CUDA.Mem.DeviceBuffer}, ::Vararg{Any}; ndrange::Int64, workgroupsize::Nothing)
    @ CUDA.CUDAKernels C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\CUDAKernels.jl:117
 [19] #vectorized_asolve#166
    @ C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\solve.jl:182 [inlined]
 [20] batch_solve_up_kernel(ensembleprob::EnsembleProblem{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#9#11", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{CUDABackend}, I::UnitRange{Int64}, adaptive::Bool; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:saveat, :unstable_check, :reltol, :save_everystep), Tuple{Nothing, DiffEqGPU.var"#24#30", Float64, Bool}}})
    @ DiffEqGPU C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\DiffEqGPU.jl:1019
 [21] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#9#11", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{CUDABackend}, I::UnitRange{Int64}, adaptive::Bool; kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:unstable_check, :reltol, :save_everystep), Tuple{DiffEqGPU.var"#24#30", Float64, Bool}}})
    @ DiffEqGPU C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\DiffEqGPU.jl:929
 [22] macro expansion
    @ .\timing.jl:382 [inlined]
 [23] __solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#9#11", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{CUDABackend}; trajectories::Int64, batch_size::Int64, unstable_check::Function, adaptive::Bool, kwargs::Base.Pairs{Symbol, Real, Tuple{Symbol, Symbol}, NamedTuple{(:reltol, :save_everystep), Tuple{Float64, Bool}}})
    @ DiffEqGPU C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\DiffEqGPU.jl:767
 [24] #solve#32
    @ C:\Users\myzhu\.julia\packages\DiffEqBase\tVuJP\src\solve.jl:996 [inlined]
 [25] top-level scope
    @ REPL[55]:1
 [26] top-level scope
    @ C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\initialization.jl:185

So I wonder: does DiffEqGPU really support adaptive methods? Thank you for your attention and help 😋!

utkarsh530 commented 11 months ago

Can you try providing an initial dt? The problem is that the default dt is a Float32, which causes a type instability. I can fix that by converting it to the type of prob.tspan. Meanwhile, do something like this:

solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false, dt = 1e-2)
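To see the precision mismatch concretely, here is a minimal sketch using only pieces of the MWE above (the Tuple{Float64, Float64} tspan is also visible in the stack trace):

using StaticArrays, OrdinaryDiffEq

# ODEProblem promotes the MWE's tspan (0, traj_t_final) to Float64:
prob = ODEProblem((u, p, t) -> u, @SVector(zeros(Float64, 6)), (0, 120.0))
typeof(prob.tspan)  # Tuple{Float64, Float64}
# A Float32 default dt then mixes precisions inside the GPU kernel,
# whereas dt = 1e-2 (a Float64 literal) keeps the time stepping uniform.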
TheStarAlight commented 11 months ago

Thank you for your timely reply! However, your example doesn't work :( The same error happens... But if I remove reltol from your example, it works! I wonder what happened... 😂 I really need reltol because I need to expose a parameter for accuracy control. Can you help fix it? My CUDA.jl is v4.4 and DiffEqGPU is v2.4. I'll be grateful if you can help ☺️

utkarsh530 commented 11 months ago

It's mostly an issue of type instability: the default values for these tolerances are Float32. Try specifying abstol as well:

solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, abstol = 1e-3, save_everystep=false, dt = 1e-2)
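Here dt, abstol, and reltol are all supplied explicitly as Float64 literals, so none of the Float32 defaults enter the kernel.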
utkarsh530 commented 11 months ago

Your MWE should work with v2.4.1 now.
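For reference, the original adaptive call from the MWE (with no explicit dt or abstol) should now run unchanged:

solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false)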

TheStarAlight commented 11 months ago

Thank you! My problem is solved, many thanks ☺️!