SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License

Does DiffEqGPU support adaptive methods? #293

Closed TheStarAlight closed 11 months ago

TheStarAlight commented 11 months ago

Hi, I'm using DiffEqGPU in my Julia package to perform ensemble simulations (using EnsembleGPUKernel and GPUTsit5). The non-adaptive method (adaptive=false, dt=...) works well.

However, when I try to use adaptive stepping (adaptive=true, reltol=...), an error occurs...

I'm using the code from here.

Here I enclose the self-contained code, which can be run in the REPL:

using StaticArrays
using OrdinaryDiffEq
using DiffEqGPU, CUDA

# l = Lasers.Cos2Laser(2e14,800,2,1.0)
traj = function (u,p,t)
    F0 = 0.05338027007325633 # Lasers.LaserF0(l)
    ω = 0.05695419065625 # Lasers.AngFreq(l)
    N = 2   # l.cycNum
    φ = 0.0 # l.cep
    ε = 1.0 # l.ellip
    tFx, tFy, tFz = -1*(u[1]^2+u[2]^2+u[3]^2+1.0)^(-1.5) .* (u[1],u[2],u[3])
    du1 = u[4]
    du2 = u[5]
    du3 = u[6]
    du4 = tFx - F0 * cos(ω*t/(2N)) * (abs(ω*real(t))<N*π) * (cos(ω*t/(2N))*sin(ω*t+φ) + 1/N*sin(ω*t/(2N))*cos(ω*t+φ))
    du5 = tFy - F0 * cos(ω*t/(2N)) * (abs(ω*real(t))<N*π) * (cos(ω*t/(2N))*cos(ω*t+φ) - 1/N*sin(ω*t/(2N))*sin(ω*t+φ)) * ε
    du6 = tFz
    @SVector [du1,du2,du3,du4,du5,du6]
end
traj_t_final = 120.0
traj_dt = 0.001
init = [10.0  9.0  8.0  7.0;    # x
        5.0  1.0  0.0  0.0;    # y
        0.0  0.0  6.0  1.0;    # z
        0.0  1.0  0.0  1.0;    # px
        1.0  1.0 -1.0  2.0;    # py
        0.0  0.0  0.0  3.0;    # pz
        0.0 -5.0  0.0  0.0;]   # t0 (initial time, used by init_traj below)
traj_ODE_prob::ODEProblem = ODEProblem(traj, (@SVector zeros(Float64,6)), (0,traj_t_final))
init_traj = (prob,i,repeat) -> remake(prob; u0=SVector{6}([init[k,i] for k in 1:6]), tspan=(init[7,i],traj_t_final))
ensemble_prob::EnsembleProblem = EnsembleProblem(traj_ODE_prob, prob_func=init_traj, safetycopy=false)
solc = nothing
solg = nothing
# CPU non-adaptive
solc = solve(ensemble_prob, OrdinaryDiffEq.Tsit5(), EnsembleThreads(), trajectories=size(init,2), adaptive=false, dt=traj_dt, save_everystep=false)
# GPU non-adaptive
solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=false, dt=traj_dt, save_everystep=false)
# CPU adaptive
solc = solve(ensemble_prob, OrdinaryDiffEq.Tsit5(), EnsembleThreads(), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false)
# GPU adaptive
solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false)

When running the last line, the error happens :( Here's the stack trace:

julia> solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false)
ERROR: InvalidIRError: compiling MethodInstance for DiffEqGPU.gpu_ode_asolve_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::CuDeviceVector{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1}, ::GPUTsit5, ::CuDeviceMatrix{SVector{6, Float64}, 1}, ::CuDeviceMatrix{Float32, 1}, ::Float32, ::CallbackSet{Tuple{}, Tuple{}}, ::Nothing, ::Float32, ::Float64, ::Nothing, ::Val{false}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to init)
Stacktrace:
 [1] macro expansion
   @ C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\solve.jl:282
 [2] gpu_ode_asolve_kernel
   @ C:\Users\myzhu\.julia\packages\KernelAbstractions\QFxLx\src\macros.jl:90
 [3] gpu_ode_asolve_kernel
   @ .\none:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\validation.jl:149
  [2] macro expansion
    @ C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:415 [inlined]
  [3] macro expansion
    @ C:\Users\myzhu\.julia\packages\TimerOutputs\RsWnF\src\TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:414 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\utils.jl:89
  [6] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:129
  [7] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:106
  [8] compile
    @ C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:98 [inlined]
  [9] #1037
    @ C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\compilation.jl:104 [inlined]
 [10] JuliaContext(f::CUDA.var"#1037#1040"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\driver.jl:58
 [11] compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\compilation.jl:103
 [12] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))     
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\execution.jl:125
 [13] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler C:\Users\myzhu\.julia\packages\GPUCompiler\YO8Uj\src\execution.jl:103
 [14] macro expansion
    @ C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:318 [inlined]
 [15] macro expansion
    @ .\lock.jl:223 [inlined]
 [16] cufunction(f::typeof(DiffEqGPU.gpu_ode_asolve_kernel), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, CuDeviceVector{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1}, GPUTsit5, CuDeviceMatrix{SVector{6, Float64}, 1}, CuDeviceMatrix{Float32, 1}, Float32, CallbackSet{Tuple{}, Tuple{}}, Nothing, Float32, Float64, Nothing, Val{false}}}; kwargs::Base.Pairs{Symbol, Union{Nothing, Bool}, Tuple{Symbol, Symbol}, NamedTuple{(:always_inline, :maxthreads), Tuple{Bool, Nothing}}})
    @ CUDA C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:313
 [17] macro expansion
    @ C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:104 [inlined]
 [18] (::KernelAbstractions.Kernel{CUDABackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_ode_asolve_kernel)})(::CuArray{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1, CUDA.Mem.DeviceBuffer}, ::Vararg{Any}; ndrange::Int64, workgroupsize::Nothing)
    @ CUDA.CUDAKernels C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\CUDAKernels.jl:117
 [19] #vectorized_asolve#166
    @ C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\solve.jl:182 [inlined]
 [20] batch_solve_up_kernel(ensembleprob::EnsembleProblem{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#9#11", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{CUDABackend}, I::UnitRange{Int64}, adaptive::Bool; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:saveat, :unstable_check, :reltol, :save_everystep), Tuple{Nothing, DiffEqGPU.var"#24#30", Float64, Bool}}})
    @ DiffEqGPU C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\DiffEqGPU.jl:1019
 [21] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#9#11", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{CUDABackend}, I::UnitRange{Int64}, adaptive::Bool; kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:unstable_check, :reltol, :save_everystep), Tuple{DiffEqGPU.var"#24#30", Float64, Bool}}})
    @ DiffEqGPU C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\DiffEqGPU.jl:929
 [22] macro expansion
    @ .\timing.jl:382 [inlined]
 [23] __solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{6, Float64}, Tuple{Float64, Float64}, false, SciMLBase.NullParameters, ODEFunction{false, SciMLBase.AutoSpecialize, var"#7#8", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#9#11", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{CUDABackend}; trajectories::Int64, batch_size::Int64, unstable_check::Function, adaptive::Bool, kwargs::Base.Pairs{Symbol, Real, Tuple{Symbol, Symbol}, NamedTuple{(:reltol, :save_everystep), Tuple{Float64, Bool}}})
    @ DiffEqGPU C:\Users\myzhu\.julia\packages\DiffEqGPU\fFdGb\src\DiffEqGPU.jl:767
 [24] #solve#32
    @ C:\Users\myzhu\.julia\packages\DiffEqBase\tVuJP\src\solve.jl:996 [inlined]
 [25] top-level scope
    @ REPL[55]:1
 [26] top-level scope
    @ C:\Users\myzhu\.julia\packages\CUDA\tVtYo\src\initialization.jl:185

So I wonder: does DiffEqGPU really support adaptive methods? Thank you for your attention and help 😋!

utkarsh530 commented 11 months ago

Can you try providing an initial dt? The problem is that the default dt is a Float32, which causes a type instability. I can fix that by converting it to the type of prob.tspan. Meanwhile, do something like this:

solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false, dt = 1e-2)
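To see the precision mismatch concretely, here is a minimal sketch using only pieces of the MWE above (the Tuple{Float64, Float64} tspan is also visible in the stack trace):

using StaticArrays, OrdinaryDiffEq

# ODEProblem promotes the MWE's tspan (0, traj_t_final) to Float64:
prob = ODEProblem((u, p, t) -> u, @SVector(zeros(Float64, 6)), (0, 120.0))
typeof(prob.tspan)  # Tuple{Float64, Float64}
# A Float32 default dt then mixes precisions inside the GPU kernel,
# whereas dt = 1e-2 (a Float64 literal) keeps the time stepping uniform.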
TheStarAlight commented 11 months ago

Thank you for your timely reply! However, your example doesn't work :( The same error happens... But if I remove reltol from your example, it works! I wonder what happened... 😂 I really need reltol because I need to expose a parameter for accuracy control. Can you help fix it? My CUDA.jl is v4.4 and DiffEqGPU is v2.4. I'll be grateful if you can help ☺️

utkarsh530 commented 11 months ago

It's mostly an issue of type instability: the default values for these tolerances are Float32. Try specifying abstol as well:

solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, abstol = 1e-3, save_everystep=false, dt = 1e-2)
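Here dt, abstol, and reltol are all supplied explicitly as Float64 literals, so none of the Float32 defaults enter the kernel.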
utkarsh530 commented 11 months ago

Your MWE should work with v2.4.1 now.
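For reference, the original adaptive call from the MWE (with no explicit dt or abstol) should now run unchanged:

solg = solve(ensemble_prob, DiffEqGPU.GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend(),0.0), trajectories=size(init,2), adaptive=true, reltol=1e-6, save_everystep=false)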

TheStarAlight commented 11 months ago

Thank you! My problem is solved, many thanks ☺️!