gabrevaya closed this issue 3 years ago
The issue is that RuntimeGeneratedFunctions aren't supported in KernelAbstractions.jl. @vchuravy is there a good way to fix this?
As a workaround, instead of ODEProblem, use eval(ODEProblemExpr(sys, u0, tspan, p)) and it should be free of RGFs and compile fine.
I don't think so. If I understand RGF correctly, its implementation is not compatible with GPU computing.
I mean, if we could fake purity like we do with @generated, it would be fine. We would just need a hook so that the RGF can replace itself with the code from the dictionary at the hash value (given as type information) at compile time.
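The hook described above can be sketched in plain Julia. This is not RuntimeGeneratedFunctions' actual implementation, just the idea: the body expression lives in a dictionary keyed by an identifier, the identifier travels as a type parameter, and an @generated function splices the stored body in at compile time, so the call compiles to ordinary code with no runtime eval. `FakeRGF`, `BODY_CACHE`, and `make_fake_rgf` are hypothetical names invented for this sketch.

```julia
# Hypothetical cache of function bodies, keyed by a generated identifier
# (playing the role of RGF's hash).
const BODY_CACHE = Dict{Symbol,Expr}()

struct FakeRGF{id} end  # the identifier is carried as type information

function make_fake_rgf(body::Expr)
    id = gensym("rgf")
    BODY_CACHE[id] = body
    FakeRGF{id}()
end

# The "hook": because `id` is part of the type, the generator can look the
# stored expression up while compiling and splice it in directly.
@generated function (f::FakeRGF{id})(x) where {id}
    BODY_CACHE[id]
end

f = make_fake_rgf(:(x + 1))
f(41)  # the call compiles to `x + 1`; no dictionary lookup at runtime
```

The real question in this thread is whether GPUCompiler can be taught to run an equivalent lookup during kernel compilation.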
Hm... We do something similar on the LLVM IR level with Enzyme, as an example see https://github.com/JuliaGPU/GPUCompiler.jl/pull/164 with delayed_codegen, but I am not sure that this will cover the full use-case of RGF.
Yeah, if we could just inline https://github.com/SciML/RuntimeGeneratedFunctions.jl/blob/master/src/RuntimeGeneratedFunctions.jl#L119-L125 we would be done.
Thanks for your quick replies! I've just tried the workaround you suggested
As a workaround, instead of ODEProblem, use eval(ODEProblemExpr(sys, u0, tspan, p)) and it should be free of RGFs and compile fine.
and now it throws this error:
ERROR: LoadError: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:322 [inlined]
[2] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:197
[3] #solve#59
@ ~/.julia/packages/DiffEqBase/jhLIm/src/solve.jl:96 [inlined]
[4] top-level scope
@ ~/test_GPU/test.jl:28
[5] include(fname::String)
@ Base.MainInclude ./client.jl:444
[6] top-level scope
@ REPL[3]:1
nested task error: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:322 [inlined]
[2] threading_run(func::Function)
@ Base.Threads ./threadingconstructs.jl:34
[3] macro expansion
@ ./threadingconstructs.jl:93 [inlined]
[4] tmap(f::Function, args::UnitRange{Int64})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:760
[5] solve_batch(prob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleThreads, II::UnitRange{Int64}, pmap_batch_size::Nothing; kwargs::Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:747
[6] f
@ ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:183 [inlined]
[7] macro expansion
@ ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:188 [inlined]
[8] (::DiffEqGPU.var"#6#12"{DiffEqGPU.var"#f#11"{Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}}, EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, Tsit5, UnitRange{Int64}}})()
@ DiffEqGPU ./task.jl:123
nested task error: BoundsError: attempt to access 0-element UnitRange{Int64} at index [1]
Stacktrace:
[1] throw_boundserror(A::UnitRange{Int64}, I::Int64)
@ Base ./abstractarray.jl:651
[2] getindex
@ ./range.jl:696 [inlined]
[3] _broadcast_getindex_evalf
@ ./broadcast.jl:648 [inlined]
[4] _broadcast_getindex
@ ./broadcast.jl:621 [inlined]
[5] _getindex
@ ./broadcast.jl:645 [inlined]
[6] _broadcast_getindex
@ ./broadcast.jl:620 [inlined]
[7] #19
@ ./broadcast.jl:1098 [inlined]
[8] ntuple
@ ./ntuple.jl:48 [inlined]
[9] copy
@ ./broadcast.jl:1098 [inlined]
[10] materialize
@ ./broadcast.jl:883 [inlined]
[11] responsible_map(f::Function, II::UnitRange{Int64})
@ SciMLBase ~/.julia/packages/SciMLBase/9EjAY/src/ensemble/basic_ensemble_solve.jl:186
[12] #solve_batch#456
@ ~/.julia/packages/SciMLBase/9EjAY/src/ensemble/basic_ensemble_solve.jl:194 [inlined]
[13] (::DiffEqGPU.var"#93#95"{Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}}, EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, Tsit5, UnitRange{Int64}, Nothing, Int64})(i::Int64)
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:753
[14] macro expansion
@ ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:761 [inlined]
[15] (::DiffEqGPU.var"#237#threadsfor_fun#96"{DiffEqGPU.var"#93#95"{Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}}, EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, Tsit5, UnitRange{Int64}, Nothing, Int64}, Tuple{UnitRange{Int64}}, Vector{Vector{ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats}}}, UnitRange{Int64}})(onethread::Bool)
@ DiffEqGPU ./threadingconstructs.jl:81
[16] (::DiffEqGPU.var"#237#threadsfor_fun#96"{DiffEqGPU.var"#93#95"{Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}}, EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, Tsit5, UnitRange{Int64}, Nothing, Int64}, Tuple{UnitRange{Int64}}, Vector{Vector{ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats}}}, UnitRange{Int64}})()
@ DiffEqGPU ./threadingconstructs.jl:48
in expression starting at ~/test_GPU/test.jl:28
Paste an MWE generated by ODEProblemExpr(sys, u0, tspan, p)
The expression generated with this MWE
using ModelingToolkit, OrdinaryDiffEq
@parameters t β
@variables x(t)
D = Differential(t)
eqs = [D(x) ~ β*x]
sys = ODESystem(eqs)
u0 = [x => 0.5f0]
p = [β => 1.1f0]
tspan = (0.0f0,1.0f0)
ODEProblemExpr(sys, u0, tspan, p)
is the following
quote
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:548 =#
f = begin
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:369 =#
var"##f#261" = ModelingToolkit.ODEFunctionClosure(function (var"##arg#258", var"##arg#259", t)
#= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:282 =#
#= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:283 =#
let var"x(t)" = #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:169 =# @inbounds(var"##arg#258"[1]), β = #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:169 =# @inbounds(var"##arg#259"[1])
#= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:375 =#
(SymbolicUtils.Code.create_array)(typeof(var"##arg#258"), nothing, Val{1}(), Val{(1,)}(), (*)(β, var"x(t)"))
end
end, function (var"##out#260", var"##arg#258", var"##arg#259", t)
#= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:282 =#
#= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:283 =#
let var"x(t)" = #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:169 =# @inbounds(var"##arg#258"[1]), β = #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:169 =# @inbounds(var"##arg#259"[1])
#= /home/.julia/packages/Symbolics/jdBV3/src/build_function.jl:331 =#
#= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:329 =# @inbounds begin
#= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:325 =#
var"##out#260"[1] = (*)(β, var"x(t)")
#= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:327 =#
nothing
end
end
end)
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:370 =#
var"##tgrad#262" = nothing
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:371 =#
var"##jac#263" = nothing
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:372 =#
M = LinearAlgebra.UniformScaling{Bool}(true)
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:373 =#
ODEFunction{true}(var"##f#261", jac = var"##jac#263", tgrad = var"##tgrad#262", mass_matrix = M, jac_prototype = nothing, syms = [Symbol("x(t)")], indepsym = :t)
end
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:549 =#
u0 = Float32[0.5]
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:550 =#
tspan = (0.0f0, 1.0f0)
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:551 =#
p = Float32[1.1]
#= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:552 =#
ODEProblem(f, u0, tspan, p; )
end
using ModelingToolkit
using OrdinaryDiffEq
using DiffEqGPU
@parameters t σ ρ β
@variables x(t) y(t) z(t)
D = Differential(t)
eqs = [D(x) ~ σ*(y-x),
D(y) ~ x*(ρ-z)-y,
D(z) ~ x*y - β*z]
sys = ODESystem(eqs)
u0 = [x => 1.0f0
y => 0.0f0
z => 0.0f0]
p = [σ => 10.0f0
ρ => 28.0f0
β => 8f0/3f0]
tspan = (0.0f0, 100.0f0)
prob = eval(ODEProblemExpr(sys, u0, tspan, p))
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float32, 3).*prob.p)
monteprob = EnsembleProblem(prob, prob_func = prob_func)
sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10, saveat = 1.0f0)
seems to work fine for me.
I have just tried it again and it resulted in the same error as last time. Then I noticed that the error mentions threadingconstructs.jl, so I tried setting the number of threads to 1 and that solved the problem! So it seems that the issue appears when using EnsembleGPUArray() with Threads.nthreads() > 1.
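For anyone hitting the same error, the single-thread workaround just means starting Julia with one thread before running the script (the file name here is hypothetical):

```shell
export JULIA_NUM_THREADS=1   # force a single Julia thread for this session
# then run the script as usual, e.g.:
# julia test.jl
# on Julia >= 1.5 the same can be done without the env var: julia -t 1 test.jl
```

This only sidesteps the multithreaded task handling; the GPU ensemble itself still runs in parallel on the device.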
My original application was actually generating the function at runtime so a solution to the general problem would be helpful eventually but I think I can adapt my codes to use this workaround for now.
Thanks a lot Chris!! :)
Yeah there's something weird with CUDA.jl 3.1. It seems like it broke the interactions with Tasks. I let @maleadt know in https://github.com/SciML/DiffEqGPU.jl/pull/103 but haven't been able to narrow it down.
I don't see the error you reported there (ERROR_LAUNCH_FAILED) listed here? Anyway, CUDA.jl 3.1 now uses task-local storage for the active stream and library handles, so you need to synchronize when switching tasks (or re-apply the active device with multi-GPU computing), in case that helps to diagnose any issue here.
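A CPU-only illustration of why the move to task-local storage matters: state stored via task_local_storage is invisible to other tasks, so every new task must (re)initialize its own handle, and values cannot be assumed to carry across task boundaries. The `current_handle` helper and the `:handle` key are hypothetical stand-ins for CUDA.jl's per-task stream/handle state, not its real API.

```julia
# Each task lazily creates its own "handle" in task-local storage,
# mimicking how CUDA.jl 3.1 stores the active stream per task.
function current_handle()
    get!(task_local_storage(), :handle) do
        rand(UInt64)  # stand-in for creating a stream / library handle
    end
end

h_here  = current_handle()                      # handle for the current task
h_other = fetch(Threads.@spawn current_handle()) # a spawned task gets its own

h_here == current_handle()  # stable within one task
h_here == h_other           # (almost surely) false: tasks don't share it
```

This is why code that hops between tasks, like DiffEqGPU's batched ensemble solves, needs explicit synchronization points under CUDA.jl 3.1.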
Looks like I isolated it to CUDAKernels 0.2, so the new tag should work @gabrevaya, and there's an upper bound still.
Thanks a lot @ChrisRackauckas! However, DiffEqGPU is now not compatible with CUDA 3.1, and I think that's because CUDAKernels 0.1 is only compatible with CUDA < 3.
Okay, so CUDA.jl 3 or CUDAKernels 0.2 is still the issue. Hmm... continue this in another issue? I'll need @vchuravy's and @maleadt's help because I don't know which of the two libraries it is, but something is messing with the task handling.
Hi! The same MWE from #56 is not working anymore: