SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License
283 stars 29 forks source link

EnsembleGPUArray failing when using ModelingToolkit #102

Closed gabrevaya closed 3 years ago

gabrevaya commented 3 years ago

Hi! The same MWE from #56 is not working anymore:

using ModelingToolkit
using OrdinaryDiffEq
using DiffEqGPU
using KernelAbstractions

@parameters t σ ρ β
@variables x(t) y(t) z(t)
D = Differential(t)

eqs = [D(x) ~ σ*(y-x),
       D(y) ~ x*(ρ-z)-y,
       D(z) ~ x*y - β*z]

sys = ODESystem(eqs)

u0 = [x => 1.0f0
      y => 0.0f0
      z => 0.0f0]

p  = [σ => 10.0f0
      ρ => 28.0f0
      β => 8f0/3f0]

tspan = (0.0f0, 100.0f0)
prob = ODEProblem(sys, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float32, 3).*prob.p)
monteprob = EnsembleProblem(prob, prob_func = prob_func)
sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10, saveat = 1.0f0)
ERROR: LoadError: GPU compilation of kernel gpu_gpu_kernel(Cassette.Context{nametype(CUDACtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}}, CUDA.CuDeviceMatrix{Float32, 1}, CUDA.CuDeviceMatrix{Float32, 1}, CUDA.CuDeviceMatrix{Float32, 1}, Float32) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}}, which is not isbits:
  .f_oop is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr} which is not isbits.
    .body is of type Expr which is not isbits.
      .head is of type Symbol which is not isbits.
      .args is of type Vector{Any} which is not isbits.
  .f_iip is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr} which is not isbits.
    .body is of type Expr which is not isbits.
      .head is of type Symbol which is not isbits.
      .args is of type Vector{Any} which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob, entry::LLVM.Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwWPj/src/validation.jl:68
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/XwWPj/src/driver.jl:287 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/4QAIk/src/TimerOutput.jl:206 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/XwWPj/src/driver.jl:286 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module, kernel::LLVM.Function; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwWPj/src/utils.jl:62
  [6] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/M4jkK/src/compiler/execution.jl:306
  [7] check_cache
    @ ~/.julia/packages/GPUCompiler/XwWPj/src/cache.jl:44 [inlined]
  [8] cached_compilation
    @ ./none:0 [inlined]
  [9] cufunction(f::typeof(Cassette.overdub), tt::Type{Tuple{Cassette.Context{nametype(CUDACtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}}, CUDA.CuDeviceMatrix{Float32, 1}, CUDA.CuDeviceMatrix{Float32, 1}, CUDA.CuDeviceMatrix{Float32, 1}, Float32}}; name::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/M4jkK/src/compiler/execution.jl:294
 [10] macro expansion
    @ ~/.julia/packages/CUDA/M4jkK/src/compiler/execution.jl:102 [inlined]
 [11] (::KernelAbstractions.Kernel{CUDAKernels.CUDADevice, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_gpu_kernel)})(::Function, ::Vararg{Any, N} where N; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Int64, progress::Function)
    @ CUDAKernels ~/.julia/packages/CUDAKernels/94MY8/src/CUDAKernels.jl:192
 [12] (::DiffEqGPU.var"#55#59"{ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, typeof(DiffEqGPU.gpu_kernel)})(du::CUDA.CuArray{Float32, 2}, u::CUDA.CuArray{Float32, 2}, p::CUDA.CuArray{Float32, 2}, t::Float32)
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:408
 [13] ODEFunction
    @ ~/.julia/packages/SciMLBase/9EjAY/src/scimlfunctions.jl:334 [inlined]
 [14] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5, true, CUDA.CuArray{Float32, 2}, Nothing, Float32, CUDA.CuArray{Float32, 2}, Float32, Float32, Float32, Vector{CUDA.CuArray{Float32, 2}}, ODESolution{Float32, 3, Vector{CUDA.CuArray{Float32, 2}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{CUDA.CuArray{Float32, 2}}}, ODEProblem{CUDA.CuArray{Float32, 2}, Tuple{Float32, Float32}, true, CUDA.CuArray{Float32, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CUDA.CuArray{Float32, 2}}, Vector{Float32}, Vector{Vector{CUDA.CuArray{Float32, 2}}}, OrdinaryDiffEq.Tsit5Cache{CUDA.CuArray{Float32, 2}, CUDA.CuArray{Float32, 2}, CUDA.CuArray{Float32, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CUDA.CuArray{Float32, 2}, CUDA.CuArray{Float32, 2}, CUDA.CuArray{Float32, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}, OrdinaryDiffEq.DEOptions{Float32, Float32, Float32, Float32, typeof(DiffEqGPU.diffeqgpunorm), typeof(LinearAlgebra.opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#10#16", DataStructures.BinaryMinHeap{Float32}, DataStructures.BinaryMinHeap{Float32}, Nothing, Nothing, Int64, Tuple{}, Float32, Tuple{}}, CUDA.CuArray{Float32, 2}, Float32, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{CUDA.CuArray{Float32, 2}, CUDA.CuArray{Float32, 2}, CUDA.CuArray{Float32, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/2Z4fE/src/perform_step/low_order_rk_perform_step.jl:623
 [15] __init(prob::ODEProblem{CUDA.CuArray{Float32, 2}, Tuple{Float32, Float32}, true, CUDA.CuArray{Float32, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Float32, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float32, dtmin::Nothing, dtmax::Float32, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, qoldinit::Rational{Int64}, fullnormalize::Bool, failfactor::Int64, beta1::Nothing, beta2::Nothing, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#10#16", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/2Z4fE/src/solve.jl:433
 [16] #__solve#403
    @ ~/.julia/packages/OrdinaryDiffEq/2Z4fE/src/solve.jl:4 [inlined]
 [17] #solve_call#56
    @ ~/.julia/packages/DiffEqBase/jhLIm/src/solve.jl:61 [inlined]
 [18] solve_up(prob::ODEProblem{CUDA.CuArray{Float32, 2}, Tuple{Float32, Float32}, true, CUDA.CuArray{Float32, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CUDA.CuArray{Float32, 2}, p::CUDA.CuArray{Float32, 2}, args::Tsit5; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:unstable_check, :saveat, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#10#16", Float32, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/jhLIm/src/solve.jl:82
 [19] #solve#57
    @ ~/.julia/packages/DiffEqBase/jhLIm/src/solve.jl:70 [inlined]
 [20] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, ModelingToolkit.var"#121#generated_observed#155"{Bool, ODESystem, Dict{Any, Any}}, Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, ModelingToolkit.var"#121#generated_observed#155"{Bool, ODESystem, Dict{Any, Any}}, Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:unstable_check, :saveat), Tuple{DiffEqGPU.var"#10#16", Float32}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:313
 [21] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, ModelingToolkit.var"#121#generated_observed#155"{Bool, ODESystem, Dict{Any, Any}}, Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:unstable_check, :saveat), Tuple{DiffEqGPU.var"#10#16", Float32}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:278
 [22] macro expansion
    @ ./timing.jl:287 [inlined]
 [23] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.var"#f#148"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xe09e3e39, 0x901fd863, 0xaabe4072, 0xa349f5db, 0x1e2ac5dd)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(Symbol("##out#353"), Symbol("##arg#351"), Symbol("##arg#352"), :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x255a3166, 0x06ba9a71, 0xa2d333f0, 0xa11339db, 0xc80f8c0d)}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, ModelingToolkit.var"#121#generated_observed#155"{Bool, ODESystem, Dict{Any, Any}}, Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:195
 [24] #solve#59
    @ ~/.julia/packages/DiffEqBase/jhLIm/src/solve.jl:96 [inlined]
 [25] top-level scope
    @ ~/test_GPU/test.jl:27
 [26] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [27] top-level scope
    @ REPL[2]:1
in expression starting at ~/test_GPU/test.jl:27
(test_GPU) pkg> st
      Status `~/test_GPU/Project.toml`
  [052768ef] CUDA v2.6.3
  [071ae1c0] DiffEqGPU v1.11.0
  [63c18a36] KernelAbstractions v0.6.1
  [961ee093] ModelingToolkit v5.16.0
  [1dea7af3] OrdinaryDiffEq v5.52.6

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)
Environment:
  JULIA_NUM_THREADS = 20
ChrisRackauckas commented 3 years ago

The issue is that RuntimeGeneratedFunctions aren't supported in KernelAbstractions.jl. @vchuravy is there a good way to fix this?

As a workaround, instead of ODEProblem us eval(ODEProblemExpr(sys, u0, tspan, p)) and it should be free of RGFs and compile fine.

vchuravy commented 3 years ago

I don't think so. If I understand RGF correctly their implementation is not compatible with GPU computing.

ChrisRackauckas commented 3 years ago

I mean, if we could fake purity like we do with @generated it would be fine. We would just need a hook so that the RGF can replace itself with the code from the dictionary at the hash value (given as type information) at compile time.

vchuravy commented 3 years ago

Hm... We do something similar on the LLVM IR level with Enzyme, as an example see https://github.com/JuliaGPU/GPUCompiler.jl/pull/164 with delayed_codegen, but I am not sure that this will cover the full use-case of RGF.

ChrisRackauckas commented 3 years ago

Yeah, if we could just inline https://github.com/SciML/RuntimeGeneratedFunctions.jl/blob/master/src/RuntimeGeneratedFunctions.jl#L119-L125 we would be done.

gabrevaya commented 3 years ago

Thanks for your quick replies! I've just tried the workaround you suggested

As a workaround, instead of ODEProblem us eval(ODEProblemExpr(sys, u0, tspan, p)) and it should be free of RGFs and compile fine.

and now it throws this error:

ERROR: LoadError: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:322 [inlined]
 [2] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}})
   @ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:197
 [3] #solve#59
   @ ~/.julia/packages/DiffEqBase/jhLIm/src/solve.jl:96 [inlined]
 [4] top-level scope
   @ ~/test_GPU/test.jl:28
 [5] include(fname::String)
   @ Base.MainInclude ./client.jl:444
 [6] top-level scope
   @ REPL[3]:1

    nested task error: TaskFailedException
    Stacktrace:
     [1] wait
       @ ./task.jl:322 [inlined]
     [2] threading_run(func::Function)
       @ Base.Threads ./threadingconstructs.jl:34
     [3] macro expansion
       @ ./threadingconstructs.jl:93 [inlined]
     [4] tmap(f::Function, args::UnitRange{Int64})
       @ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:760
     [5] solve_batch(prob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleThreads, II::UnitRange{Int64}, pmap_batch_size::Nothing; kwargs::Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}})
       @ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:747
     [6] f
       @ ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:183 [inlined]
     [7] macro expansion
       @ ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:188 [inlined]
     [8] (::DiffEqGPU.var"#6#12"{DiffEqGPU.var"#f#11"{Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}}, EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, Tsit5, UnitRange{Int64}}})()
       @ DiffEqGPU ./task.jl:123

        nested task error: BoundsError: attempt to access 0-element UnitRange{Int64} at index [1]
        Stacktrace:
          [1] throw_boundserror(A::UnitRange{Int64}, I::Int64)
            @ Base ./abstractarray.jl:651
          [2] getindex
            @ ./range.jl:696 [inlined]
          [3] _broadcast_getindex_evalf
            @ ./broadcast.jl:648 [inlined]
          [4] _broadcast_getindex
            @ ./broadcast.jl:621 [inlined]
          [5] _getindex
            @ ./broadcast.jl:645 [inlined]
          [6] _broadcast_getindex
            @ ./broadcast.jl:620 [inlined]
          [7] #19
            @ ./broadcast.jl:1098 [inlined]
          [8] ntuple
            @ ./ntuple.jl:48 [inlined]
          [9] copy
            @ ./broadcast.jl:1098 [inlined]
         [10] materialize
            @ ./broadcast.jl:883 [inlined]
         [11] responsible_map(f::Function, II::UnitRange{Int64})
            @ SciMLBase ~/.julia/packages/SciMLBase/9EjAY/src/ensemble/basic_ensemble_solve.jl:186
         [12] #solve_batch#456
            @ ~/.julia/packages/SciMLBase/9EjAY/src/ensemble/basic_ensemble_solve.jl:194 [inlined]
         [13] (::DiffEqGPU.var"#93#95"{Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}}, EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, Tsit5, UnitRange{Int64}, Nothing, Int64})(i::Int64)
            @ DiffEqGPU ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:753
         [14] macro expansion
            @ ~/.julia/packages/DiffEqGPU/YMmTv/src/DiffEqGPU.jl:761 [inlined]
         [15] (::DiffEqGPU.var"#237#threadsfor_fun#96"{DiffEqGPU.var"#93#95"{Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}}, EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, Tsit5, UnitRange{Int64}, Nothing, Int64}, Tuple{UnitRange{Int64}}, Vector{Vector{ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats}}}, UnitRange{Int64}})(onethread::Bool)
            @ DiffEqGPU ./threadingconstructs.jl:81
         [16] (::DiffEqGPU.var"#237#threadsfor_fun#96"{DiffEqGPU.var"#93#95"{Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}}, EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#5#6", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, Tsit5, UnitRange{Int64}, Nothing, Int64}, Tuple{UnitRange{Int64}}, Vector{Vector{ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, ModelingToolkit.ODEFunctionClosure{var"#1#3", var"#2#4"}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vector{Symbol}, Symbol, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats}}}, UnitRange{Int64}})()
            @ DiffEqGPU ./threadingconstructs.jl:48
in expression starting at ~/test_GPU/test.jl:28
ChrisRackauckas commented 3 years ago

Paste an MWE generated by ODEProblemExpr(sys, u0, tspan, p)

gabrevaya commented 3 years ago

The expression generated with this MWE

using ModelingToolkit, OrdinaryDiffEq

@parameters t β
@variables x(t)
D = Differential(t)
eqs = [D(x) ~ β*x]
sys = ODESystem(eqs)
u0 = [x => 0.5f0]
p  = [β => 1.1f0]
tspan = (0.0f0,1.0f0)
ODEProblemExpr(sys, u0, tspan, p)

is the following

quote
    #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:548 =#
    f = begin
            #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:369 =#
            var"##f#261" = ModelingToolkit.ODEFunctionClosure(function (var"##arg#258", var"##arg#259", t)
                        #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:282 =#
                        #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:283 =#
                        let var"x(t)" = #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:169 =# @inbounds(var"##arg#258"[1]), β = #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:169 =# @inbounds(var"##arg#259"[1])
                            #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:375 =#
                            (SymbolicUtils.Code.create_array)(typeof(var"##arg#258"), nothing, Val{1}(), Val{(1,)}(), (*)(β, var"x(t)"))
                        end
                    end, function (var"##out#260", var"##arg#258", var"##arg#259", t)
                        #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:282 =#
                        #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:283 =#
                        let var"x(t)" = #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:169 =# @inbounds(var"##arg#258"[1]), β = #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:169 =# @inbounds(var"##arg#259"[1])
                            #= /home/.julia/packages/Symbolics/jdBV3/src/build_function.jl:331 =#
                            #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:329 =# @inbounds begin
                                    #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:325 =#
                                    var"##out#260"[1] = (*)(β, var"x(t)")
                                    #= /home/.julia/packages/SymbolicUtils/9iQGH/src/code.jl:327 =#
                                    nothing
                                end
                        end
                    end)
            #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:370 =#
            var"##tgrad#262" = nothing
            #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:371 =#
            var"##jac#263" = nothing
            #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:372 =#
            M = LinearAlgebra.UniformScaling{Bool}(true)
            #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:373 =#
            ODEFunction{true}(var"##f#261", jac = var"##jac#263", tgrad = var"##tgrad#262", mass_matrix = M, jac_prototype = nothing, syms = [Symbol("x(t)")], indepsym = :t)
        end
    #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:549 =#
    u0 = Float32[0.5]
    #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:550 =#
    tspan = (0.0f0, 1.0f0)
    #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:551 =#
    p = Float32[1.1]
    #= /home/.julia/packages/ModelingToolkit/Mo4gw/src/systems/diffeqs/abstractodesystem.jl:552 =#
    ODEProblem(f, u0, tspan, p; )
end
ChrisRackauckas commented 3 years ago
using ModelingToolkit
using OrdinaryDiffEq
using DiffEqGPU
@parameters t σ ρ β
@variables x(t) y(t) z(t)
D = Differential(t)

eqs = [D(x) ~ σ*(y-x),
       D(y) ~ x*(ρ-z)-y,
       D(z) ~ x*y - β*z]

sys = ODESystem(eqs)

u0 = [x => 1.0f0
      y => 0.0f0
      z => 0.0f0]

p  = [σ => 10.0f0
      ρ => 28.0f0
      β => 8f0/3f0]

tspan = (0.0f0, 100.0f0)
prob = eval(ODEProblemExpr(sys, u0, tspan, p))
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float32, 3).*prob.p)
monteprob = EnsembleProblem(prob, prob_func = prob_func)
sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10, saveat = 1.0f0)

seems to work fine for me.

gabrevaya commented 3 years ago

I have just tried it again and resulted in the same error as last time. Then I noticed that the error is mentioning threadingconstructs.jl so I tried setting the threads to 1 and that solved the problem! So it seems that the issue appears when using EnsembleGPUArray() with Threads.nthreads() > 1.

My original application was actually generating the function at runtime so a solution to the general problem would be helpful eventually but I think I can adapt my codes to use this workaround for now.

Thanks a lot Chris!! :)

ChrisRackauckas commented 3 years ago

Yeah there's something weird with CUDA.jl 3.1. It seems like it broke the interactions with Tasks. I let @maleadt know in https://github.com/SciML/DiffEqGPU.jl/pull/103 but haven't been able to narrow it down.

maleadt commented 3 years ago

I don't see the error you reported there (ERROR_LAUNCH_FAILED) listed here? Anyway, CUDA.jl 3.1 now uses task local storage to store the active stream and library handles on, so you need to synchronize when switching tasks (or re-apply the active device with multi-GPU computing), in case that would help to diagnose any issue here.

ChrisRackauckas commented 3 years ago

Looks like I isolated it to CUDAKernels 0.2. So The new tag should work @gabrevaya and there's an upper bound still.

gabrevaya commented 3 years ago

Thanks a lot @ChrisRackauckas! However now DiffEqGPU is not compatible with CUDA 3.1 and I think it's because CUDAKernels 0.1 is only compatible with CUDA < 3.

ChrisRackauckas commented 3 years ago

Okay, so CUDA.jl 3 or CUDAKernels 0.2 are still the issue. Hmm... continue this to another issue? I'll need @vchuravy and @maleadt 's help because I don't know which of the two libraries it is, but it's something is messing with the task handling.