nvalidIRError while solving large (32 equations) ODEs

Hello, I have a question regarding an error that I am facing when solving systems of ODEs with more than 32 equations:

you can take a look at this other github issue especially to the last example, that I am posting here again for completeness:

using Plots, DifferentialEquations, CUDA, StaticArrays, DiffEqGPU, Symbolics, Catalyst

# works below 33, fails above
n_x = 33
@variables t
D = Differential(t)
@variables X(t)[1:n_x]

eqs = []
for i in 1:n_x
    push!(eqs, D(X[i]) ~ Num(1.0))
end

p = @SVector [1.0 for i in 1:n_x]
u0 = @SVector [1.0 for i in 1:n_x]
tspan = (0.0, 10.0)
sys = @named test = ODESystem(eqs, t)
prob = ODEProblem(sys, u0, tspan, p) # the problem must be out-of-place

# solve on CPU
sol = solve(prob, Tsit5(), saveat=0.1)
plot(sol, idxs=1:n_x)

# prepare the problem for the GPU
problem_func = (problem, i, repeat) -> remake(problem, u0=rand(n_x)) # randomize the initial condition
multi_problem = EnsembleProblem(prob; prob_func = problem_func, safetycopy = false)

# solve on GPU
solutions = solve(multi_problem, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend()), trajectories=10, adaptive = true, reltol=1e-6, abstol=1e-6, saveat=0.1)

# plot the results
for s in solutions
    plot!(s, idxs=1:n_x, linestyle=:dash, legend=false)
end
plot!()

This code fails with the following error:

nvalidIRError: compiling MethodInstance for DiffEqGPU.gpu_ode_asolve_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::CuDeviceVector{DiffEqGPU.ImmutableODEProblem{SVector{33, Float64}, Tuple{Float64, Float64}, true, SVector{33, Float64}, ODEFunction{true, SciMLBase.AutoSpecialize, ModelingToolkit.var"#k#503"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xae84faba, 0xca798e0c, 0x0f4ab0aa, 0xacc7e0b8, 0x5dfc199c), Nothing}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋out, :ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x150edf5a, 0xa4ae4349, 0xa9fcb0e2, 0x7d639c2e, 0x036b7279), Nothing}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, ModelingToolkit.var"#1269#generated_observed#513"{Bool, ODESystem, Dict{Any, Any}, Vector{SymbolicUtils.BasicSymbolic{Real}}}, Nothing, ODESystem, Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}, SciMLBase.StandardODEProblem}, 1}, ::GPUTsit5, ::CuDeviceMatrix{SVector{33, Float64}, 1}, ::CuDeviceMatrix{Float64, 1}, ::Float64, ::CallbackSet{Tuple{}, Tuple{}}, ::Nothing, ::Float64, ::Float64, ::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, ::Val{false}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Stacktrace:
 [1] multiple call sites
   @ unknown:0
Reason: unsupported call to an unknown function (call to julia.push_gc_frame)
Stacktrace:
 [1] multiple call sites
   @ unknown:0
Reason: unsupported call to an unknown function (call to julia.get_gc_frame_slot)
Stacktrace:
 [1] multiple call sites
   @ unknown:0
Reason: unsupported call to an unknown function (call to jl_f__apply_iterate)

Do you have an idea on why this is happening? I am using a Titan RTX on my machine. Let me know if you need additional information.

Thank you in advance.

Updated example for MTK v9:

using Plots, DifferentialEquations, CUDA, StaticArrays, DiffEqGPU, ModelingToolkit

# works below 33, fails above
n_x = 33
@variables t
D = Differential(t)
@variables X(t)[1:n_x]

eqs = []
for i in 1:n_x
    push!(eqs, D(X[i]) ~ Num(1.0))
end

p = @SVector [1.0 for i in 1:n_x]
u0 = @SVector [1.0 for i in 1:n_x]
tspan = (0.0, 10.0)
@mtkbuild sys = ODESystem(eqs, t)
prob = ODEProblem(sys, u0, tspan) # the problem must be out-of-place

# solve on CPU
sol = solve(prob, Tsit5(), saveat=0.1)
plot(sol, idxs=1:n_x)

# prepare the problem for the GPU
problem_func = (problem, i, repeat) -> remake(problem) # randomize the initial condition
multi_problem = EnsembleProblem(prob; prob_func = problem_func, safetycopy = false)

# solve on GPU
solutions = solve(multi_problem, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend()), trajectories=10, adaptive = true, reltol=1e-6, abstol=1e-6, saveat=0.1)

# plot the results
for s in solutions
    plot!(s, idxs=1:n_x, linestyle=:dash, legend=false)
end
plot!()

The issue is the StaticArrays splat limit I think? @vchuravy is there a way around this?

SciML / DiffEqGPU.jl

nvalidIRError while solving large (32 equations) ODEs #326