SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License

RuntimeGeneratedFunctions are not compatible with CUDA.jl (ModelingToolkit.jl generated functions) #169

Closed · ricfrod closed this issue 1 year ago

ricfrod commented 2 years ago

Hello all! This is related to this post, where I was attempting to recreate the Lorenz equations example from DiffEqGPU.jl, but instead of providing hand-written numerical functions I wanted to use ModelingToolkit.jl to generate the Jacobian and time gradient.

Here's how to reproduce the error:

using OrdinaryDiffEq, ModelingToolkit, DiffEqGPU, CUDA

function lorenz(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end
u0 = Float32[1.0; 0.0; 0.0]
tspan = (0.0f0, 100.0f0)
p = [10.0f0, 28.0f0, 8 / 3f0]
prob = ODEProblem(lorenz, u0, tspan, p)
# Symbolically trace the numerical problem, then generate the Jacobian
# and time gradient from the resulting ModelingToolkit system
sys = modelingtoolkitize(prob)
func = ODEFunction(sys, jac = true, tgrad = true)
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float32, 3) .* p)
prob_jac = ODEProblem(func, u0, tspan, p)
monteprob_jac = EnsembleProblem(prob_jac, prob_func = prob_func)
@time solve(monteprob_jac, Rodas5(), EnsembleGPUArray(), dt = 0.1,
            trajectories = 10_000, saveat = 1.0f0)

ERROR: GPU compilation of kernel #gpu_gpu_kernel(KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ModelingToolkit.var"#f#396"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋out, :ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}}, CUDA.CuDeviceMatrix{Float32, 1}, CUDA.CuDeviceMatrix{Float32, 1}, CUDA.CuDeviceMatrix{Float32, 1}, Float32) failed
KernelError: passing and using non-bitstype argument

Argument 3 to your kernel function is of type ModelingToolkit.var"#f#396"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋out, 
:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr}}, which is not isbits:
  .f_oop is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr} which is not isbits.
    .body is of type Expr which is not isbits.
      .head is of type Symbol which is not isbits.
      .args is of type Vector{Any} which is not isbits.
  .f_iip is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋out, :ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", Expr} which is not isbits.
    .body is of type Expr which is not isbits.
      .head is of type Symbol which is not isbits.
      .args is of type Vector{Any} which is not isbits.
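
For context: CUDA.jl requires every kernel argument to be isbits, i.e. immutable and pointer-free. An illustrative check, reusing `func` from the snippet above, shows why the MTK-wrapped function trips this requirement:

# Illustrative diagnosis only. The RGF-backed function stores its source
# as an Expr, which is heap-allocated, so the wrapper is not isbits.
isbits(func.f)      # false: the wrapper carries RuntimeGeneratedFunctions
isbitstype(Expr)    # false: the stored .body field is the culprit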

As @ChrisRackauckas pointed out:

Seems like the way we automatically generate the functions doesn’t play nicely with CUDA.jl

ChrisRackauckas commented 2 years ago

The problem is really just that RuntimeGeneratedFunctions are not compatible with CUDA.jl compilation. @vchuravy is this possible?

vchuravy commented 2 years ago

Not in its current state. The expr field is prohibitive. But maybe you could construct an RGF that has the right cache tag etc. and doesn't store the expr.

ChrisRackauckas commented 2 years ago

Is there any way to do something like: recognize that it's in this context, and instead eval it and invoke the eval'd version for the compilation?

vchuravy commented 2 years ago

You would have to do that before passing it off to the GPU stack.
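
A minimal sketch of that ordering, assuming ModelingToolkit's `generate_function` with `expression = Val{true}` (which returns plain function expressions rather than RuntimeGeneratedFunctions); `func_evaled` is a hypothetical name, and this is not the fix that eventually landed:

# Sketch only: eval the generated code before it ever reaches the GPU stack.
# `sys` is the modelingtoolkitize'd system from the original report.
f_oop_expr, f_iip_expr = generate_function(sys; expression = Val{true})
f_iip = eval(f_iip_expr)          # a plain function; no Expr stored in a closure
func_evaled = ODEFunction(f_iip)  # hypothetical: wrap the eval'd function instead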

ChrisRackauckas commented 2 years ago

I'm wondering if the Cassette stack could see a RuntimeGeneratedFunction, replace it with an eval, and then grab the evaluated value, naming it correctly in some namespace so that you know there's no collision?

utkarsh530 commented 1 year ago

Try EnsembleGPUKernel. It probably works with MTK-generated functions.
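
For reference, a minimal sketch of the `EnsembleGPUKernel` route for this system, assuming a recent DiffEqGPU where the backend is passed explicitly and the ODE is written out-of-place with StaticArrays (shown here with a hand-written RHS, not the MTK-generated one):

using DiffEqGPU, OrdinaryDiffEq, StaticArrays, CUDA

# EnsembleGPUKernel compiles the whole solve into one kernel, so it needs
# an out-of-place, StaticArrays-based right-hand side:
function lorenz_oop(u, p, t)
    du1 = p[1] * (u[2] - u[1])
    du2 = u[1] * (p[2] - u[3]) - u[2]
    du3 = u[1] * u[2] - p[3] * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector Float32[1.0, 0.0, 0.0]
p = @SVector Float32[10.0, 28.0, 8 / 3]
prob = ODEProblem{false}(lorenz_oop, u0, (0.0f0, 100.0f0), p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend()),
            trajectories = 10_000, saveat = 1.0f0)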

ChrisRackauckas commented 1 year ago

This is solved with the most recent ModelingToolkit and Symbolics releases.