SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License

LoadError: KernelException on README example #144

Closed lazarusA closed 2 years ago

lazarusA commented 2 years ago

Hi, if I run the following:

using OrdinaryDiffEq, CUDA, LinearAlgebra
using DiffEqGPU
function lorenz(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end

u0 = Float32[1.0; 0.0; 0.0]
tspan = (0.0f0, 100.0f0)
p = [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem(lorenz, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float32, 3) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10, saveat = 1.0f0)

I get the following error:

ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: LoadError: KernelException: exception thrown during kernel execution on device Tesla V100-DGXS-16GB
Stacktrace:

This is my env (DiffEqGPU and DiffEqGPU#master show the same error):

`~/JuliaConCUDA/Project.toml`
  [621f4979] AbstractFFTs v1.1.0
  [6e4b80f9] BenchmarkTools v1.2.2
  [052768ef] CUDA v3.8.0
  [72cfdca4] CUDAKernels v0.2.1
  [3da002f7] ColorTypes v0.11.0
  [5ae59095] Colors v0.12.8
  [071ae1c0] DiffEqGPU v1.15.0 `https://github.com/SciML/DiffEqGPU.jl.git#master`
  [5789e2e9] FileIO v1.13.0
  [53c48c17] FixedPointNumbers v0.8.4
  [f332f351] ImageContrastAdjustment v0.3.10
  [a09fc81d] ImageCore v0.9.3
  [6a3955dd] ImageFiltering v0.7.1
  [6218d12a] ImageMagick v1.2.2
  [4e3cecfd] ImageShow v0.3.3
  [63c18a36] KernelAbstractions v0.6.3
  [1dea7af3] OrdinaryDiffEq v6.6.6
  [62fd8b95] TensorCore v0.1.1
  [5e47fb64] TestImages v1.6.2
  [bc48ee85] Tullio v0.3.3
  [37e2e46d] LinearAlgebra

ChrisRackauckas commented 2 years ago

Try:

function lorenz(du,u,p,t)
    @inbounds begin
    du[1] = p[1]*(u[2]-u[1])
    du[2] = u[1]*(p[2]-u[3]) - u[2]
    du[3] = u[1]*u[2] - p[3]*u[3]
    end
end
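
A CPU-side sanity check can help separate a problem in the right-hand side from a problem in the GPU toolchain. The following is a minimal sketch under that assumption, not something posted in the thread: it calls the `@inbounds` version of `lorenz` on plain `Float32` vectors, using the `u0` and `p` values from the example. It never touches the GPU, so it cannot reproduce the `KernelException` itself.

```julia
# Right-hand side from the thread, with the suggested @inbounds block.
function lorenz(du, u, p, t)
    @inbounds begin
        du[1] = p[1] * (u[2] - u[1])
        du[2] = u[1] * (p[2] - u[3]) - u[2]
        du[3] = u[1] * u[2] - p[3] * u[3]
    end
    return nothing
end

# Same initial state and parameters as the example, all Float32.
u0 = Float32[1.0, 0.0, 0.0]
p = [10.0f0, 28.0f0, 8 / 3.0f0]
du = zeros(Float32, 3)

# Evaluate once on the CPU; du is filled in place.
lorenz(du, u0, p, 0.0f0)
println(du)
```

If this runs and `du` stays `Float32`, the function body is less likely to be the culprit, which points toward the package environment instead.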

lazarusA commented 2 years ago

Same output. It would be nice to have a Project file under which we know that things work; I probably have the wrong combination of dependencies. Here is the complete example again, run in a clean env.

using OrdinaryDiffEq, CUDA, LinearAlgebra
using DiffEqGPU
function lorenz(du,u,p,t)
    @inbounds begin
    du[1] = p[1]*(u[2]-u[1])
    du[2] = u[1]*(p[2]-u[3]) - u[2]
    du[3] = u[1]*u[2] - p[3]*u[3]
    end
end
u0 = Float32[1.0; 0.0; 0.0]
tspan = (0.0f0, 100.0f0)
p = [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem(lorenz, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float32, 3) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10, saveat = 1.0f0)

My current env:

(JuliaConCUDA) pkg> st
      Status `~/JuliaConCUDA/Project.toml`
  [052768ef] CUDA v3.8.0
  [071ae1c0] DiffEqGPU v1.15.0
  [1dea7af3] OrdinaryDiffEq v6.6.6
  [37e2e46d] LinearAlgebra

And the output after running:

~/JuliaConCUDA$ julia -g2 --project testDiffs.jl 
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [1] error(::String) at ./error.jl:33
 [1] error(::String) at ./error.jl:33
 [1] error(::String) at ./error.jl:33
 [1] error(::String) at ./error.jl:33
 [1] error(::String) at ./error.jl:33
 [1] error(::String) at ./error.jl:33
 [1] error(::String) at ./error.jl:33
 [2] overdub at ./error.jl:33
 [2] overdub at ./error.jl:33
 [2] overdub at ./error.jl:33
 [2] overdub at ./error.jl:33
 [2] overdub at ./error.jl:33
 [2] overdub at ./error.jl:33
 [2] overdub at ./error.jl:33
 [2] overdub at ./error.jl:33
 [3] const_arrayref(::CuDeviceMatrix{Float32, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [3] const_arrayref(::CuDeviceMatrix{Float32, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [3] const_arrayref(::CuDeviceMatrix{Float32, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [3] const_arrayref(::CuDeviceMatrix{Float32, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [3] const_arrayref(::CuDeviceMatrix{Float32, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [3] const_arrayref(::CuDeviceMatrix{Float32, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [3] const_arrayref(::CuDeviceMatrix{Float32, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [3] const_arrayref(::CuDeviceMatrix{Float32, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [4] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [4] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [4] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [4] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [4] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [4] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [4] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [4] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/utils.jl:49
 [5] getindex(::CUDA.Const{Float32, 2, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [5] getindex(::CUDA.Const{Float32, 2, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [5] getindex(::CUDA.Const{Float32, 2, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [5] getindex(::CUDA.Const{Float32, 2, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [5] getindex(::CUDA.Const{Float32, 2, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [5] getindex(::CUDA.Const{Float32, 2, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [5] getindex(::CUDA.Const{Float32, 2, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [5] getindex(::CUDA.Const{Float32, 2, 1}, ::Int64) at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [6] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [6] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [6] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [6] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [6] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [6] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [6] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [6] overdub at /User/homes/lalonso/.julia/packages/CUDA/bki2w/src/device/array.jl:232
 [7] overdub at ./subarray.jl:309
 [7] overdub at ./subarray.jl:309
 [7] overdub at ./subarray.jl:309
 [7] overdub at ./subarray.jl:309
 [7] overdub at ./subarray.jl:309
 [7] overdub at ./subarray.jl:309
 [7] overdub at ./subarray.jl:309
 [7] overdub at ./subarray.jl:309
 [8] overdub at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:5
 [8] overdub at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:5
 [8] overdub at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:5
 [8] overdub at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:5
 [8] overdub at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:5
 [8] overdub at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:5
 [8] overdub at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:5
 [8] overdub at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:5
 [9] macro expansion at /User/homes/lalonso/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:20
 [9] macro expansion at /User/homes/lalonso/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:20
 [9] macro expansion at /User/homes/lalonso/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:20
 [9] macro expansion at /User/homes/lalonso/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:20
 [9] macro expansion at /User/homes/lalonso/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:20
 [9] macro expansion at /User/homes/lalonso/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:20
 [9] macro expansion at /User/homes/lalonso/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:20
 [9] macro expansion at /User/homes/lalonso/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:20
 [10] overdub at /User/homes/lalonso/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [10] overdub at /User/homes/lalonso/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [10] overdub at /User/homes/lalonso/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [10] overdub at /User/homes/lalonso/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [10] overdub at /User/homes/lalonso/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [10] overdub at /User/homes/lalonso/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [10] overdub at /User/homes/lalonso/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [10] overdub at /User/homes/lalonso/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [11] overdub at /User/homes/lalonso/.julia/packages/Cassette/1lyEM/src/overdub.jl:0
 [11] overdub at /User/homes/lalonso/.julia/packages/Cassette/1lyEM/src/overdub.jl:0
 [11] overdub at /User/homes/lalonso/.julia/packages/Cassette/1lyEM/src/overdub.jl:0
 [11] overdub at /User/homes/lalonso/.julia/packages/Cassette/1lyEM/src/overdub.jl:0
 [11] overdub at /User/homes/lalonso/.julia/packages/Cassette/1lyEM/src/overdub.jl:0
 [11] overdub at /User/homes/lalonso/.julia/packages/Cassette/1lyEM/src/overdub.jl:0
 [11] overdub at /User/homes/lalonso/.julia/packages/Cassette/1lyEM/src/overdub.jl:0
 [11] overdub at /User/homes/lalonso/.julia/packages/Cassette/1lyEM/src/overdub.jl:0
ERROR: LoadError: KernelException: exception thrown during kernel execution on device Tesla V100-DGXS-16GB
Stacktrace:
  [1] check_exceptions()
    @ CUDA ~/.julia/packages/CUDA/bki2w/src/compiler/exceptions.jl:34
  [2] nonblocking_synchronize
    @ ~/.julia/packages/CUDA/bki2w/lib/cudadrv/context.jl:329 [inlined]
  [3] device_synchronize()
    @ CUDA ~/.julia/packages/CUDA/bki2w/lib/cudadrv/context.jl:317
  [4] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA ~/.julia/packages/CUDA/bki2w/lib/cudadrv/module.jl:41
  [5] CuModule
    @ ~/.julia/packages/CUDA/bki2w/lib/cudadrv/module.jl:23 [inlined]
  [6] cufunction_link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry, :external_gvars), Tuple{Vector{UInt8}, String, Vector{String}}})
    @ CUDA ~/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:451
  [7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1Ajz2/src/cache.jl:95
  [8] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(muladd), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(DiffEqGPU.diffeqgpunorm), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Float32}}, Float32, Float32}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:297
  [9] cufunction
    @ ~/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:291 [inlined]
 [10] macro expansion
    @ ~/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:102 [inlined]
 [11] #launch_heuristic#268
    @ ~/.julia/packages/CUDA/bki2w/src/gpuarrays.jl:17 [inlined]
 [12] copyto!
    @ ~/.julia/packages/GPUArrays/umZob/src/host/broadcast.jl:65 [inlined]
 [13] copyto!
    @ ./broadcast.jl:936 [inlined]
 [14] materialize!
    @ ./broadcast.jl:894 [inlined]
 [15] materialize!
    @ ./broadcast.jl:891 [inlined]
 [16] fast_materialize!
    @ ~/.julia/packages/FastBroadcast/yCuxg/src/FastBroadcast.jl:31 [inlined]
 [17] ode_determine_initdt(u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float32, tdir::Float32, dtmax::Float32, abstol::Float32, reltol::Float32, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float32, Float32}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Nothing, Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, Float32, Float32, Float32, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ODESolution{Float32, 3, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float32, Float32}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, 
Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{Float32}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.DEOptions{Float32, Float32, Float32, Float32, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryHeap{Float32, DataStructures.FasterForward}, DataStructures.BinaryHeap{Float32, DataStructures.FasterForward}, Nothing, Nothing, Int64, Tuple{}, Float32, Tuple{}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, Nothing, OrdinaryDiffEq.DefaultInit})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/Op0Oq/src/initdt.jl:23
 [18] auto_dt_reset!
    @ ~/.julia/packages/OrdinaryDiffEq/Op0Oq/src/integrators/integrator_interface.jl:346 [inlined]
 [19] handle_dt!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Nothing, Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, Float32, Float32, Float32, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ODESolution{Float32, 3, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float32, Float32}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{Float32}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, 
Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.DEOptions{Float32, Float32, Float32, Float32, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryHeap{Float32, DataStructures.FasterForward}, DataStructures.BinaryHeap{Float32, DataStructures.FasterForward}, Nothing, Nothing, Int64, Tuple{}, Float32, Tuple{}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, Nothing, OrdinaryDiffEq.DefaultInit})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/Op0Oq/src/solve.jl:504
 [20] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float32, Float32}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Float32, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float32, dtmin::Nothing, dtmax::Float32, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/Op0Oq/src/solve.jl:466
 [21] #__solve#495
    @ ~/.julia/packages/OrdinaryDiffEq/Op0Oq/src/solve.jl:4 [inlined]
 [22] #solve_call#37
    @ ~/.julia/packages/DiffEqBase/1V2xg/src/solve.jl:61 [inlined]
 [23] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float32, Float32}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(lorenz), typeof(DiffEqGPU.gpu_kernel)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:unstable_check, :saveat, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", Float32, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/1V2xg/src/solve.jl:87
 [24] #solve#38
    @ ~/.julia/packages/DiffEqBase/1V2xg/src/solve.jl:73 [inlined]
 [25] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, typeof(lorenz), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, typeof(lorenz), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:unstable_check, :saveat), Tuple{DiffEqGPU.var"#12#18", Float32}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:319
 [26] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, typeof(lorenz), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:unstable_check, :saveat), Tuple{DiffEqGPU.var"#12#18", Float32}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:284
 [27] macro expansion
    @ ./timing.jl:287 [inlined]
 [28] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, Vector{Float32}, ODEFunction{true, typeof(lorenz), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Iterators.Pairs{Symbol, Float32, Tuple{Symbol}, NamedTuple{(:saveat,), Tuple{Float32}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:201
 [29] #solve#40
    @ ~/.julia/packages/DiffEqBase/1V2xg/src/solve.jl:101 [inlined]
 [30] top-level scope
    @ ~/JuliaConCUDA/testDiffs.jl:16
in expression starting at /User/homes/lalonso/JuliaConCUDA/testDiffs.jl:16

ChrisRackauckas commented 2 years ago

@maleadt could I get help on this one? It's really weird. In the package, I changed the code at the spot it's erroring at to:

      Main.x[] = (sk,abstol,u0,t,reltol)
      @.. sk = abstol+internalnorm(u0,t)*reltol

then I did:

using DiffEqGPU, OrdinaryDiffEq
using CUDA

# Must exist before `solve` runs: the patched package code writes to `Main.x[]`.
x = Ref{Any}()

function lorenz(du,u,p,t)
    du[1] = p[1]*(u[2]-u[1])
    du[2] = u[1]*(p[2]-u[3]) - u[2]
    du[3] = u[1]*u[2] - p[3]*u[3]
end

u0 = Float32[1.0;0.0;0.0]
tspan = (0.0f0,100.0f0)
p = [10.0f0,28.0f0,8/3f0]
prob = ODEProblem(lorenz,u0,tspan,p)
prob_func = (prob,i,repeat) -> remake(prob,p=rand(Float32,3).*p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy=false)
@time sol = solve(monteprob,Tsit5(),EnsembleGPUArray(),trajectories=10_000,saveat=1.0f0,reltol=1f-6,abstol=1f-6)

u = CuArray(rand(Float32,3,8000))
reltol = 1f-6
abstol=1f-6
sk = CuArray(rand(Float32,3,8000))
# Replay the expression at top level using the arguments stashed in `x`.
DiffEqBase.@.. x[][1] = x[][2]+DiffEqBase.ODE_DEFAULT_NORM(x[][3],x[][4])*x[][5]

So, magically, the same expression works at top-level scope, but I get a kernel compilation error when that expression, with the same arguments, is in the package. Do you know what could cause this behavior?
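
The capture-and-replay trick described above can be sketched on the CPU with plain arrays. Everything in this block is a hypothetical stand-in: `CAPTURED` plays the role of `Main.x`, `init_scale!` stands in for the package routine that fails, and `abs` replaces the internal norm. It only illustrates the pattern of stashing live arguments in a global `Ref` and re-running the same broadcast outside the function.

```julia
# Global slot that the "package" code writes its live arguments into.
const CAPTURED = Ref{Any}()

# Hypothetical stand-in for the internal routine under investigation.
function init_scale!(sk, abstol, u0, reltol)
    CAPTURED[] = (sk, abstol, u0, reltol)   # stash the arguments for later replay
    @. sk = abstol + abs(u0) * reltol       # the expression being debugged
    return sk
end

sk = zeros(Float32, 3, 4)
u0 = ones(Float32, 3, 4)
init_scale!(sk, 1f-6, u0, 1f-6)

# Replay the same expression at top level with the captured arguments,
# writing into a fresh array so the two results can be compared.
args = CAPTURED[]
replayed = similar(args[1])
@. replayed = args[2] + abs(args[3]) * args[4]

println(replayed == sk)
```

When the replay succeeds but the in-package call fails, as reported here, the arguments are probably not the problem, and the difference is more likely in what code the package actually loads.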

ChrisRackauckas commented 2 years ago

It was a version dependency issue. Fixed by updating the KernelAbstractions versions.
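
One way to see which versions an environment actually resolved is to query the package manager. This is a sketch under stated assumptions: `resolved_version` is a hypothetical helper written for this illustration, built on `Pkg.dependencies()`, and the package names are simply the ones that appear in this thread.

```julia
using Pkg

# Hypothetical helper: return the resolved version of `name` in the active
# environment, or `nothing` if it is absent or has no recorded version.
function resolved_version(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info.version
    end
    return nothing
end

# Report the packages mentioned in this thread.
for name in ("KernelAbstractions", "CUDAKernels", "CUDA", "DiffEqGPU")
    v = resolved_version(name)
    println(rpad(name, 20), v === nothing ? "not found" : string("v", v))
end
```

If the reported KernelAbstractions version looks old, `Pkg.update("KernelAbstractions")` asks the resolver for a newer one. The thread does not say which version resolved the error, so no specific number is suggested here.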

lazarusA commented 2 years ago

Thanks. It works now.