SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License

Tutorial fails using Metal.jl #315

Open ctessum opened 7 months ago

ctessum commented 7 months ago

Hi,

I am trying to run this tutorial on my laptop, which has an M1 processor. My understanding is that to do this, I should just change CUDA to Metal:

using DiffEqGPU, DifferentialEquations, StaticArrays, Metal

function lorenz2(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz2, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(Metal.MetalBackend()),
    trajectories = 10_000,
    saveat = 1.0f0)

However, when I run the code above, the last line gives the error:

ERROR: InvalidIRError: compiling MethodInstance for DiffEqGPU.gpu_ode_asolve_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::MtlDeviceVector{DiffEqGPU.ImmutableODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz2), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1}, ::GPUTsit5, ::MtlDeviceMatrix{SVector{3, Float32}, 1}, ::MtlDeviceMatrix{Float32, 1}, ::Float32, ::CallbackSet{Tuple{}, Tuple{}}, ::Nothing, ::Float32, ::Float32, ::StepRangeLen{Float32, Float64, Float64, Int64}, ::Val{false}) resulted in invalid LLVM IR
Reason: unsupported use of double value
Reason: unsupported use of double value
Reason: unsupported use of double value

These are the package versions:

(esml_demo) pkg> status DiffEqGPU
  [071ae1c0] DiffEqGPU v3.3.0
(esml_demo) pkg> status Metal
  [dde4c033] Metal v0.5.1
(esml_demo) pkg> status DifferentialEquations
  [0c46a032] DifferentialEquations v7.11.0

Is this the expected behavior?

ctessum commented 7 months ago

More information in case relevant:

Metal.versioninfo()

macOS 14.0.0, Darwin 23.0.0

Toolchain:
- Julia: 1.9.0
- LLVM: 14.0.6

Julia packages: 
- Metal.jl: 0.5.1
- Metal_LLVM_Tools_jll: 0.5.1+0

1 device:
- Apple M1 (2.406 MiB allocated)

utkarsh530 commented 7 months ago

The Apple M1 GPU does not support Float64 values yet. The problem here is the saveat range that reaches the kernel, which has type ::StepRangeLen{Float32, Float64, Float64, Int64}: its elements are Float32, but Julia stores the range's reference value and step in Float64 when it is constructed on the CPU. If you remove saveat = 1.0f0, it should work.

I am trying to fix it using #317. Thanks for bringing it up!
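For readers who want to see where the Float64 comes from, here is a minimal sketch in plain Julia (no GPU packages required). It assumes the solver builds the save times as a range from `tspan` and `saveat`, which is consistent with the `StepRangeLen` argument in the error message above but is not confirmed from the DiffEqGPU source. The variable names `r` and `ts` are illustrative only.

```julia
# A Float32 range keeps Float32 elements, but Base stores the reference
# value and the step in Float64 to reduce rounding error.
r = 0.0f0:1.0f0:10.0f0
println(typeof(r))   # StepRangeLen{Float32, Float64, Float64, Int64}
println(eltype(r))   # Float32

# The problem data from the tutorial is Float32 throughout, so it is not
# the source of the "double value" in the kernel.
println(typeof(8 / 3.0f0))   # Float32

# Collecting the range gives a plain Vector{Float32} with no Float64 fields.
ts = collect(r)
println(typeof(ts))  # Vector{Float32}
```

Whether passing a collected vector as `saveat` avoids the Metal error in this DiffEqGPU version has not been checked here; the sketch only shows why the range type carries Float64 parameters.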

ggkountouras commented 5 days ago

I'm getting a different error with the previous tutorial (no saveat). Scaling down the parameters p seems to make it go away. The size of the problem doesn't affect the error, since even trajectories=2 fails with:

Error: No solution found
│   tspan = 0.0f0
│   ts =
│    2-element view(::Matrix{Float32}, :, 1) with eltype Float32:
│     0.0
│     0.0
└ @ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:175
ERROR: Batch solve failed

Code

```julia
using DiffEqGPU, OrdinaryDiffEq, StaticArrays, Metal

function lorenz(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p) # this fails
#prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p .* 0.1f0) # this works
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(Metal.MetalBackend()), trajectories = 10_000)
```

Complete error

```
1-element ExceptionStack:
LoadError: Batch solve failed
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] #126
    @ ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:176 [inlined]
  [3] (::DiffEqGPU.var"#126#142"{EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, GPUTsit5, Matrix{Float32}})(i::Int64)
    @ DiffEqGPU ./none:0
  [4] iterate
    @ ./generator.jl:47 [inlined]
  [5] collect(itr::Base.Generator{Base.OneTo{Int64}, DiffEqGPU.var"#126#142"{EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, GPUTsit5, Matrix{Float32}}})
    @ Base ./array.jl:834
  [6] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{MetalBackend}, I::UnitRange{Int64}, adaptive::Bool; kwargs::@Kwargs{unstable_check::DiffEqGPU.var"#114#120"})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:170
  [7] macro expansion
    @ ./timing.jl:395 [inlined]
  [8] __solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{MetalBackend}; trajectories::Int64, batch_size::Int64, unstable_check::Function, adaptive::Bool, kwargs::@Kwargs{})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:55
  [9] __solve
    @ ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:1 [inlined]
 [10] #solve#45
    @ ~/.julia/packages/DiffEqBase/52czI/src/solve.jl:1096 [inlined]
 [11] top-level scope
    @ ~/Documents/dev/julia-diffeqgpu/stress_test.jl:21
 [12] eval
    @ ./boot.jl:385 [inlined]
 [13] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:2076
 [14] include_string(m::Module, txt::String, fname::String)
    @ Base ./loading.jl:2086
 [15] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
    @ Base ./essentials.jl:892
 [16] invokelatest(::Any, ::Any, ::Vararg{Any})
    @ Base ./essentials.jl:889
 [17] inlineeval(m::Module, code::String, code_line::Int64, code_column::Int64, file::String; softscope::Bool)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:271
 [18] (::VSCodeServer.var"#69#74"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:181
 [19] withpath(f::VSCodeServer.var"#69#74"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams}, path::String)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/repl.jl:276
 [20] (::VSCodeServer.var"#68#73"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:179
 [21] hideprompt(f::VSCodeServer.var"#68#73"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/repl.jl:38
 [22] (::VSCodeServer.var"#67#72"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:150
 [23] with_logstate(f::Function, logstate::Any)
    @ Base.CoreLogging ./logging.jl:515
 [24] with_logger
    @ ./logging.jl:627 [inlined]
 [25] (::VSCodeServer.var"#66#71"{VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:263
 [26] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [27] invokelatest(::Any)
    @ Base ./essentials.jl:889
 [28] (::VSCodeServer.var"#64#65")()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:34
in expression starting at /Users/georgegkountouras/Documents/dev/julia-diffeqgpu/stress_test.jl:21
```

Package versions

```
Status `~/Documents/dev/julia-diffeqgpu/Manifest.toml`
⌅ [47edcb42] ADTypes v0.2.7
⌅ [79e6a3ab] Adapt v3.7.2
  [ec485272] ArnoldiMethod v0.4.0
⌃ [4fba245c] ArrayInterface v7.7.1
  [4c555306] ArrayLayouts v1.10.0
  [a9b6321e] Atomix v0.1.0
  [6e4b80f9] BenchmarkTools v1.5.0
  [62783981] BitTwiddlingConvenienceFunctions v0.1.5
⌅ [fa961155] CEnum v0.4.2
  [2a0fbf3d] CPUSummary v0.2.5
  [d360d2e6] ChainRulesCore v1.24.0
  [fb6a15b2] CloseOpenIntervals v0.1.12
  [38540f10] CommonSolve v0.2.4
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v4.15.0
  [2569d6c7] ConcreteStructs v0.2.3
  [187b0558] ConstructionBase v1.5.5
  [adafc99b] CpuId v0.3.1
  [9a962f9c] DataAPI v1.16.0
  [864edb3b] DataStructures v0.18.20
  [e2d170a0] DataValueInterfaces v1.0.0
⌃ [2b5f629d] DiffEqBase v6.147.3
  [071ae1c0] DiffEqGPU v3.4.1
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
  [ffbed154] DocStringExtensions v0.9.3
  [4e289a0a] EnumX v1.0.4
⌃ [f151be2c] EnzymeCore v0.6.6
  [d4d017d3] ExponentialUtilities v1.26.1
  [e2ba6199] ExprTools v0.1.10
⌅ [7034ab61] FastBroadcast v0.2.8
  [9aa1b823] FastClosures v0.3.2
  [29a986be] FastLapackInterface v2.0.4
  [1a297f60] FillArrays v1.11.0
  [6a86dc24] FiniteDiff v2.23.1
  [f6369f11] ForwardDiff v0.10.36
  [069b7b12] FunctionWrappers v1.1.3
  [77dc65aa] FunctionWrappersWrappers v0.1.3
⌅ [0c68f7d7] GPUArrays v9.1.0
⌅ [46192b85] GPUArraysCore v0.1.5
⌅ [61eb1bfa] GPUCompiler v0.24.5
  [c145ed77] GenericSchur v0.5.4
  [86223c79] Graphs v1.11.1
  [3e5b6fbb] HostCPUFeatures v0.1.16
  [615f187c] IfElse v0.1.1
  [d25df0c9] Inflate v0.1.5
  [92d709cd] IrrationalConstants v0.2.2
  [82899510] IteratorInterfaceExtensions v1.0.0
  [692b3bcd] JLLWrappers v1.5.0
  [682c06a0] JSON v0.21.4
⌅ [ef3ab10e] KLU v0.4.1
⌃ [63c18a36] KernelAbstractions v0.9.18
  [ba0b0d4f] Krylov v0.9.6
⌅ [929cbde3] LLVM v6.6.3
  [10f19ff3] LayoutPointers v0.1.15
⌅ [5078a376] LazyArrays v1.10.0
  [d3d80556] LineSearches v7.2.0
⌃ [7ed4a6bd] LinearSolve v2.22.1
  [2ab3a3ac] LogExpFunctions v0.3.28
  [bdcacae8] LoopVectorization v0.12.170
  [1914dd2f] MacroTools v0.5.13
  [d125e4d3] ManualMemory v0.1.8
⌅ [a3b82374] MatrixFactorizations v2.2.0
  [bb5d69b7] MaybeInplace v0.1.3
⌃ [dde4c033] Metal v0.5.1
  [46d2c3a1] MuladdMacro v0.2.4
  [d41bc354] NLSolversBase v7.8.3
  [77ba4419] NaNMath v1.0.2
⌃ [8913a72c] NonlinearSolve v3.8.3
  [d8793406] ObjectFile v0.4.1
⌅ [e86c9b32] ObjectiveC v1.1.0
  [6fe1bfb0] OffsetArrays v1.14.0
  [bac558e1] OrderedCollections v1.6.3
⌃ [1dea7af3] OrdinaryDiffEq v6.80.1
  [65ce6f38] PackageExtensionCompat v1.0.2
  [d96e819e] Parameters v0.12.3
  [69de0a69] Parsers v2.8.1
  [f517fe37] Polyester v0.7.14
  [1d0040c9] PolyesterWeave v0.2.1
  [d236fae5] PreallocationTools v0.4.22
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [3cdcf5f2] RecipesBase v1.3.4
⌃ [731186ca] RecursiveArrayTools v3.13.0
  [f2c3362d] RecursiveFactorization v0.2.23
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [7e49a35a] RuntimeGeneratedFunctions v0.5.13
  [94e857df] SIMDTypes v0.1.0
  [476501e8] SLEEFPirates v0.6.42
⌃ [0bca4576] SciMLBase v2.31.0
  [c0aeaf25] SciMLOperators v0.3.8
⌃ [53ae85a6] SciMLStructures v1.2.0
  [6c6a2e73] Scratch v1.2.1
  [efcf1570] Setfield v1.1.1
  [05bca326] SimpleDiffEq v1.11.1
⌃ [727e6d20] SimpleNonlinearSolve v1.6.0
  [699a6c99] SimpleTraits v0.9.4
  [ce78b400] SimpleUnPack v1.1.0
⌃ [47a9eef4] SparseDiffTools v2.18.0
  [e56a9233] Sparspak v0.3.9
  [276daf66] SpecialFunctions v2.4.0
  [aedffcd0] Static v0.8.10
  [0d7ed370] StaticArrayInterface v1.5.0
  [90137ffa] StaticArrays v1.9.5
  [1e83bf80] StaticArraysCore v1.4.3
  [7792a7ef] StrideArraysCore v0.5.6
  [53d494c1] StructIO v0.3.0
⌃ [2efcf032] SymbolicIndexingInterface v0.3.11
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.11.1
  [8290d209] ThreadingUtilities v0.5.2
  [a759f4b9] TimerOutputs v0.5.24
  [d5829a12] TriangularSolve v0.2.0
  [410a4b4d] Tricks v0.1.8
  [781d530d] TruncatedStacktraces v1.4.0
  [3a884ed6] UnPack v1.0.2
  [013be700] UnsafeAtomics v0.2.1
  [d80eeb9a] UnsafeAtomicsLLVM v0.1.4
  [3d5dd08c] VectorizationBase v0.21.68
  [19fa3120] VertexSafeGraphs v0.2.0
  [700de1a5] ZygoteRules v0.2.5
  [6e34b625] Bzip2_jll v1.0.8+1
  [2e619515] Expat_jll v2.6.2+0
  [1d5cc7b8] IntelOpenMP_jll v2024.1.0+0
⌅ [dad2f222] LLVMExtra_jll v0.0.29+0
  [7106de7a] LibMPDec_jll v2.5.1+0
⌅ [e9f186c6] Libffi_jll v3.2.2+1
  [856f044c] MKL_jll v2024.1.0+0
  [0418c028] Metal_LLVM_Tools_jll v0.5.1+0
  [458c3c95] OpenSSL_jll v3.0.14+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [93d3a430] Python_jll v3.10.14+0
  [76ed43ae] SQLite_jll v3.45.3+0
  [ffd25f8a] XZ_jll v5.4.6+0
  [1317d2d5] oneTBB_jll v2021.12.0+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.10.0
  [de0858da] Printf
  [9abbd945] Profile
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays v1.10.0
  [10745b16] Statistics v1.10.0
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.4.0+0
  [e37daf67] LibGit2_jll v1.6.4+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.2+1
  [14a3606d] MozillaCACerts_jll v2023.1.10
  [4536629a] OpenBLAS_jll v0.3.23+4
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.2.1+1
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.8.0+1
  [8e850ede] nghttp2_jll v1.52.0+1
  [3f19e933] p7zip_jll v17.4.0+2
```

Metal.versioninfo()

```
macOS 14.6.0, Darwin 23.6.0

Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7

Julia packages:
- Metal.jl: 0.5.1
- Metal_LLVM_Tools_jll: 0.5.1+0

1 device:
- Apple M1 Max (1.625 MiB allocated)
```