SciML / OrdinaryDiffEq.jl

High performance ordinary differential equation (ODE) and differential-algebraic equation (DAE) solvers, including neural ordinary differential equations (neural ODEs) and scientific machine learning (SciML)
https://diffeq.sciml.ai/latest/

Unexpected memory accumulation when repeatedly running `solve` #2147

Closed thanasibakis closed 7 months ago

thanasibakis commented 7 months ago

Describe the bug 🐞

Running `solve` repeatedly seems to cause a Julia program to accumulate an unexpected amount of memory over time (as reported by `htop`). If I place a call to `solve` in a `while true` loop, I can watch the MEM% column of `htop` grow gradually.

I have reproduced this issue on Julia versions 1.8, 1.9, and 1.10.

Expected behavior

I would expect the memory to be reused each iteration of the loop (or some nuanced version of that idea). I would expect to be able to run this infinite loop without eventually receiving an Out of Memory error from my operating system, on a machine with tens of gigabytes of RAM.

Minimal Reproducible Example 👇

Run with `julia --threads=8`. Multithreading is not necessary to reproduce the issue, but it makes the growth visible sooner.

Adapted from https://docs.sciml.ai/DiffEqDocs/stable/getting_started/#ode_other_types

using DifferentialEquations, LinearAlgebra

f(du, u, A, t) = mul!(du, A, u)

function main()
    A = [ 1.0  0 0 -5
          4   -2 4 -3
         -4    0 0  1
          5   -2 2  3]

    Threads.@threads for i in 1:8
        while true
            soln = solve(
                ODEProblem{true}(
                    f,
                    rand(4, 2),
                    (1.0, 0.0),
                    A
                ),
                Tsit5();
                save_everystep = false,
                save_start = false,
                reltol = 1e-3,
                abstol = 1e-3
            )
        end
    end
end

main()

Error & Stacktrace ⚠️

None from Julia. However, monitoring the MEM% column of `htop` for the `julia` process shows memory usage gradually accumulating.
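For anyone who would rather log the growth from inside the process than watch `htop`, here is a minimal sketch. It assumes that `Sys.maxrss()` (the peak resident set size of the process) is a good enough proxy for the MEM% column; `watch_rss` and its keyword arguments are hypothetical names made up for this illustration, not part of any package.

```julia
# Hypothetical helper: run `work` repeatedly and sample the peak resident set size.
# Sys.maxrss() reports the peak RSS in bytes, so the samples never decrease.
function watch_rss(work::Function; iterations::Integer = 10_000, every::Integer = 1_000)
    samples = Float64[]
    for i in 1:iterations
        work()
        if i % every == 0
            mib = Sys.maxrss() / 2^20   # bytes -> MiB
            push!(samples, mib)
            println("iteration ", i, ": peak RSS = ", round(mib; digits = 1), " MiB")
        end
    end
    return samples
end

# Stand-in workload so the sketch runs without any packages installed.
# To exercise the report above, pass a closure that calls `solve` instead.
samples = watch_rss(() -> sum(rand(10)); iterations = 100, every = 10)
```

With a workload that does not leak, the printed values should level off after warm-up; a value that keeps climbing over a long run would match the behaviour described here.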

Environment (please complete the following information):

Status `~/.julia/environments/v1.8/Project.toml`
⌅ [0c46a032] DifferentialEquations v7.10.0
  [37e2e46d] LinearAlgebra
Status `~/.julia/environments/v1.8/Manifest.toml`
  [47edcb42] ADTypes v0.2.6
⌅ [79e6a3ab] Adapt v3.7.2
⌅ [ec485272] ArnoldiMethod v0.2.0
⌅ [4fba245c] ArrayInterface v7.5.1
  [4c555306] ArrayLayouts v1.6.0
⌅ [aae01518] BandedMatrices v0.17.38
  [6e4b80f9] BenchmarkTools v1.4.0
  [62783981] BitTwiddlingConvenienceFunctions v0.1.5
⌅ [764a87c0] BoundaryValueDiffEq v4.0.1
  [fa961155] CEnum v0.5.0
  [2a0fbf3d] CPUSummary v0.2.4
  [49dc2e85] Calculus v0.5.1
  [d360d2e6] ChainRulesCore v1.22.0
  [9e997f8a] ChangesOfVariables v0.1.8
  [fb6a15b2] CloseOpenIntervals v0.1.12
  [523fee87] CodecBzip2 v0.8.2
  [944b1d66] CodecZlib v0.7.4
  [38540f10] CommonSolve v0.2.4
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v4.13.0
  [2569d6c7] ConcreteStructs v0.2.3
  [187b0558] ConstructionBase v1.5.4
  [adafc99b] CpuId v0.3.1
  [9a962f9c] DataAPI v1.16.0
  [864edb3b] DataStructures v0.18.17
  [e2d170a0] DataValueInterfaces v1.0.0
⌅ [bcd4f6db] DelayDiffEq v5.44.0
  [b429d917] DensityInterface v0.4.0
⌅ [2b5f629d] DiffEqBase v6.130.0
⌅ [459566f4] DiffEqCallbacks v2.35.0
⌅ [77a26b50] DiffEqNoiseProcess v5.19.0
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
⌅ [0c46a032] DifferentialEquations v7.10.0
  [b4f34e82] Distances v0.10.11
  [31c24e10] Distributions v0.25.107
  [ffbed154] DocStringExtensions v0.9.3
  [fa6b7ba4] DualNumbers v0.6.8
  [4e289a0a] EnumX v1.0.4
  [f151be2c] EnzymeCore v0.6.5
⌅ [d4d017d3] ExponentialUtilities v1.25.0
  [e2ba6199] ExprTools v0.1.10
  [7034ab61] FastBroadcast v0.2.8
  [9aa1b823] FastClosures v0.3.2
  [29a986be] FastLapackInterface v2.0.1
  [1a297f60] FillArrays v1.9.3
  [6a86dc24] FiniteDiff v2.22.0
  [f6369f11] ForwardDiff v0.10.36
  [069b7b12] FunctionWrappers v1.1.3
  [77dc65aa] FunctionWrappersWrappers v0.1.3
  [d9f16b24] Functors v0.4.7
⌃ [46192b85] GPUArraysCore v0.1.5
  [c145ed77] GenericSchur v0.5.3
  [86223c79] Graphs v1.9.0
  [3e5b6fbb] HostCPUFeatures v0.1.16
  [34004b35] HypergeometricFunctions v0.3.23
  [615f187c] IfElse v0.1.1
  [d25df0c9] Inflate v0.1.4
  [3587e190] InverseFunctions v0.1.12
  [92d709cd] IrrationalConstants v0.2.2
  [82899510] IteratorInterfaceExtensions v1.0.0
  [692b3bcd] JLLWrappers v1.5.0
  [682c06a0] JSON v0.21.4
  [ccbc3e58] JumpProcesses v9.10.1
⌅ [ef3ab10e] KLU v0.4.1
  [ba0b0d4f] Krylov v0.9.5
  [10f19ff3] LayoutPointers v0.1.15
  [50d2b5c4] Lazy v0.15.1
  [2d8b4e74] LevyArea v1.0.0
  [d3d80556] LineSearches v7.2.0
⌅ [7ed4a6bd] LinearSolve v2.14.1
  [2ab3a3ac] LogExpFunctions v0.3.27
  [bdcacae8] LoopVectorization v0.12.166
  [1914dd2f] MacroTools v0.5.13
  [d125e4d3] ManualMemory v0.1.8
  [b8f27783] MathOptInterface v1.25.3
  [e1d29d7a] Missings v1.1.0
  [46d2c3a1] MuladdMacro v0.2.4
  [d8a4904e] MutableArithmetics v1.4.1
  [d41bc354] NLSolversBase v7.8.3
  [2774e3e8] NLsolve v4.5.1
  [77ba4419] NaNMath v1.0.2
⌅ [8913a72c] NonlinearSolve v1.10.1
  [6fe1bfb0] OffsetArrays v1.13.0
  [429524aa] Optim v1.9.2
  [bac558e1] OrderedCollections v1.6.3
⌅ [1dea7af3] OrdinaryDiffEq v6.58.2
  [90014a1f] PDMats v0.11.31
  [65ce6f38] PackageExtensionCompat v1.0.2
  [d96e819e] Parameters v0.12.3
  [69de0a69] Parsers v2.8.1
  [e409e4f3] PoissonRandom v0.4.4
  [f517fe37] Polyester v0.7.9
  [1d0040c9] PolyesterWeave v0.2.1
  [85a6dd25] PositiveFactorizations v0.2.4
⌅ [d236fae5] PreallocationTools v0.4.12
  [aea7be01] PrecompileTools v1.2.0
  [21216c6a] Preferences v1.4.1
  [1fd47b50] QuadGK v2.9.4
  [74087812] Random123 v1.7.0
  [e6cf234a] RandomNumbers v1.5.3
  [3cdcf5f2] RecipesBase v1.3.4
⌅ [731186ca] RecursiveArrayTools v2.38.10
  [f2c3362d] RecursiveFactorization v0.2.21
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [ae5879a3] ResettableStacks v1.1.1
  [79098fc4] Rmath v0.7.1
  [7e49a35a] RuntimeGeneratedFunctions v0.5.12
  [94e857df] SIMDTypes v0.1.0
  [476501e8] SLEEFPirates v0.6.42
⌅ [0bca4576] SciMLBase v1.98.1
  [e9a6253c] SciMLNLSolve v0.1.9
  [c0aeaf25] SciMLOperators v0.3.7
  [efcf1570] Setfield v1.1.1
⌅ [727e6d20] SimpleNonlinearSolve v0.1.23
  [699a6c99] SimpleTraits v0.9.4
  [ce78b400] SimpleUnPack v1.1.0
  [a2af1166] SortingAlgorithms v1.2.1
  [47a9eef4] SparseDiffTools v2.17.0
  [e56a9233] Sparspak v0.3.9
  [276daf66] SpecialFunctions v2.3.1
⌅ [aedffcd0] Static v0.8.9
  [0d7ed370] StaticArrayInterface v1.5.0
  [90137ffa] StaticArrays v1.9.3
  [1e83bf80] StaticArraysCore v1.4.2
  [82ae8749] StatsAPI v1.7.0
  [2913bbd2] StatsBase v0.34.2
  [4c63d2b9] StatsFuns v1.3.1
⌅ [9672c7b4] SteadyStateDiffEq v1.16.1
⌃ [789caeaf] StochasticDiffEq v6.62.0
  [7792a7ef] StrideArraysCore v0.5.2
⌅ [c3572dad] Sundials v4.20.1
⌅ [2efcf032] SymbolicIndexingInterface v0.2.2
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.11.1
  [8290d209] ThreadingUtilities v0.5.2
  [3bb67fe8] TranscodingStreams v0.10.3
  [d5829a12] TriangularSolve v0.1.20
  [410a4b4d] Tricks v0.1.8
  [781d530d] TruncatedStacktraces v1.4.0
  [3a884ed6] UnPack v1.0.2
  [3d5dd08c] VectorizationBase v0.21.65
  [19fa3120] VertexSafeGraphs v0.2.0
  [700de1a5] ZygoteRules v0.2.5
  [6e34b625] Bzip2_jll v1.0.8+1
  [1d5cc7b8] IntelOpenMP_jll v2024.0.2+0
  [856f044c] MKL_jll v2024.0.0+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [f50d1b31] Rmath_jll v0.4.0+0
⌅ [fb77eaff] Sundials_jll v5.2.1+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.3
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.8.0
  [de0858da] Printf
  [9abbd945] Profile
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.0
  [a4e569a6] Tar v1.10.1
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.0.1+0
  [deac9b47] LibCURL_jll v7.84.0+0
  [29816b5a] LibSSH2_jll v1.10.2+0
  [c8ffd9c3] MbedTLS_jll v2.28.0+0
  [14a3606d] MozillaCACerts_jll v2022.2.1
  [4536629a] OpenBLAS_jll v0.3.20+0
  [05823500] OpenLibm_jll v0.8.1+0
  [bea87d4a] SuiteSparse_jll v5.10.1+0
  [83775a58] Zlib_jll v1.2.12+3
  [8e850b90] libblastrampoline_jll v5.1.1+0
  [8e850ede] nghttp2_jll v1.48.0+0
  [3f19e933] p7zip_jll v17.4.0+0
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 80 × Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, cascadelake)
  Threads: 1 on 80 virtual cores
Environment:
  LD_RUN_PATH = /pkg/slurm/21.08.5/lib:/pkg/gcc/8.3.0/lib
  LD_LIBRARY_PATH = /pkg/openjdk/17.0.2/lib:/pkg/slurm/21.08.5/lib:/pkg/gcc/8.3.0/lib64:/pkg/gcc/8.3.0/lib:/pkg/R/4.2.1-centos7/lib64/R/lib:/pkg/python/3.10.0/lib
  LD_RUN_PATH_modshare = /pkg/slurm/21.08.5/lib:1:/pkg/gcc/8.3.0/lib:1
  LD_LIBRARY_PATH_modshare = /pkg/slurm/21.08.5/lib:1:/pkg/gcc/8.3.0/lib:1:/pkg/openjdk/17.0.2/lib:1:/pkg/gcc/8.3.0/lib64:1:/pkg/R/4.2.1-centos7/lib64/R/lib:1:/pkg/python/3.10.0/lib:1


oscardssmith commented 7 months ago

Here's a dramatically reduced version of the MWE:

using OrdinaryDiffEq
function main()
    prob = ODEProblem((u,p,t)->u, 1.0, (0.0, 1.0))
    while true
        init(prob, Tsit5())
    end
end

Specifically, we seem to be leaking the name of an anonymous function somewhere in the initialization. This memory is permalloced (allocated permanently, outside the garbage-collected heap), so it doesn't show up in heap size tracking, but it does show up in `top`.
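One way to check this kind of diagnosis is to compare the GC's view of the heap with the OS's view of the process before and after a batch of iterations. The sketch below is only an illustration under stated assumptions: `memory_snapshot` and `compare_growth` are hypothetical helper names, `Base.gc_live_bytes` is an unexported Base function whose behaviour may differ between Julia versions, and `Sys.maxrss()` reports peak rather than current RSS.

```julia
# Hypothetical helper: capture the GC-tracked heap size and the OS-level peak RSS.
function memory_snapshot()
    GC.gc()  # full collection, so live bytes reflect reachable objects only
    return (gc_live_mib = Base.gc_live_bytes() / 2^20,
            peak_rss_mib = Sys.maxrss() / 2^20)
end

# Hypothetical helper: run `work` many times and report how much each metric grew.
function compare_growth(work::Function; iterations::Integer = 100_000)
    before = memory_snapshot()
    for _ in 1:iterations
        work()
    end
    after = memory_snapshot()
    return (gc_live_growth_mib = after.gc_live_mib - before.gc_live_mib,
            peak_rss_growth_mib = after.peak_rss_mib - before.peak_rss_mib)
end

# Stand-in workload; replace with `() -> init(prob, Tsit5())` to test the MWE above.
growth = compare_growth(() -> nothing; iterations = 1_000)
println(growth)
```

If the explanation above is right, running this with the `init` call should show the peak RSS growing steadily while the GC live bytes stay roughly flat.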