SciML / OrdinaryDiffEq.jl

High performance ordinary differential equation (ODE) and differential-algebraic equation (DAE) solvers, including neural ordinary differential equations (neural ODEs) and scientific machine learning (SciML)
https://diffeq.sciml.ai/latest/

`unstable_check` gets passed value that's far from solution #2179

Closed laikq closed 5 months ago

laikq commented 5 months ago

Describe the bug 🐞

Under very specific conditions, `unstable_check` is passed values that are far from the actual solution, which aborts the computation without justification.

Expected behavior

`unstable_check` should only be passed values that later also appear in the solution of the differential equation.

Minimal Reproducible Example 👇

using OrdinaryDiffEq

struct ParabParams
    α::Float64
    ε::Float64
end

function parab_rule(u, p, t)
    x = u
    α = p.α
    ε = p.ε
    x^2 - 2x*√α + ε*cos(π*t)
end

function parab_unstable_check(dt, u, p, t)
    α = p.α
    ε = p.ε
    # this condition is a crude approximation for “this system has tipped irrecoverably”
    cond = u > 3 + 3α + ε
    if cond
        @debug "Encountered condition @ " u dt t
    end
    # ↓ replace by `cond` and the computation will abort prematurely
    isnan(u)
end

function doit(params)
    prob = ODEProblem(parab_rule, 0., (0., 500), params)
    sol = solve(prob, Tsit5();
        saveat=0.02, reltol=1e-4, abstol=1e-7, unstable_check=parab_unstable_check)
    @info "Maximum of solution is $(maximum(sol.u))"
end

doit(ParabParams(1, 3.374821173104435))  # this shows the erroneous behavior
# Output:
# Encountered condition @
# u 16.526948027196813
# dt 0.30287695167771833
# t 147.17216737552005
# Maximum of solution is 1.2421651640163502
doit(ParabParams(1, 3.37482118))  # this behaves as expected
# Output:
# Maximum of solution is 1.2421642587809048
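
To put the logged numbers in perspective, here is a small sketch that evaluates the threshold from `parab_unstable_check` for the failing parameter set and compares it with the two values printed above. The helper name `tipping_threshold` is made up for illustration; all numbers are copied from the output in this report.

```julia
# Threshold used in `parab_unstable_check`: u > 3 + 3α + ε
# (`tipping_threshold` is a hypothetical helper, not part of the original MRE)
tipping_threshold(α, ε) = 3 + 3α + ε

α, ε = 1.0, 3.374821173104435      # failing parameter set from the report
threshold = tipping_threshold(α, ε)

u_logged   = 16.526948027196813    # value passed to the check at t ≈ 147.17
u_solution = 1.2421651640163502    # maximum of the saved solution

println("threshold                  = ", threshold)
println("logged u exceeds threshold : ", u_logged > threshold)
println("solution max exceeds it    : ", u_solution > threshold)
```

The logged `u` is above the threshold, while the largest saved solution value is well below it, which is the mismatch this issue is about.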

Environment (please complete the following information):

Status `/tmp/jl_OyS6gD/Project.toml`
  [1dea7af3] OrdinaryDiffEq v6.74.1
  [44cfe95a] Pkg v1.10.0
Output of `using Pkg; Pkg.status(; mode = PKGMODE_MANIFEST)`

```julia
Status `/tmp/jl_OyS6gD/Manifest.toml`
⌅ [47edcb42] ADTypes v0.2.7
  [7d9f7c33] Accessors v0.1.36
  [79e6a3ab] Adapt v4.0.4
  [ec485272] ArnoldiMethod v0.4.0
  [4fba245c] ArrayInterface v7.10.0
  [4c555306] ArrayLayouts v1.9.2
  [62783981] BitTwiddlingConvenienceFunctions v0.1.5
  [2a0fbf3d] CPUSummary v0.2.4
  [d360d2e6] ChainRulesCore v1.23.0
  [fb6a15b2] CloseOpenIntervals v0.1.12
  [38540f10] CommonSolve v0.2.4
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v4.14.0
  [a33af91c] CompositionsBase v0.1.2
  [2569d6c7] ConcreteStructs v0.2.3
  [187b0558] ConstructionBase v1.5.5
  [adafc99b] CpuId v0.3.1
  [9a962f9c] DataAPI v1.16.0
  [864edb3b] DataStructures v0.18.20
  [e2d170a0] DataValueInterfaces v1.0.0
  [2b5f629d] DiffEqBase v6.149.1
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
  [ffbed154] DocStringExtensions v0.9.3
  [4e289a0a] EnumX v1.0.4
⌃ [f151be2c] EnzymeCore v0.6.6
  [d4d017d3] ExponentialUtilities v1.26.1
  [e2ba6199] ExprTools v0.1.10
  [7034ab61] FastBroadcast v0.2.8
  [9aa1b823] FastClosures v0.3.2
  [29a986be] FastLapackInterface v2.0.2
  [1a297f60] FillArrays v1.10.2
  [6a86dc24] FiniteDiff v2.23.1
  [f6369f11] ForwardDiff v0.10.36
  [069b7b12] FunctionWrappers v1.1.3
  [77dc65aa] FunctionWrappersWrappers v0.1.3
  [46192b85] GPUArraysCore v0.1.6
  [c145ed77] GenericSchur v0.5.4
  [86223c79] Graphs v1.10.0
  [3e5b6fbb] HostCPUFeatures v0.1.16
  [615f187c] IfElse v0.1.1
  [d25df0c9] Inflate v0.1.4
  [3587e190] InverseFunctions v0.1.14
  [92d709cd] IrrationalConstants v0.2.2
  [82899510] IteratorInterfaceExtensions v1.0.0
  [692b3bcd] JLLWrappers v1.5.0
  [ef3ab10e] KLU v0.6.0
  [ba0b0d4f] Krylov v0.9.5
  [10f19ff3] LayoutPointers v0.1.15
  [5078a376] LazyArrays v1.10.0
  [d3d80556] LineSearches v7.2.0
  [7ed4a6bd] LinearSolve v2.29.1
  [2ab3a3ac] LogExpFunctions v0.3.27
  [bdcacae8] LoopVectorization v0.12.170
  [1914dd2f] MacroTools v0.5.13
  [d125e4d3] ManualMemory v0.1.8
  [a3b82374] MatrixFactorizations v2.2.0
  [bb5d69b7] MaybeInplace v0.1.2
  [46d2c3a1] MuladdMacro v0.2.4
  [d41bc354] NLSolversBase v7.8.3
  [77ba4419] NaNMath v1.0.2
  [8913a72c] NonlinearSolve v3.10.0
  [6fe1bfb0] OffsetArrays v1.14.0
  [bac558e1] OrderedCollections v1.6.3
  [1dea7af3] OrdinaryDiffEq v6.74.1
  [65ce6f38] PackageExtensionCompat v1.0.2
  [d96e819e] Parameters v0.12.3
  [f517fe37] Polyester v0.7.13
  [1d0040c9] PolyesterWeave v0.2.1
  [d236fae5] PreallocationTools v0.4.21
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [3cdcf5f2] RecipesBase v1.3.4
  [731186ca] RecursiveArrayTools v3.14.0
  [f2c3362d] RecursiveFactorization v0.2.23
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [7e49a35a] RuntimeGeneratedFunctions v0.5.13
  [94e857df] SIMDTypes v0.1.0
  [476501e8] SLEEFPirates v0.6.42
  [0bca4576] SciMLBase v2.35.0
  [c0aeaf25] SciMLOperators v0.3.8
  [53ae85a6] SciMLStructures v1.1.0
  [efcf1570] Setfield v1.1.1
  [727e6d20] SimpleNonlinearSolve v1.7.0
  [699a6c99] SimpleTraits v0.9.4
  [ce78b400] SimpleUnPack v1.1.0
⌃ [47a9eef4] SparseDiffTools v2.18.0
  [e56a9233] Sparspak v0.3.9
  [276daf66] SpecialFunctions v2.3.1
  [aedffcd0] Static v0.8.10
  [0d7ed370] StaticArrayInterface v1.5.0
  [90137ffa] StaticArrays v1.9.3
  [1e83bf80] StaticArraysCore v1.4.2
  [7792a7ef] StrideArraysCore v0.5.6
  [2efcf032] SymbolicIndexingInterface v0.3.19
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.11.1
  [8290d209] ThreadingUtilities v0.5.2
  [a759f4b9] TimerOutputs v0.5.23
  [d5829a12] TriangularSolve v0.2.0
  [410a4b4d] Tricks v0.1.8
  [781d530d] TruncatedStacktraces v1.4.0
  [3a884ed6] UnPack v1.0.2
  [3d5dd08c] VectorizationBase v0.21.67
  [19fa3120] VertexSafeGraphs v0.2.0
  [1d5cc7b8] IntelOpenMP_jll v2024.1.0+0
  [856f044c] MKL_jll v2024.1.0+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [1317d2d5] oneTBB_jll v2021.12.0+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.10.0
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays v1.10.0
  [10745b16] Statistics v1.10.0
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.1.0+0
  [deac9b47] LibCURL_jll v8.4.0+0
  [e37daf67] LibGit2_jll v1.6.4+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.2+1
  [14a3606d] MozillaCACerts_jll v2023.1.10
  [4536629a] OpenBLAS_jll v0.3.23+4
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.2.1+1
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.8.0+1
  [8e850ede] nghttp2_jll v1.52.0+1
  [3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`
```
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, ivybridge)
Threads: 4 default, 0 interactive, 2 GC (on 4 virtual cores)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_PROJECT = @.
  LD_LIBRARY_PATH = /etc/sane-libs

Julia binary taken from Nixpkgs repository.

Additional context

Unfortunately I don't really have a clue as to why this happens. But I know that mathematically, it makes no sense that x(t) > 16 for a small time for these parameters. For ε ≤ 4.5, the solution is stable and should not cause parab_unstable_check to fire.
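
As a brief orientation only, and under the simplifying assumption ε = 0 (the unforced case, which is not the case studied in this report), the fixed points of the right-hand side in `parab_rule` and their stability follow directly from the equation. This sketch says nothing about the ε ≤ 4.5 bound quoted above.

$$
\begin{aligned}
\dot{x} &= f(x) = x^2 - 2\sqrt{\alpha}\,x = x\,(x - 2\sqrt{\alpha}), \\
f(x^\ast) &= 0 \;\Longrightarrow\; x^\ast_1 = 0, \quad x^\ast_2 = 2\sqrt{\alpha}, \\
f'(x) &= 2x - 2\sqrt{\alpha}, \qquad f'(0) = -2\sqrt{\alpha} < 0, \qquad f'(2\sqrt{\alpha}) = 2\sqrt{\alpha} > 0 .
\end{aligned}
$$

So for α = 1 the unforced system has a stable fixed point at x = 0 and an unstable one at x = 2; a trajectory that starts above x = 2 grows without bound, which is the "tipped" regime the check is meant to detect.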

Analytical background:

oscardssmith commented 5 months ago

This doesn't reproduce for me. I think the fact that you are on an ivy bridge cpu might have something to do with it though (ivy bridge is the last of the chips without fused multiply add instructions).

Edit: I'm also not able to reproduce this when running with julia -C sandybridge so I don't think the CPU is the issue. Now that I see "Julia binary taken from Nixpkgs repository." my guess is that Nix is shipping a broken version of Julia. Does this reproduce if you install Julia from the official downloads or Juliaup?
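
For readers wondering why the presence of fused multiply-add (FMA) instructions could matter at all, here is a minimal, self-contained sketch using only Base Julia. It shows that `fma(a, b, c)` rounds once, while `a * b + c` rounds the product first, so the two can give different results. The specific numbers are chosen for illustration and have nothing to do with the ODE above; whether FMA plays any role in the reported behavior is not established in this thread.

```julia
# x = 1 + 2^-27 is exactly representable in Float64.
x = 1.0 + 2.0^-27
# The exact square is 1 + 2^-26 + 2^-54; the 2^-54 term is lost
# when the product x * x is rounded to Float64.
c = -(1.0 + 2.0^-26)

separate = x * x + c      # product rounded first, then the sum
fused    = fma(x, x, c)   # exact x*x + c, rounded once

println("separate = ", separate)
println("fused    = ", fused)
```

Julia's `muladd(a, b, c)` is allowed to compile to either form depending on the hardware, and MuladdMacro appears in the manifest above, so results can in principle differ in the last bits between CPUs with and without FMA.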

ChrisRackauckas commented 5 months ago

Nix is known to have a broken build because it uses the wrong LLVM, but they do a stupid patch to make it say it's official, which we have tried to get removed for years. Please use a real binary from julialang.org/download and it should be good.

laikq commented 5 months ago

@oscardssmith Thanks for trying to reproduce. I have downloaded the Julia 1.10.3 binary from the official downloads (but without Juliaup), and the error unfortunately still persists. Not sure if this is relevant, but since the binary didn't run on its own (I suppose missing dynamic libraries), I used nix-alien to make it work.

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, ivybridge)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  LD_LIBRARY_PATH = /run/opengl-driver/lib:/run/opengl-driver-32/lib:/etc/sane-libs

@ChrisRackauckas I have looked into the build description of julia-bin on nixpkgs master and don't see any applied patches (to julia 1.10), except for autoPatchelfHook (which apparently only links libdl, libpthread, and libc). Do you have a reference to an up-to-date issue, discussion, or source code?

ChrisRackauckas commented 4 months ago

@oscardssmith do you even know what to do with this? Maybe it's a software FMA bug on some Linux platforms? It's really hard to tell if it's actually solvable or a chip/OS thing.

> Not sure if this is relevant, but since the binary didn't run on its own (I suppose missing dynamic libraries), I used nix-alien to make it work.

Red flag right there.

> @ChrisRackauckas I have looked into the build description of julia-bin on nixpkgs master and don't see any applied patches (to julia 1.10), except for autoPatchelfHook (which apparently only links libdl, libpthread, and libc). Do you have a reference to an up-to-date issue, discussion, or source code?

That's part of the issue. Julia builds with a patched LLVM and requires it; Nix does not include the required patches.

oscardssmith commented 4 months ago

@sjkelly What is the actual correct way to get Julia on Nix? I'm pretty sure this is just an issue with an incorrectly installed julia.

sjkelly commented 4 months ago

I cannot reproduce this on my local nixos, albeit with a far newer CPU:

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700H
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 20 virtual cores)
Environment:
  LD_LIBRARY_PATH = /run/current-system/sw/share/nix-ld/lib
  JULIA_SSL_CA_ROOTS_PATH = /etc/ssl/certs/ca-bundle.crt

julia-bin on nix should be fine. It is not a distro build, but rather a download of our tarballs: https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/compilers/julia/generic-bin.nix#L46-L63

I recommend using juliaup and adding nix-ld to your config: https://github.com/Mic92/nix-ld

{
  programs.nix-ld.enable = true;
}

This should give you the most seamless experience, especially with support for artifacts.

I do suspect, though, that this is due to behavior on older CPUs more than to a Nix issue.