`unstable_check` gets passed value that's far from solution #2179

laikq commented 5 months ago

Describe the bug 🐞

Under very specific conditions, unstable_check gets passed values that are far from the actual solution, which aborts the computation without justification.

Expected behavior

unstable_check should only get passed values that later also appear in the solution of the differential equation.

Minimal Reproducible Example 👇

using OrdinaryDiffEq

struct ParabParams
    α
    ε
end

# Right-hand side of the scalar ODE: x' = x^2 - 2x√α + ε cos(πt)
function parab_rule(u, p, t)
    x = u
    α = p.α
    ε = p.ε
    x^2 - 2x*√α + ε*cos(π*t)
end

function parab_unstable_check(dt, u, p, t)
    α = p.α
    ε = p.ε
    # this condition is a crude approximation for “this system has tipped irrecoverably”
    cond = u > 3 + 3α + ε
    if cond
        @debug "Encountered condition @ " u dt t
    end
    # ↓ replace by `cond` and the computation will abort prematurely
    false
end

function doit(params)
    prob = ODEProblem(parab_rule, 0., (0., 500), params)
    sol = solve(prob, Tsit5();
        saveat=0.02, reltol=1e-4, abstol=1e-7, unstable_check=parab_unstable_check)
    @info "Maximum of solution is $(maximum(sol.u))"
end

doit(ParabParams(1, 3.374821173104435))  # this shows the erroneous behavior
# Output:
# Encountered condition @
# u 16.526948027196813
# dt 0.30287695167771833
# t 147.17216737552005
# Maximum of solution is 1.2421651640163502
doit(ParabParams(1, 3.37482118))  # this behaves as expected
# Output:
# Maximum of solution is 1.2421642587809048
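
The `Encountered condition @` lines in the output above come from `@debug`, which Julia suppresses by default. Below is a minimal sketch of one way to surface them with the standard `Logging` library. `noisy_check` is a hypothetical stand-in for `parab_unstable_check` (not part of the reproducer), used here so the snippet runs without OrdinaryDiffEq:

```julia
using Logging

# Hypothetical stand-in for `parab_unstable_check`: logs at debug level
# and always returns false, so it never requests an abort.
function noisy_check(dt, u, p, t)
    if u > 3
        @debug "Encountered condition @ " u dt t
    end
    false
end

# Debug messages are dropped by the default logger; a ConsoleLogger with
# minimum level Logging.Debug lets them through for the wrapped call.
io = IOBuffer()
result = with_logger(ConsoleLogger(io, Logging.Debug)) do
    noisy_check(0.3, 16.5, nothing, 147.2)
end
captured = String(take!(io))
print(captured)
```

In the reproducer, the same `with_logger` wrapper would presumably go around the `doit(...)` call. Starting Julia with the environment variable `JULIA_DEBUG=all` is another option.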

Environment (please complete the following information):

Status `/tmp/jl_OyS6gD/Project.toml`
  [1dea7af3] OrdinaryDiffEq v6.74.1
  [44cfe95a] Pkg v1.10.0
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, ivybridge)
Threads: 4 default, 0 interactive, 2 GC (on 4 virtual cores)
  LD_LIBRARY_PATH = /etc/sane-libs

Julia binary taken from Nixpkgs repository.

Additional context

Unfortunately, I don't really have a clue as to why this happens. But I know that mathematically, it makes no sense for x(t) to exceed 16, even briefly, with these parameters. For ε ≤ 4.5, the solution is stable and should not cause parab_unstable_check to fire.

Analytical background:

oscardssmith commented 5 months ago

This doesn't reproduce for me. I think the fact that you are on an Ivy Bridge CPU might have something to do with it, though (Ivy Bridge is the last of the chips without fused multiply-add instructions).
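
As a hedged illustration of why FMA availability can matter for floating-point results at all (this does not show that FMA is the cause of the behavior reported here), the sketch below compares a separately rounded multiply and subtract with Julia's `fma`, which rounds only once. The values `x` and `y` are chosen for illustration only:

```julia
# x = 1 + 2^-30 is exactly representable as a Float64.
x = 1.0 + 2.0^-30
# The exact square is 1 + 2^-29 + 2^-60; the 2^-60 term is lost when the
# product is rounded to Float64, leaving exactly y.
y = 1.0 + 2.0^-29

unfused = x * x - y      # product is rounded first, then subtracted: 0.0
fused = fma(x, x, -y)    # exact x*x - y with a single rounding: 2.0^-60

println("unfused = ", unfused)
println("fused   = ", fused)
```

Differences of this size are tiny, but in an adaptive solver they could in principle feed into error estimates and change which steps are attempted.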

Edit: I'm also not able to reproduce this when running with julia -C sandybridge so I don't think the CPU is the issue. Now that I see "Julia binary taken from Nixpkgs repository." my guess is that Nix is shipping a broken version of Julia. Does this reproduce if you install Julia from the official downloads or Juliaup?

ChrisRackauckas commented 5 months ago

Nix is known to have a broken build because it uses the wrong LLVM, but they apply a stupid patch to make it say it's official, even though we have tried for years to get that removed. Please use a real binary and it should be good.

laikq commented 5 months ago

@oscardssmith Thanks for trying to reproduce. I have downloaded the Julia 1.10.3 binary from the official downloads (but without Juliaup), and the error unfortunately still persists. Not sure if this is relevant, but since the binary didn't run on its own (I suppose missing dynamic libraries), I used nix-alien to make it work.

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, ivybridge)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
  LD_LIBRARY_PATH = /run/opengl-driver/lib:/run/opengl-driver-32/lib:/etc/sane-libs

@ChrisRackauckas I have looked into the build description of julia-bin on nixpkgs master and don't see any applied patches (to julia 1.10), except for autoPatchelfHook (which apparently only links libdl, libpthread, and libc). Do you have a reference to an up-to-date issue, discussion, or source code?

ChrisRackauckas commented 4 months ago

@oscardssmith do you even know what to do with this? Maybe it's a software FMA bug in some linux platforms? It's really hard to tell if it's actually solvable or a chip/OS thing.

> Not sure if this is relevant, but since the binary didn't run on its own (I suppose missing dynamic libraries), I used nix-alien to make it work.

Red flag right there.

> @ChrisRackauckas I have looked into the build description of julia-bin on nixpkgs master and don't see any applied patches (to julia 1.10), except for autoPatchelfHook (which apparently only links libdl, libpthread, and libc). Do you have a reference to an up-to-date issue, discussion, or source code?

That's part of the issue. Julia builds with a patched LLVM and requires it. Nix does not include the required patches.

oscardssmith commented 4 months ago

@sjkelly What is the actual correct way to get Julia on Nix? I'm pretty sure this is just an issue with an incorrectly installed Julia.

sjkelly commented 4 months ago

I cannot reproduce this on my local NixOS, albeit with a far newer CPU:

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700H
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 20 virtual cores)
  LD_LIBRARY_PATH = /run/current-system/sw/share/nix-ld/lib
  JULIA_SSL_CA_ROOTS_PATH = /etc/ssl/certs/ca-bundle.crt

julia-bin on nix should be fine. It is not a distro build, but rather a download of our tarballs:

I recommend using juliaup and adding nix-ld to your config:

  programs.nix-ld.enable = true;

This should give you the most seamless experience, especially with support for artifacts.

I do suspect, though, that this is due to behavior on older CPUs more than to a Nix issue.