JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
Testing ProbNumDiffEq errors on 1.8 with Unreachable reached, signal (4): Illegal instruction #45704

Open KristofferC opened 2 years ago

KristofferC commented 2 years ago

(Running with assertions on).

https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/8b2e406_vs_742b9ab/ProbNumDiffEq.primary.log

Unreachable reached at 0x7fc58bdd4f00

signal (4): Illegal instruction
in expression starting at /home/kc/.julia/packages/ProbNumDiffEq/C4V7j/test/destats.jl:7
Dual at /home/kc/.julia/packages/ForwardDiff/wAaVJ/src/dual.jl:19
#ForwardColorJacCache#12 at /home/kc/.julia/packages/SparseDiffTools/HI65u/src/differentiation/compute_jacobian_ad.jl:41
Type##kw at /home/kc/.julia/packages/SparseDiffTools/HI65u/src/differentiation/compute_jacobian_ad.jl:19 [inlined]
build_jac_config at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/derivative_wrappers.jl:141 [inlined]
build_jac_config at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/derivative_wrappers.jl:123 [inlined]
alg_cache at /home/kc/.julia/packages/ProbNumDiffEq/C4V7j/src/caches.jl:195
#__init#563 at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/solve.jl:295
__init##kw at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/solve.jl:9 [inlined]
__init##kw at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/solve.jl:9 [inlined]
__init##kw at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/solve.jl:9 [inlined]
__init##kw at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/solve.jl:9 [inlined]
__init##kw at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/solve.jl:9 [inlined]
#__solve#562 at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/solve.jl:4
__solve##kw at /home/kc/.julia/packages/OrdinaryDiffEq/nrmdG/src/solve.jl:1 [inlined]
#solve_call#39 at /home/kc/.julia/packages/DiffEqBase/S7V8q/src/solve.jl:221 [inlined]
solve_call##kw at /home/kc/.julia/packages/DiffEqBase/S7V8q/src/solve.jl:207 [inlined]
#solve_up#41 at /home/kc/.julia/packages/DiffEqBase/S7V8q/src/solve.jl:248 [inlined]
solve_up##kw at /home/kc/.julia/packages/DiffEqBase/S7V8q/src/solve.jl:237 [inlined]
#solve#40 at /home/kc/.julia/packages/DiffEqBase/S7V8q/src/solve.jl:234 [inlined]
solve##kw at /home/kc/.julia/packages/DiffEqBase/S7V8q/src/solve.jl:226
...

JeffBezanson commented 2 years ago

Are we sure this only happens with assertions on? After all, it's not an assertion failure :smile:

KristofferC commented 2 years ago

Indeed, it happens even without assertions.

JeffBezanson commented 2 years ago

These are usually type inference or intersection bugs.

JeffBezanson commented 2 years ago

Running in a debug build I get instead:

Intrinsic name not mangled correctly for type arguments! Should be: llvm.powi.f64.i32
double (double, i32)* @llvm.powi.f64
in function julia__transdiff_ibm_element_104715

vtjnash commented 2 years ago

That is our mistake then, and fixed on master by #44580. We were using llvmcall, which does not have a stable API across LLVM versions. We should notice this less often because of https://github.com/JuliaLang/julia/pull/44697 now.

JeffBezanson commented 2 years ago

:+1: With that fixed, I get the illegal instruction, yay (?)

vtjnash commented 2 years ago

(rr) p jl_gdblookup($rip)
Dual at /home/vtjnash/.julia/packages/ForwardDiff/wAaVJ/src/dual.jl:19
(rr) p jl_(jl_gdblookuplinfo($rip))
(::Type{ForwardDiff.Dual{ForwardDiff.Tag{OrdinaryDiffEq.OrdinaryDiffEqTag, Float64}, Float64, 3}})(Float64, ForwardDiff.Partials{3, Float64}) from (::Type{ForwardDiff.Dual{T, V, N}})(V, ForwardDiff.Partials{N, V}) where {T, V, N}
(rr) p $rip
$4 = (void (*)()) 0x7f769beb14d0
(rr) watch *0x7f769beb14d0
Hardware watchpoint 1: *0x7f769beb14d0
(rr) b JuliaOJIT::OptSelLayerT::emit
(rr) rc
(rr) until 502
(rr) p jl_dump_llvm_module(&M)
define void @julia_Dual_100210({ double, [1 x [3 x double]] }* noalias nocapture noundef nonnull sret({ double, [1 x [3 x double]] }) align 8 dereferenceable(32) %0, double %1, [1 x [3 x double]] addrspace(11)* nocapture noundef nonnull readonly align 8 dereferenceable(24) %2) #0 !dbg !4 {
top:
  %3 = alloca { double, [1 x [3 x double]] }, align 8
  %4 = call {}*** @julia.get_pgcstack()
  %5 = bitcast {}*** %4 to {}**
  %current_task = getelementptr inbounds {}*, {}** %5, i64 -13
  %6 = bitcast {}** %current_task to i64*
  %world_age = getelementptr inbounds i64, i64* %6, i64 14
  %7 = getelementptr inbounds { double, [1 x [3 x double]] }, { double, [1 x [3 x double]] }* %3, i32 0, i32 0, !dbg !7
  store double %1, double* %7, align 8, !dbg !7, !tbaa !8
  %8 = getelementptr inbounds { double, [1 x [3 x double]] }, { double, [1 x [3 x double]] }* %3, i32 0, i32 1, !dbg !7
  %9 = getelementptr inbounds [1 x [3 x double]], [1 x [3 x double]] addrspace(11)* %2, i32 0, i32 0, !dbg !7
  %10 = getelementptr inbounds [1 x [3 x double]], [1 x [3 x double]]* %8, i32 0, i32 0, !dbg !7
  %11 = bitcast [3 x double]* %10 to i8*, !dbg !7
  %12 = bitcast [3 x double] addrspace(11)* %9 to i8 addrspace(11)*, !dbg !7
  call void @llvm.memcpy.p0i8.p11i8.i64(i8* align 8 %11, i8 addrspace(11)* %12, i64 24, i1 false), !dbg !7, !tbaa !12
  call void @llvm.trap(), !dbg !7
  unreachable, !dbg !7

after_noret:                                      ; No predecessors!
  call void @llvm.trap(), !dbg !7
  unreachable, !dbg !7
}

julia> using ProbNumDiffEq
┌ Warning: Replacing module `OrdinaryDiffEq`
└ @ Base loading.jl:1196

julia> using OrdinaryDiffEq

julia> using ForwardDiff

julia> code_llvm(ForwardDiff.Dual{ForwardDiff.Tag{OrdinaryDiffEq.OrdinaryDiffEqTag, Float64}, Float64, 3}, (Float64, ForwardDiff.Partials{3, Float64}), optimize=false)
;  @ /home/vtjnash/.julia/packages/ForwardDiff/wAaVJ/src/dual.jl:17 within `Dual`
define void @julia_Dual_1654({ double, [1 x [3 x double]] }* noalias nocapture noundef nonnull sret({ double, [1 x [3 x double]] }) align 8 dereferenceable(32) %0, double %1, [1 x [3 x double]]* nocapture noundef nonnull readonly align 8 dereferenceable(24) %2) #0 {
top:
  %3 = alloca { double, [1 x [3 x double]] }, align 8
  %4 = call {}*** @julia.get_pgcstack()
  %5 = bitcast {}*** %4 to {}**
  %current_task = getelementptr inbounds {}*, {}** %5, i64 -13
  %6 = bitcast {}** %current_task to i64*
  %world_age = getelementptr inbounds i64, i64* %6, i64 14
;  @ /home/vtjnash/.julia/packages/ForwardDiff/wAaVJ/src/dual.jl:19 within `Dual`
  %7 = getelementptr inbounds { double, [1 x [3 x double]] }, { double, [1 x [3 x double]] }* %3, i32 0, i32 0
  store double %1, double* %7, align 8
  %8 = getelementptr inbounds { double, [1 x [3 x double]] }, { double, [1 x [3 x double]] }* %3, i32 0, i32 1
  %9 = getelementptr inbounds [1 x [3 x double]], [1 x [3 x double]]* %2, i32 0, i32 0
  %10 = getelementptr inbounds [1 x [3 x double]], [1 x [3 x double]]* %8, i32 0, i32 0
  %11 = bitcast [3 x double]* %10 to i8*
  %12 = bitcast [3 x double]* %9 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %11, i8* %12, i64 24, i1 false)
  call void @llvm.trap()
  unreachable

after_noret:                                      ; No predecessors!
  call void @llvm.trap()
  unreachable
}

The @warn statement there is probably causing corruption of our datatype references. Since #43990 maybe it should have been changed to an error?

KristofferC commented 2 years ago

module ProbNumDiffEq

using OrdinaryDiffEq
using TaylorIntegration

end

is enough to repro.

...
require(TaylorIntegration [92b13dbe-c966-51a2-8445-caca9f8a7d42], RecursiveArrayTools) -> RecursiveArrayTools [731186ca-8d62-57ce-b412-fbd966d074cd]
require(TaylorIntegration [92b13dbe-c966-51a2-8445-caca9f8a7d42], DiffEqBase) -> DiffEqBase [2b5f629d-d688-5b77-993f-72d75c75574e]
require(TaylorIntegration [92b13dbe-c966-51a2-8445-caca9f8a7d42], OrdinaryDiffEq) -> OrdinaryDiffEq [1dea7af3-3e70-54e6-95c3-0bf5283fa5ed]
ERROR: Replacing module `OrdinaryDiffEq`
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] macro expansion
    @ ./loading.jl:1120 [inlined]
  [3] macro expansion
    @ ./lock.jl:223 [inlined]
  [4] register_root_module(m::Module)

This load comes from a @require block:

https://github.com/PerezHz/TaylorIntegration.jl/blob/0223c5c02d82dda293f765d15e9aeeca1aa88139/src/TaylorIntegration.jl#L20-L22

https://github.com/PerezHz/TaylorIntegration.jl/blob/dc6cc36f5f7fff7822dca16505bbdc8eca097679/src/common.jl#L1

So the loading seems to go something like:

load OrdinaryDiffEq
    load DiffEqBase
load TaylorIntegration
    include "common.jl" from `@require DiffEqBase`
        load OrdinaryDiffEq
            error due to Replacing module `OrdinaryDiffEq`

@vtjnash, does the above ring any bells as to what the problem could be?

vtjnash commented 2 years ago

Yeah, I see. TaylorIntegration demanded that we create a cycle in the dependency graph, and those are potentially impossible to handle correctly. In particular, if TaylorIntegration is loaded, then immediately after DiffEqBase is loaded, TaylorIntegration demands that OrdinaryDiffEq be loaded too. But if we were loading DiffEqBase specifically to satisfy a dependency of OrdinaryDiffEq, that may be an impossible request.
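
The cycle described here can be illustrated with a toy loader. This is only a sketch under stated assumptions: `ToyLoader`, `load!` and `on_load!` are hypothetical names, and the model is not Julia's actual loading code or Requires.jl. It assumes a hook registered by TaylorIntegration fires right after DiffEqBase finishes loading and asks for OrdinaryDiffEq, which is itself still waiting on DiffEqBase.

```julia
# Toy model of a package loader with "on load" hooks (in the spirit of `@require`).
# All names here are hypothetical; this is not Julia's real loader.
struct ToyLoader
    deps::Dict{String,Vector{String}}     # package => direct dependencies
    hooks::Dict{String,Vector{Function}}  # package => callbacks run right after it loads
    loading::Vector{String}               # stack of packages currently being loaded
    loaded::Set{String}                   # packages that have finished loading
end

ToyLoader(deps) = ToyLoader(deps, Dict{String,Vector{Function}}(), String[], Set{String}())

# Register a callback to run as soon as `pkg` has been loaded.
on_load!(f, l::ToyLoader, pkg) = push!(get!(l.hooks, pkg, Function[]), f)

function load!(l::ToyLoader, pkg::String)
    pkg in l.loaded && return nothing
    # A request for a package that is still mid-load cannot be satisfied.
    pkg in l.loading && error("cycle: " * join([l.loading; pkg], " -> "))
    push!(l.loading, pkg)
    try
        foreach(d -> load!(l, d), get(l.deps, pkg, String[]))
        push!(l.loaded, pkg)
        foreach(h -> h(), get(l.hooks, pkg, Function[]))
    finally
        pop!(l.loading)
    end
    return nothing
end

deps = Dict("OrdinaryDiffEq" => ["DiffEqBase"],
            "DiffEqBase" => String[],
            "TaylorIntegration" => String[])
l = ToyLoader(deps)

# TaylorIntegration is loaded first and registers its hook on DiffEqBase.
load!(l, "TaylorIntegration")
on_load!(l, "DiffEqBase") do
    load!(l, "OrdinaryDiffEq")
end

# Loading OrdinaryDiffEq pulls in DiffEqBase, whose hook then asks for OrdinaryDiffEq again.
err = try
    load!(l, "OrdinaryDiffEq")
    nothing
catch e
    e
end
println(err.msg)
```

In this toy the re-entrant request is reported as an error; the real loader, as the rest of the thread shows, may instead replace the module or deadlock.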

vtjnash commented 2 years ago

A closely related version of this is:

julia> using TaylorIntegration
julia> using OrdinaryDiffEq
<certain deadlock>

This might just generally be a danger of putting code in __init__ blocks?

vtjnash commented 2 years ago

It looks like the underlying issue causing the segfault is that the ircode compressor assumes modules are relocatable in the roots array, so the ircode ends up pointing to the wrong module after the later packages are loaded (due to https://github.com/JuliaLang/julia/pull/43990).
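
As a rough illustration of how an index-based reference can go stale, here is a toy model. It is a sketch only: `CompressedRef`, `compress` and `decompress` are hypothetical names, and the assumption that a reference is stored as a plain position in a roots array is a simplification, not a description of Julia's actual IR compressor.

```julia
# Toy model: "compressed" code refers to a module by its position in a roots array.
# Hypothetical names throughout; not Julia's real data structures.
struct CompressedRef
    root_index::Int   # position of the module in the roots array at compression time
end

compress(roots::Vector{Symbol}, m::Symbol) = CompressedRef(findfirst(==(m), roots))
decompress(roots::Vector{Symbol}, r::CompressedRef) = roots[r.root_index]

roots = [:Base, :OrdinaryDiffEq, :ForwardDiff]
ref = compress(roots, :ForwardDiff)
@assert decompress(roots, ref) == :ForwardDiff

# A later load reorders the roots: here a module is dropped and re-registered at the end.
deleteat!(roots, 2)
push!(roots, :OrdinaryDiffEq)

# The stored index is unchanged, so it now resolves to a different module.
stale = decompress(roots, ref)
@assert stale == :OrdinaryDiffEq
```

The point of the sketch is only that a position recorded once is valid as long as the array it indexes never changes underneath it.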