Open KristofferC opened 2 years ago
Are we sure this only happens with assertions on? After all, it's not an assertion failure :smile:
Indeed, it happens even without assertions.
These are usually type inference or intersection bugs.
Running in a debug build I get instead:
Intrinsic name not mangled correctly for type arguments! Should be: llvm.powi.f64.i32
double (double, i32)* @llvm.powi.f64
in function julia__transdiff_ibm_element_104715
That is our mistake then, and fixed on master by #44580. We were using llvmcall, which does not have a stable API across LLVM versions. We should notice this less often because of https://github.com/JuliaLang/julia/pull/44697 now.
:+1: With that fixed, I get the illegal instruction, yay (?)
(rr) p jl_gdblookup($rip)
Dual at /home/vtjnash/.julia/packages/ForwardDiff/wAaVJ/src/dual.jl:19
(rr) p jl_(jl_gdblookuplinfo($rip))
(::Type{ForwardDiff.Dual{ForwardDiff.Tag{OrdinaryDiffEq.OrdinaryDiffEqTag, Float64}, Float64, 3}})(Float64, ForwardDiff.Partials{3, Float64}) from (::Type{ForwardDiff.Dual{T, V, N}})(V, ForwardDiff.Partials{N, V}) where {T, V, N}
(rr) p $rip
$4 = (void (*)()) 0x7f769beb14d0
(rr) watch *0x7f769beb14d0
Hardware watchpoint 1: *0x7f769beb14d0
(rr) b JuliaOJIT::OptSelLayerT::emit
(rr) rc
(rr) until 502
(rr) p jl_dump_llvm_module(&M)
define void @julia_Dual_100210({ double, [1 x [3 x double]] }* noalias nocapture noundef nonnull sret({ double, [1 x [3 x double]] }) align 8 dereferenceable(32) %0, double %1, [1 x [3 x double]] addrspace(11)* nocapture noundef nonnull readonly align 8 dereferenceable(24) %2) #0 !dbg !4 {
top:
%3 = alloca { double, [1 x [3 x double]] }, align 8
%4 = call {}*** @julia.get_pgcstack()
%5 = bitcast {}*** %4 to {}**
%current_task = getelementptr inbounds {}*, {}** %5, i64 -13
%6 = bitcast {}** %current_task to i64*
%world_age = getelementptr inbounds i64, i64* %6, i64 14
%7 = getelementptr inbounds { double, [1 x [3 x double]] }, { double, [1 x [3 x double]] }* %3, i32 0, i32 0, !dbg !7
store double %1, double* %7, align 8, !dbg !7, !tbaa !8
%8 = getelementptr inbounds { double, [1 x [3 x double]] }, { double, [1 x [3 x double]] }* %3, i32 0, i32 1, !dbg !7
%9 = getelementptr inbounds [1 x [3 x double]], [1 x [3 x double]] addrspace(11)* %2, i32 0, i32 0, !dbg !7
%10 = getelementptr inbounds [1 x [3 x double]], [1 x [3 x double]]* %8, i32 0, i32 0, !dbg !7
%11 = bitcast [3 x double]* %10 to i8*, !dbg !7
%12 = bitcast [3 x double] addrspace(11)* %9 to i8 addrspace(11)*, !dbg !7
call void @llvm.memcpy.p0i8.p11i8.i64(i8* align 8 %11, i8 addrspace(11)* %12, i64 24, i1 false), !dbg !7, !tbaa !12
call void @llvm.trap(), !dbg !7
unreachable, !dbg !7
after_noret: ; No predecessors!
call void @llvm.trap(), !dbg !7
unreachable, !dbg !7
}
julia> using ProbNumDiffEq
┌ Warning: Replacing module `OrdinaryDiffEq`
└ @ Base loading.jl:1196
julia> using OrdinaryDiffEq
julia> using ForwardDiff
julia> code_llvm(ForwardDiff.Dual{ForwardDiff.Tag{OrdinaryDiffEq.OrdinaryDiffEqTag, Float64}, Float64, 3}, (Float64, ForwardDiff.Partials{3, Float64}), optimize=false)
; @ /home/vtjnash/.julia/packages/ForwardDiff/wAaVJ/src/dual.jl:17 within `Dual`
define void @julia_Dual_1654({ double, [1 x [3 x double]] }* noalias nocapture noundef nonnull sret({ double, [1 x [3 x double]] }) align 8 dereferenceable(32) %0, double %1, [1 x [3 x double]]* nocapture noundef nonnull readonly align 8 dereferenceable(24) %2) #0 {
top:
%3 = alloca { double, [1 x [3 x double]] }, align 8
%4 = call {}*** @julia.get_pgcstack()
%5 = bitcast {}*** %4 to {}**
%current_task = getelementptr inbounds {}*, {}** %5, i64 -13
%6 = bitcast {}** %current_task to i64*
%world_age = getelementptr inbounds i64, i64* %6, i64 14
; @ /home/vtjnash/.julia/packages/ForwardDiff/wAaVJ/src/dual.jl:19 within `Dual`
%7 = getelementptr inbounds { double, [1 x [3 x double]] }, { double, [1 x [3 x double]] }* %3, i32 0, i32 0
store double %1, double* %7, align 8
%8 = getelementptr inbounds { double, [1 x [3 x double]] }, { double, [1 x [3 x double]] }* %3, i32 0, i32 1
%9 = getelementptr inbounds [1 x [3 x double]], [1 x [3 x double]]* %2, i32 0, i32 0
%10 = getelementptr inbounds [1 x [3 x double]], [1 x [3 x double]]* %8, i32 0, i32 0
%11 = bitcast [3 x double]* %10 to i8*
%12 = bitcast [3 x double]* %9 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %11, i8* %12, i64 24, i1 false)
call void @llvm.trap()
unreachable
after_noret: ; No predecessors!
call void @llvm.trap()
unreachable
}
The @warn statement there is probably causing corruption of our datatype references. Since #43990, maybe it should have been changed to an error?
module ProbNumDiffEq
using OrdinaryDiffEq
using TaylorIntegration
end
is enough to repro.
...
require(TaylorIntegration [92b13dbe-c966-51a2-8445-caca9f8a7d42], RecursiveArrayTools) -> RecursiveArrayTools [731186ca-8d62-57ce-b412-fbd966d074cd]
require(TaylorIntegration [92b13dbe-c966-51a2-8445-caca9f8a7d42], DiffEqBase) -> DiffEqBase [2b5f629d-d688-5b77-993f-72d75c75574e]
require(TaylorIntegration [92b13dbe-c966-51a2-8445-caca9f8a7d42], OrdinaryDiffEq) -> OrdinaryDiffEq [1dea7af3-3e70-54e6-95c3-0bf5283fa5ed]
ERROR: Replacing module `OrdinaryDiffEq`
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] macro expansion
@ ./loading.jl:1120 [inlined]
[3] macro expansion
@ ./lock.jl:223 [inlined]
[4] register_root_module(m::Module)
This load comes from a @require block:
So the loading seems to go something like:
load OrdinaryDiffEq
load DiffEqBase
load TaylorIntegration
include "common.jl" from `@require DiffEqBase`
load OrdinaryDiffEq
error due to Replacing module `OrdinaryDiffEq`
@vtjnash, does the above ring any bells as to what the problem could be?
Yeah, I see. TaylorIntegration demanded that we create a cycle in the dependency graph, and those are potentially impossible to handle correctly. In particular, if TaylorIntegration is loaded, then immediately after DiffEqBase is loaded, TaylorIntegration demands that OrdinaryDiffEq already be loaded too. But if we chose to load DiffEqBase specifically to satisfy the dependency of OrdinaryDiffEq, that may be an impossible request.
A related version of this is:
julia> using TaylorIntegration
julia> using OrdinaryDiffEq
<certain deadlock>
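The load order above can be modeled with a toy loader. This is only a sketch of the cycle, not how `Base.require` or Requires.jl actually work: `ToyLoader`, `load!`, and the `hooks` table (standing in for a `@require` block) are hypothetical names introduced for illustration, and the toy raises an error where the real loader may deadlock.

```julia
# Toy model of the cycle described above. None of these names are Julia's
# real loader API; `hooks` stands in for a `@require`-style callback.
struct ToyLoader
    deps::Dict{String,Vector{String}}    # package => direct dependencies
    hooks::Dict{String,Vector{String}}   # trigger => packages demanded right after it loads
    loading::Vector{String}              # stack of packages currently being loaded
    loaded::Set{String}
end

ToyLoader(deps, hooks) = ToyLoader(deps, hooks, String[], Set{String}())

function load!(l::ToyLoader, pkg::String)
    pkg in l.loaded && return :already_loaded
    if pkg in l.loading
        # The package was requested again before it finished loading.
        chain = join([l.loading; pkg], " -> ")
        error("dependency cycle: $chain")
    end
    push!(l.loading, pkg)
    for dep in get(l.deps, pkg, String[])
        load!(l, dep)
    end
    push!(l.loaded, pkg)
    # Hooks fire immediately after `pkg` is loaded, while whoever
    # requested `pkg` may still be in the middle of its own load.
    for wanted in get(l.hooks, pkg, String[])
        load!(l, wanted)
    end
    pop!(l.loading)
    return :loaded
end

# TaylorIntegration is already loaded and has registered a hook on DiffEqBase.
loader = ToyLoader(
    Dict("OrdinaryDiffEq" => ["DiffEqBase"]),
    Dict("DiffEqBase" => ["OrdinaryDiffEq"]),
)
push!(loader.loaded, "TaylorIntegration")

result = try
    load!(loader, "OrdinaryDiffEq")
catch err
    err
end
println(result)
```

In this model the hook asks for OrdinaryDiffEq while OrdinaryDiffEq is still waiting on DiffEqBase, which is the impossible request described above. The toy does not clean up its `loading` stack after the error, so a loader should be discarded once it has failed.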
This might just generally be a danger of putting code in __init__ blocks?
It looks like the underlying issue causing the segfault is the ircode compressor assuming that modules are relocatable in the roots array, which leaves the ircode pointing to the wrong module after the later packages are loaded (due to https://github.com/JuliaLang/julia/pull/43990).
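As a rough illustration of why a reference to the wrong module is harmful, the sketch below creates two distinct modules that share a name and define a type with the same name. It does not reproduce the ircode compressor or its roots array: `registry`, `lookup`, `make_toy`, and the module name `Toy` are hypothetical, and the only point is that a type resolved after the replacement is not identical to one captured before it.

```julia
# Toy illustration only: two distinct modules with the same name, each
# defining its own `Dual`. `registry`, `lookup`, and `make_toy` are
# hypothetical helpers, not part of Julia's loader or compressor.
lookup(m::Module, name::Symbol) = Base.invokelatest(getfield, m, name)

function make_toy()
    m = Module(:Toy)
    Core.eval(m, :(struct Dual; value::Float64; end))
    return m
end

registry = Dict{Symbol,Module}()

registry[:Toy] = make_toy()
OldDual = lookup(registry[:Toy], :Dual)   # reference captured before the replacement

registry[:Toy] = make_toy()               # analogous to "Replacing module `Toy`"
NewDual = lookup(registry[:Toy], :Dual)   # same name, resolved after the replacement

same_name = nameof(OldDual) == nameof(NewDual)
same_type = OldDual === NewDual
x = Base.invokelatest(OldDual, 1.0)       # a value built with the old type

println((same_name, same_type, x isa NewDual))
```

The two types print identically but are unrelated as far as dispatch and codegen are concerned, so code that was compiled against one and later resolves to the other cannot expect a matching constructor or layout.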
(Running with assertions on).
https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/8b2e406_vs_742b9ab/ProbNumDiffEq.primary.log