EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Symbol lookup failure #74

Closed oschulz closed 3 years ago

oschulz commented 3 years ago

(@wsmoses not sure whether to report this here or at Enzyme itself)

using Enzyme

a = 2
c = 3

f(a, c) = a * √c
∂f_∂a = autodiff(f, Active(a), c)

results in

name = "__memmove_ssse3_back"
ERROR: Enzyme: Symbol lookup failed. Aborting!
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] resolver(name::Cstring, ctx::Ptr{Nothing})
    @ Enzyme.Compiler /user/.julia/dev/Enzyme/src/compiler.jl:826

The error doesn't occur when f uses c instead of √c.

Tested using Enzyme.jl v0.6.0 with official Julia Linux x86_64 binaries, system info:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

wsmoses commented 3 years ago

This appears to be an ORC JIT issue (in LLVM, not in Enzyme). Trying to reproduce it on my system yields a similar, though not identical, failure.

@vchuravy how is that ORC update going?

julia> ∂f_∂a = autodiff(f, Active(a), c)
name = "_os_semaphore_wait.cold.1"
ERROR: Enzyme: Symbol lookup failed. Aborting!
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] resolver(name::Cstring, ctx::Ptr{Nothing})
   @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:1008
 [3] macro expansion
   @ ~/.julia/packages/LLVM/1GCWB/src/util.jl:85 [inlined]
 [4] LLVMOrcGetSymbolAddressIn(JITStack::LLVM.OrcJIT, RetAddr::Base.RefValue{UInt64}, H::LLVM.OrcModule, SymbolName::String)
   @ LLVM.API ~/.julia/packages/LLVM/1GCWB/lib/libLLVM_h.jl:791
 [5] addressin
   @ ~/.julia/packages/LLVM/1GCWB/src/orc.jl:147 [inlined]
 [6] _link(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(f), Tuple{Float64, Int64}}}, ::Tuple{LLVM.Module, String, Nothing})
   @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:1026
 [7] callback(orc_ref::Ptr{LLVM.API.LLVMOrcOpaqueJITStack}, callback_ctx::Ptr{Nothing})
   @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:1110

vchuravy commented 3 years ago

@lhames is this why OrcV2 has "AddLLVMIrModuleWithRT"? Is there a canonical fix for OrcV1?

lhames commented 3 years ago

AddLLVMIrModuleWithRT enables fine-grained code removal (RT is short for ResourceTracker). I don't think that's relevant to this lookup failure.

The important question is: Where is __memmove_ssse3_back defined, and why is the resolver failing to find it?

I think it should be defined by glibc. If a dlsym call in the process can find __memmove_ssse3_back, then you need to use dlsym somewhere in your resolver to find it. (You must already have some method for looking up process symbols, or you'd have failed on more trivial programs, so the follow-up question is: why didn't the usual process symbol lookup approach work?)

If a dlsym call cannot find __memmove_ssse3_back, then you need to figure out some way to get a definition of it into your process.
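
A quick way to tell which of the two cases applies is to ask the dynamic linker directly. The following is only a sketch: it assumes Linux with glibc loadable as "libc.so.6" (an assumption, not something stated in this thread), and the helper name is_exported is made up for illustration. It uses Julia's Libdl standard library, whose dlsym_e returns C_NULL instead of throwing when a symbol is not found.

```julia
using Libdl

# Assumption: Linux with glibc, whose C library is loadable as "libc.so.6".
libc = Libdl.dlopen("libc.so.6")

# Hypothetical helper: true if `name` can be found through dlsym in `lib`.
# dlsym_e returns C_NULL instead of throwing when the lookup fails.
is_exported(lib, name) = Libdl.dlsym_e(lib, name) != C_NULL

# Compare the public entry point with the implementation name from the error.
for name in ("memmove", "__memmove_ssse3_back")
    status = is_exported(libc, name) ? "found via dlsym" : "not found via dlsym"
    println(rpad(name, 24), status)
end
```

If the public name is found but the implementation-specific name is not, that would point to the second case above: a resolver built on dlsym alone cannot supply the symbol.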

vchuravy commented 3 years ago

(RT is short for ResourceTracker).

Head against desk.

This is our current resolver: https://github.com/wsmoses/Enzyme.jl/blob/da338854728fb0f2836b2d0fd919d66fee5a911d/src/compiler.jl#L981-L1012

vchuravy commented 3 years ago

So

open("mod.ll", "w") do io
    Enzyme.Compiler.enzyme_code_llvm(io, f, Tuple{Active{Float64}, Const{Int}}, dump_module=true)
end

yields:

; ModuleID = 'text'
source_filename = "text"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

;  @ strings/substring.jl:208 within `string'
define internal fastcc nonnull {}* @julia_string_2770({}* nonnull %0, {}* nonnull %1) unnamed_addr {
top:
  %gcframe30 = alloca [3 x {}*], align 16
  %gcframe30.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe30, i64 0, i64 0
  %2 = bitcast [3 x {}*]* %gcframe30 to i8*
  call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(24) %2, i8 0, i32 24, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #10
  %ptls_i8 = getelementptr i8, i8* %thread_ptr, i64 -32768
;  @ strings/substring.jl:210 within `string'
; ┌ @ tuple.jl:66 within `iterate' @ tuple.jl:66
; │┌ @ tuple.jl:29 within `getindex'
    %3 = bitcast [3 x {}*]* %gcframe30 to i64*
    store i64 4, i64* %3, align 16
    %4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe30, i64 0, i64 1
    %5 = bitcast i8* %ptls_i8 to i64*
    %6 = load i64, i64* %5, align 8
    %7 = bitcast {}** %4 to i64*
    store i64 %6, i64* %7, align 8
    %8 = bitcast i8* %ptls_i8 to {}***
    store {}** %gcframe30.sub, {}*** %8, align 8
    %9 = bitcast {}* %0 to i64*
; └└
;  @ strings/substring.jl:214 within `string'
; ┌ @ Base.jl:147 within `sizeof'
   %10 = load i64, i64* %9, align 8
; └
; ┌ @ tuple.jl:66 within `iterate'
; │┌ @ tuple.jl:29 within `getindex'
    %11 = bitcast {}* %1 to i64*
; └└
; ┌ @ Base.jl:147 within `sizeof'
   %12 = load i64, i64* %11, align 8
; └
; ┌ @ int.jl:87 within `+'
   %13 = add nuw i64 %12, %10
; └
;  @ strings/substring.jl:217 within `string'
; ┌ @ strings/string.jl:74 within `_string_n'
; │┌ @ essentials.jl:396 within `cconvert'
; ││┌ @ number.jl:7 within `convert'
; │││┌ @ boot.jl:757 within `UInt64'
; ││││┌ @ boot.jl:727 within `toUInt64'
; │││││┌ @ boot.jl:616 within `check_top_bit'
; ││││││┌ @ boot.jl:606 within `is_top_bit_set'
         %14 = icmp sgt i64 %13, -1
; ││││││└
        br i1 %14, label %L39, label %L31

L31:                                              ; preds = %top
        %15 = call fastcc nonnull {}* @julia_throw_inexacterror_2774() #11
        unreachable

L39:                                              ; preds = %top
; │└└└└└
   %16 = call nonnull {}* @jl_alloc_string(i64 %13)
; └
;  @ strings/substring.jl:220 within `string'
; ┌ @ strings/substring.jl:204 within `__unsafe_string!'
; │┌ @ strings/string.jl:95 within `pointer'
; ││┌ @ pointer.jl:59 within `unsafe_convert'
     %17 = bitcast {}* %0 to {}**
     %18 = getelementptr inbounds {}*, {}** %17, i64 1
     %19 = ptrtoint {}** %18 to i64
; │└└
; │┌ @ strings/string.jl:96 within `pointer' @ strings/string.jl:95
; ││┌ @ pointer.jl:59 within `unsafe_convert'
     %20 = bitcast {}* %16 to {}**
     %21 = getelementptr inbounds {}*, {}** %20, i64 1
; ││└
; ││ @ strings/string.jl:96 within `pointer'
; ││┌ @ pointer.jl:159 within `+'
     %22 = bitcast {}** %21 to i8*
; ││└
; ││┌ @ pointer.jl:160 within `-'
     %23 = ptrtoint {}** %21 to i64
     %24 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe30, i64 0, i64 2
     store {}* %16, {}** %24, align 16
; │└└
; │┌ @ array.jl:223 within `unsafe_copyto!'
    %25 = call i64 @__memcpy_avx_unaligned_erms(i64 %23, i64 %19, i64 %10)
; │└
; │┌ @ strings/string.jl:95 within `pointer'
; ││┌ @ pointer.jl:59 within `unsafe_convert'
     %26 = bitcast {}* %1 to {}**
     %27 = getelementptr inbounds {}*, {}** %26, i64 1
     %28 = ptrtoint {}** %27 to i64
; │└└
; │┌ @ strings/string.jl:96 within `pointer'
; ││┌ @ pointer.jl:160 within `-'
     %29 = getelementptr i8, i8* %22, i64 %10
     %30 = ptrtoint i8* %29 to i64
; │└└
; │┌ @ array.jl:223 within `unsafe_copyto!'
    %31 = call i64 @__memcpy_avx_unaligned_erms(i64 %30, i64 %28, i64 %12)
    %32 = load i64, i64* %7, align 8
    store i64 %32, i64* %5, align 8
; └└
;  @ strings/substring.jl:222 within `string'
  ret {}* undef
}

; Function Attrs: cold noreturn nounwind
declare void @llvm.trap() #0

declare token @llvm.julia.gc_preserve_begin(...)

; Function Attrs: nounwind readnone
declare nonnull {}* @julia.pointer_from_objref({}*) local_unnamed_addr #1

declare void @llvm.julia.gc_preserve_end(token)

;  @ math.jl:32 within `throw_complex_domainerror'
; Function Attrs: noinline noreturn
define internal fastcc noalias nonnull align 536870912 dereferenceable(4294967295) {}* @julia_throw_complex_domainerror_2767() unnamed_addr #2 {
top:
  %0 = alloca [3 x {}*], align 8
  %gcframe2 = alloca [4 x {}*], align 16
  %gcframe2.sub = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe2, i64 0, i64 0
  %.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %0, i64 0, i64 0
  %1 = bitcast [4 x {}*]* %gcframe2 to i8*
  call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(32) %1, i8 0, i32 32, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #10
  %ptls_i8 = getelementptr i8, i8* %thread_ptr, i64 -32768
;  @ math.jl:33 within `throw_complex_domainerror'
; ┌ @ strings/io.jl:174 within `string'
   %2 = bitcast [4 x {}*]* %gcframe2 to i64*
   store i64 8, i64* %2, align 16
   %3 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe2, i64 0, i64 1
   %4 = bitcast i8* %ptls_i8 to i64*
   %5 = load i64, i64* %4, align 8
   %6 = bitcast {}** %3 to i64*
   store i64 %5, i64* %6, align 8
   %7 = bitcast i8* %ptls_i8 to {}***
   store {}** %gcframe2.sub, {}*** %7, align 8
   store {}* inttoptr (i64 140687711185360 to {}*), {}** %.sub, align 8
   %8 = getelementptr inbounds [3 x {}*], [3 x {}*]* %0, i64 0, i64 1
   store {}* inttoptr (i64 140687841654800 to {}*), {}** %8, align 8
   %9 = call nonnull {}* @jl_invoke({}* inttoptr (i64 140687856164832 to {}*), {}** nonnull %.sub, i32 2, {}* inttoptr (i64 140687809090048 to {}*))
   %10 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe2, i64 0, i64 3
   store {}* %9, {}** %10, align 8
   store {}* inttoptr (i64 140687841654880 to {}*), {}** %.sub, align 8
   store {}* inttoptr (i64 140687711185360 to {}*), {}** %8, align 8
   %11 = getelementptr inbounds [3 x {}*], [3 x {}*]* %0, i64 0, i64 2
   store {}* inttoptr (i64 140687841654928 to {}*), {}** %11, align 8
   %12 = call nonnull {}* @jl_invoke({}* inttoptr (i64 140687856164832 to {}*), {}** nonnull %.sub, i32 3, {}* inttoptr (i64 140687806010960 to {}*))
   %13 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe2, i64 0, i64 2
   store {}* %12, {}** %13, align 16
; └
  %14 = call fastcc nonnull {}* @julia_string_2770({}* nonnull %9, {}* nonnull %12)
  call void @llvm.trap() #12
  unreachable
}

declare nonnull {}* @jl_invoke({}*, {}** nocapture readonly, i32, {}*) local_unnamed_addr

; Function Attrs: allocsize(1)
declare noalias nonnull {}* @julia.gc_alloc_obj(i8*, i64, {}*) local_unnamed_addr #3

; Function Attrs: nounwind readnone speculatable willreturn
declare double @llvm.sqrt.f64(double) #4

;  @ boot.jl:602 within `throw_inexacterror'
; Function Attrs: noinline noreturn nosync readnone
define internal fastcc noalias nonnull align 536870912 dereferenceable(4294967295) {}* @julia_throw_inexacterror_2774() unnamed_addr #5 {
top:
; ┌ @ /home/vchuravy/src/GPUCompiler/src/runtime.jl:214 within `box_int64'
; │┌ @ /home/vchuravy/src/GPUCompiler/src/runtime.jl:174 within `box'
; ││┌ @ /home/vchuravy/src/GPUCompiler/src/runtime.jl:188 within `macro expansion'
     unreachable
; └└└
}

declare {}* @jl_alloc_string(i64) local_unnamed_addr

declare i64 @__memcpy_avx_unaligned_erms(i64, i64, i64) local_unnamed_addr

;  @ REPL[12]:1 within `f'
define double @julia_f_2764(double %0, i64 signext %1) local_unnamed_addr {
entry:
; ┌ @ math.jl:608 within `sqrt' @ math.jl:582
; │┌ @ float.jl:371 within `<'
    %2 = icmp sgt i64 %1, -1
; │└
   br i1 %2, label %julia_f_2764.inner.exit, label %L4.i

L4.i:                                             ; preds = %entry
   %3 = call fastcc nonnull {}* @julia_throw_complex_domainerror_2767() #12
   unreachable

julia_f_2764.inner.exit:                          ; preds = %entry
; │ @ math.jl:608 within `sqrt'
; │┌ @ float.jl:206 within `float'
; ││┌ @ float.jl:191 within `AbstractFloat'
; │││┌ @ float.jl:94 within `Float64'
      %4 = sitofp i64 %1 to double
; │└└└
; │ @ math.jl:608 within `sqrt' @ math.jl:583
   %5 = call double @llvm.sqrt.f64(double %4)
; └
; ┌ @ float.jl:332 within `*'
   %6 = fmul double %5, %0
   ret double %6
; └
}

; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #6

; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #6

;  @ REPL[12]:1 within `f'
define double @preprocess_julia_f_2764(double %0, i64 signext %1) local_unnamed_addr {
entry:
; ┌ @ math.jl:608 within `sqrt' @ math.jl:582
; │┌ @ float.jl:371 within `<'
    %2 = icmp sgt i64 %1, -1
; │└
   br i1 %2, label %julia_f_2764.inner.exit, label %L4.i

L4.i:                                             ; preds = %entry
   %3 = call fastcc nonnull {}* @julia_throw_complex_domainerror_2767() #12
   unreachable

julia_f_2764.inner.exit:                          ; preds = %entry
; │ @ math.jl:608 within `sqrt'
; │┌ @ float.jl:206 within `float'
; ││┌ @ float.jl:191 within `AbstractFloat'
; │││┌ @ float.jl:94 within `Float64'
      %4 = sitofp i64 %1 to double
; │└└└
; │ @ math.jl:608 within `sqrt' @ math.jl:583
   %5 = call double @llvm.sqrt.f64(double %4)
; └
; ┌ @ float.jl:332 within `*'
   %6 = fmul double %5, %0
   ret double %6
; └
}

;  @ REPL[12]:1 within `f'
; Function Attrs: alwaysinline
define dso_local { double } @diffejulia_f_2764(double %0, i64 signext %1, double %differeturn) local_unnamed_addr #7 {
entry:
; ┌ @ math.jl:608 within `sqrt' @ math.jl:582
; │┌ @ float.jl:371 within `<'
    %2 = icmp sgt i64 %1, -1
; │└
   br i1 %2, label %julia_f_2764.inner.exit, label %L4.i

L4.i:                                             ; preds = %entry
   %3 = call fastcc nonnull {}* @julia_throw_complex_domainerror_2767() #12
   unreachable

julia_f_2764.inner.exit:                          ; preds = %entry
; │ @ math.jl:608 within `sqrt'
; │┌ @ float.jl:206 within `float'
; ││┌ @ float.jl:191 within `AbstractFloat'
; │││┌ @ float.jl:94 within `Float64'
      %4 = sitofp i64 %1 to double
; │└└└
; │ @ math.jl:608 within `sqrt' @ math.jl:583
   %5 = call double @llvm.sqrt.f64(double %4)
   %m1diffe = fmul fast double %5, %differeturn
   %6 = insertvalue { double } undef, double %m1diffe, 0
   ret { double } %6
; └
}

; Function Attrs: inaccessiblemem_or_argmemonly
declare void @jl_gc_queue_root({}*) #8

; Function Attrs: allocsize(1)
declare noalias nonnull {}* @jl_gc_pool_alloc(i8*, i32, i32) #3

; Function Attrs: allocsize(1)
declare noalias nonnull {}* @jl_gc_big_alloc(i8*, i64) #3

declare noalias nonnull {}** @julia.new_gc_frame(i32)

declare void @julia.push_gc_frame({}**, i32)

declare {}** @julia.get_gc_frame_slot({}**, i32)

declare void @julia.pop_gc_frame({}**)

; Function Attrs: argmemonly nounwind willreturn writeonly
declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #9

attributes #0 = { cold noreturn nounwind }
attributes #1 = { nounwind readnone }
attributes #2 = { noinline noreturn }
attributes #3 = { allocsize(1) }
attributes #4 = { nounwind readnone speculatable willreturn }
attributes #5 = { noinline noreturn nosync readnone }
attributes #6 = { argmemonly nounwind willreturn }
attributes #7 = { alwaysinline }
attributes #8 = { inaccessiblemem_or_argmemonly }
attributes #9 = { argmemonly nounwind willreturn writeonly }
attributes #10 = { nounwind }
attributes #11 = { noreturn nosync readnone }
attributes #12 = { noreturn }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 1, !"Debug Info Version", i32 3}

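So it's coming from the throw branch: julia_f calls throw_complex_domainerror, which builds its message with string, and that is where the calls to @__memcpy_avx_unaligned_erms appear.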

vchuravy commented 3 years ago

So it's the lowering of https://github.com/JuliaLang/julia/blob/94b9d66b10e8e3ebdb268e4be5f7e1f43079ad4e/base/array.jl#L241

but Julia Base emits a call through a raw pointer there, not a named symbol:

julia> code_llvm(unsafe_copyto!, (Ptr{Cchar}, Ptr{Cchar}, Csize_t), optimize=false)
;  @ array.jl:220 within `unsafe_copyto!'
define i64 @"julia_unsafe_copyto!_187"(i64 zeroext %0, i64 zeroext %1, i64 zeroext %2) {
top:
  %3 = call {}*** @julia.ptls_states()
  %4 = bitcast {}*** %3 to {}**
  %5 = getelementptr inbounds {}*, {}** %4, i64 4
  %6 = bitcast {}** %5 to i64**
  %7 = load i64*, i64** %6, align 8
;  @ array.jl:223 within `unsafe_copyto!'
; ┌ @ int.jl:923 within `*' @ int.jl:88
   %8 = mul i64 %2, 1
; └
  %9 = call i64 inttoptr (i64 140483890193680 to i64 (i64, i64, i64)*)(i64 %0, i64 %1, i64 %8)
;  @ array.jl:225 within `unsafe_copyto!'
  ret i64 %0
}

wsmoses commented 3 years ago

Well, at least this time we can't blame Cassette.

vchuravy commented 3 years ago

julia> ∂f_∂a = autodiff(f, Active(a), c)
┌ Info: Replacing
│   ptr = Ptr{Nothing} @0x00007fc3684896c9
└   fn = "jl_alloc_string"
┌ Info: Replacing
│   ptr = Ptr{Nothing} @0x00007fc3687fe510
└   fn = "__memcpy_avx_unaligned_erms"
name = "__memcpy_avx_unaligned_erms"
ERROR: Enzyme: Symbol lookup failed. Aborting!