EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Segfault when used with `Optim.jl` #298

Closed Moelf closed 2 years ago

Moelf commented 2 years ago

MWE:

julia> using Optim, Enzyme, LinearAlgebra

julia> function f(x)
           y1 = zeros(eltype(x), 3)
           y2 = ones(eltype(x), 3)
           y1 .+= (3 - sin(x[1]))^2
           y2 .+= (x[2] - 3)^4
           dot(y1, y2)
       end
f (generic function with 1 method)

julia> optimize(f, ones(2), NelderMead()) |> Optim.minimizer
2-element Vector{Float64}:
 1.5707884242800443
 2.99892521883371

julia> optimize(f, ones(2), LBFGS()) |> Optim.minimizer
2-element Vector{Float64}:
 1.5707963268056853
 3.000075121117243

julia> optimize(f, ones(2), LBFGS(); autodiff=:forward) |> Optim.minimizer
2-element Vector{Float64}:
 1.5707963270758434
 3.0000354995684293

julia> g! = (dx, αs) -> autodiff(f, Duplicated(αs, dx))
#1 (generic function with 1 method)

julia> optimize(f, g!, ones(2), LBFGS(); inplace=true) |> Optim.minimizer
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-unknown-linux-gnu' whereas 'text' is 'x86_64-pc-linux-gnu'

bcloader: bcloader:62:1: error: expected top-level entity
source_filename = "cblas/dcopy.c"
^
julia: /workspace/srcdir/Enzyme/enzyme/BCLoad/BCLoader.cpp:71: bool provideDefinitions(llvm::Module&): Assertion `BC' failed.

and the non-inplace version seems to give the wrong result:

julia> g = let dx = zeros(2)
           αs -> (autodiff(f, Duplicated(αs, dx)); dx)
       end
#5 (generic function with 1 method)

julia> optimize(f, g, ones(2), LBFGS(); inplace=false) |> Optim.minimizer
┌ Warning: Failed to achieve finite new evaluation point, using alpha=0
└ @ LineSearches ~/.julia/packages/LineSearches/Ki4c5/src/hagerzhang.jl:148
2-element Vector{Float64}:
 1.1493167686587399e8
 4.689119509344906
wsmoses commented 2 years ago

What version/commit of Enzyme / Enzyme_jll are you on? It looks like you may be using a mismatched combination.

Moelf commented 2 years ago

I did ]add Enzyme#main

  [7da242da] Enzyme v0.9.4
  [7cc45869] Enzyme_jll v0.0.30+1
wsmoses commented 2 years ago

Yeah, #main requires a custom jll right now and will not work with the previous jll. Can you use the latest release?

Moelf commented 2 years ago

ok, it doesn't crash anymore, but it still seems to give the wrong result:

julia> g! = (dx, αs) -> autodiff(f, Duplicated(αs, dx))
#1 (generic function with 1 method)

julia> optimize(f, g!, ones(2), LBFGS(); inplace=true) |> Optim.minimizer
2-element Vector{Float64}:
 1.0
 1.0

julia> g = let dx = zeros(2)
           αs -> (autodiff(f, Duplicated(αs, dx)); dx)
       end
#3 (generic function with 1 method)

julia> optimize(f, g, ones(2), LBFGS(); inplace=false) |> Optim.minimizer
┌ Warning: Failed to achieve finite new evaluation point, using alpha=0
└ @ LineSearches ~/.julia/packages/LineSearches/Ki4c5/src/hagerzhang.jl:148
2-element Vector{Float64}:
 1.1493167686587399e8
 4.689119509344906
wsmoses commented 2 years ago

Can you make a version of this without Optim that just calls autodiff on an input and produces an incorrect result?
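A minimal sketch of one Optim-free check, assuming the `f` from the MWE above: compare the gradient against central finite differences at a given point. Here `grad_f` is a hypothetical hand-derived gradient of f(x) = 3(3 - sin x1)^2 (1 + (x2 - 3)^4), standing in for what the `autodiff` call should produce, and `fd_grad` is a hypothetical helper; neither is part of Enzyme or Optim.

```julia
using LinearAlgebra

# Same objective as in the MWE.
function f(x)
    y1 = zeros(eltype(x), 3)
    y2 = ones(eltype(x), 3)
    y1 .+= (3 - sin(x[1]))^2
    y2 .+= (x[2] - 3)^4
    dot(y1, y2)
end

# Hypothetical hand-derived gradient of f = 3(3 - sin x1)^2 (1 + (x2 - 3)^4),
# standing in for the gradient an autodiff call should return.
function grad_f(x)
    a = 3 - sin(x[1])
    b = 1 + (x[2] - 3)^4
    [-6 * a * cos(x[1]) * b, 12 * a^2 * (x[2] - 3)^3]
end

# Hypothetical helper: central finite differences as an Optim-free reference.
function fd_grad(f, x; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        e = zeros(length(x))
        e[i] = h
        g[i] = (f(x .+ e) - f(x .- e)) / (2h)
    end
    g
end

x0 = ones(2)
println(grad_f(x0))
println(fd_grad(f, x0))
```

If an `autodiff`-based gradient disagrees with `fd_grad` at some input, that input together with the two results is the kind of standalone reproducer being asked for here.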

wsmoses commented 2 years ago

Note that in reverse mode, `Duplicated` will `+=` the derivative into the shadow, so the result is wrong if your shadow is not already zeroed.
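To illustrate this accumulation without needing Enzyme installed, the sketch below uses a hypothetical `accumulate_grad!` that adds the hand-derived gradient of the MWE's `f` into `dx`, mimicking how the reverse pass writes into a `Duplicated` shadow. It is an assumed stand-in, not Enzyme's API; the point is only the difference between adding into a stale buffer and zeroing it first.

```julia
# Hypothetical stand-in for reverse-mode autodiff(f, Duplicated(x, dx)) on the
# MWE's f(x) = 3(3 - sin x1)^2 (1 + (x2 - 3)^4): it ADDS the gradient into dx
# (dx += ∇f) rather than overwriting it.
function accumulate_grad!(dx, x)
    a = 3 - sin(x[1])
    b = 1 + (x[2] - 3)^4
    dx[1] += -6 * a * cos(x[1]) * b
    dx[2] += 12 * a^2 * (x[2] - 3)^3
    return nothing
end

# Like g! in this thread: whatever is already in dx survives and is added to.
g_acc!(dx, x) = accumulate_grad!(dx, x)

# Like g_fix!: zero the shadow first, so dx ends up holding exactly ∇f(x).
function g_zeroed!(dx, x)
    dx .= 0
    accumulate_grad!(dx, x)
end

x = ones(2)
dx = zeros(2)
g_acc!(dx, x)
first_grad = copy(dx)
g_acc!(dx, x)      # dx now holds 2 * first_grad
g_zeroed!(dx, x)   # dx holds first_grad again
```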

Moelf commented 2 years ago

ok actually, my original problem is this:


julia> using LiteHF, Enzyme

julia> RR = @time build_pyhf(load_pyhfjson("/home/akako/.julia/dev/LiteHF/test/pyhfjson/sample.json"));
  6.642286 seconds (22.35 M allocations: 1006.076 MiB, 5.98% gc time, 99.87% compilation time)

julia> LL(x) = -RR.LogLikelihood(x)
LL (generic function with 1 method)

julia> g = let dx = zeros(2)
           αs -> (autodiff(LL, Duplicated(αs, dx)); dx)
       end

julia> RR.prior_inits
2-element Vector{Float64}:
 1.0
 0.0

julia> g(RR.prior_inits)
ERROR: MethodError: no method matching unsafe_convert(::Type{Ptr{LLVM.API.LLVMOpaqueType}}, ::Nothing)
Closest candidates are:
  unsafe_convert(::Type{Ptr{T}}, ::SharedArrays.SharedArray{T}) where T at /usr/share/julia/stdlib/v1.7/SharedArrays/src/SharedArrays.jl:361
  unsafe_convert(::Type{Ptr{T}}, ::SharedArrays.SharedArray) where T at /usr/share/julia/stdlib/v1.7/SharedArrays/src/SharedArrays.jl:362
  unsafe_convert(::Type{Ptr{T}}, ::Adjoint{<:Real, <:AbstractVecOrMat}) where T at /usr/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:197
  ...
Stacktrace:
  [1] EnzymeGradientUtilsSubTransferHelper(gutils::Ptr{Nothing}, mode::Enzyme.API.CDerivativeMode, secretty::Nothing, intrinsic::UInt32, dstAlign::Int64, srcAlign::Int64, offset::Int64, dstConstant::Bool, origdst::LLVM.LoadInst, srcConstant::Bool, origsrc::LLVM.LoadInst, length::LLVM.MulInst, isVolatile::LLVM.ConstantInt, MTI::LLVM.CallInst, allowForward::Bool, shadowsLookedUp::Bool)

LiteHF: https://github.com/JuliaHEP/LiteHF.jl

I didn't find a way to reduce it, so I'm just gonna post it here

vchuravy commented 2 years ago

@wsmoses I recall you looked at this briefly, before you started travelling?

vchuravy commented 2 years ago

@Moelf can you test #308? And it would be great to have a reproducer that doesn't require an external file.

build_pyhf(load_pyhfjson(joinpath(dirname(pathof(LiteHF)), "..", "test/pyhfjson/sample.json")));
ERROR: SystemError: opening file "/home/vchuravy/.julia/packages/LiteHF/Vk433/src/../test/pyhfjson/sample.json": No such file or directory

Also note: you probably want `const RR =`, or to pass `RR` as an argument to the function `LL`, to avoid the type instability.
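The type instability mentioned here can be seen in a small, self-contained sketch. The names below are hypothetical stand-ins for `RR` and `LL`, not LiteHF's API: a function that reads a non-`const` global cannot infer that global's type, while a `const` global or an explicit argument infers a concrete type. `Base.return_types` reports what inference concludes.

```julia
# Hypothetical stand-ins for RR: the same data bound three different ways.
weights = [1.0, 2.0, 3.0]          # non-const global: its type may change later
const WEIGHTS = [1.0, 2.0, 3.0]    # const global: its type is fixed

nll_global(x) = -sum(weights) * x  # reads the non-const global
nll_const(x) = -sum(WEIGHTS) * x   # reads the const global
nll_arg(x, w) = -sum(w) * x        # receives the data as an argument

# Inference cannot know the type of `weights`, so nll_global is inferred as
# returning Any; the other two variants infer a concrete Float64.
@show Base.return_types(nll_global, (Float64,))
@show Base.return_types(nll_const, (Float64,))
@show Base.return_types(nll_arg, (Float64, Vector{Float64}))
```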

Moelf commented 2 years ago

I don't have the custom _jll I think?

sorry about the file; can you try a different one, like multi_channel.json?

vchuravy commented 2 years ago

You don't need the custom jll anymore :)

Moelf commented 2 years ago

well


ERROR: Unsatisfiable requirements detected for package Enzyme_jll [7cc45869]:
 Enzyme_jll [7cc45869] log:
 ├─possible versions are: 0.0.1-0.0.30 or uninstalled
 └─restricted to versions 0.0.31 by Enzyme [7da242da] — no versions left
   └─Enzyme [7da242da] log:
vchuravy commented 2 years ago

That means your registry is outdated... https://github.com/JuliaRegistries/General/commit/d37800ef9c014e970f8f9f2b6f7d58b4cb128ef4 that was registered two days ago.

Moelf commented 2 years ago

still getting the same error

julia> using LiteHF, Enzyme

julia> const RR = @time build_pyhf(load_pyhfjson("/home/akako/.julia/dev/LiteHF/test/pyhfjson/multi_channel.json"));
  3.421716 seconds (7.80 M allocations: 406.006 MiB, 7.53% gc time, 99.66% compilation time)

julia> LL(x) = -RR.LogLikelihood(x)
LL (generic function with 1 method)

julia> g = let dx = zeros(2)
           αs -> (autodiff(LL, Duplicated(αs, dx)); dx)
       end
#1 (generic function with 1 method)

julia> g(RR.inits)
ERROR: MethodError: no method matching unsafe_convert(::Type{Ptr{LLVM.API.LLVMOpaqueType}}, ::Nothing)
Closest candidates are:
  unsafe_convert(::Type{Ptr{T}}, ::StaticArrays.SizedArray) where T at ~/.julia/packages/StaticArrays/58yy1/src/SizedArray.jl:127
  unsafe_convert(::Type{Ptr{T}}, ::LinearAlgebra.Transpose{<:Any, <:AbstractVecOrMat}) where T at /usr/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:199
  unsafe_convert(::Type{Ptr{T}}, ::Base.RefValue{SA}) where {S, T, D, L, SA<:StaticArrays.SArray{S, T, D, L}} at ~/.julia/packages/StaticArrays/58yy1/src/SArray.jl:125
  ...
Stacktrace:
  [1] EnzymeGradientUtilsSubTransferHelper(gutils::Ptr{Nothing}, mode::Enzyme.API.CDerivativeMode, secretty::Nothing, intrinsic::UInt32, dstAlign::Int64, srcAlign::Int64, offset::Int64, dstConstant::Bool, origdst::LLVM.LoadInst, srcConstant::Bool, origsrc::LLVM.LoadInst, length::LLVM.MulInst, isVolatile::LLVM.ConstantInt, MTI::LLVM.CallInst, allowForward::Bool, shadowsLookedUp::Bool)
    @ Enzyme.API ~/.julia/dev/Enzyme/src/api.jl:206
vchuravy commented 2 years ago

What's the full backtrace?

I am now getting:

 %"'ip_phi5" = phi {} addrspace(10)* 
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/CacheUtility.cpp:76: virtual void CacheUtility::erase(llvm::Instruction*): Assertion `I->use_empty()' failed.

signal (6): Aborted
in expression starting at REPL[10]:1
__pthread_kill_implementation at /usr/bin/../lib/libc.so.6 (unknown line)
raise at /usr/bin/../lib/libc.so.6 (unknown line)
abort at /usr/bin/../lib/libc.so.6 (unknown line)
__assert_fail_base.cold at /usr/bin/../lib/libc.so.6 (unknown line)
__assert_fail at /usr/bin/../lib/libc.so.6 (unknown line)
erase at /workspace/srcdir/Enzyme/enzyme/Enzyme/CacheUtility.cpp:76
erase at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.h:1022
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:8338
delegateCallInst at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:302 [inlined]
visitCall at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/Instruction.def:209 [inlined]
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/Instruction.def:209
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:112 [inlined]
CreateAugmentedPrimal at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:2017
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:11238
delegateCallInst at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:302 [inlined]
visitCall at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/Instruction.def:209 [inlined]
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/Instruction.def:209
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:112 [inlined]
CreateAugmentedPrimal at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:2017
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:11238
delegateCallInst at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:302 [inlined]
visitCall at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/Instruction.def:209 [inlined]
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/Instruction.def:209
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:112 [inlined]
CreateAugmentedPrimal at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:2017
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:11238
delegateCallInst at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:302 [inlined]
visitCall at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/Instruction.def:209 [inlined]
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/Instruction.def:209
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:112 [inlined]
CreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:3656
EnzymeCreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:438
EnzymeCreatePrimalAndGradient at /home/vchuravy/src/Enzyme/src/api.jl:111
enzyme! at /home/vchuravy/src/Enzyme/src/compiler.jl:3162

Which is defined as progress :)

] add https://github.com/JuliaHEP/LiteHF.jl
] add Enzyme#main

using LiteHF, Enzyme
const RR = build_pyhf(load_pyhfjson(joinpath(dirname(pathof(LiteHF)), "..", "test/pyhfjson/multi_channel.json")));

julia> LL(x) = -RR.LogLikelihood(x)
LL (generic function with 1 method)

julia> g = let dx = zeros(2)
           αs -> (autodiff(LL, Duplicated(αs, dx)); dx)
       end

julia> g(RR.inits)
Moelf commented 2 years ago

that was on #308; now I get basically the same thing, except it overruns my terminal buffer...

wsmoses commented 2 years ago

On latest main and jll, I get the following:

using Optim, Enzyme, LinearAlgebra

function f(x)
    y1 = zeros(eltype(x), 3)
    y2 = ones(eltype(x), 3)
    y1 .+= (3 - sin(x[1]))^2
    y2 .+= (x[2] - 3)^4
    dot(y1, y2)
end

@show optimize(f, ones(2), NelderMead()) |> Optim.minimizer

@show optimize(f, ones(2), LBFGS()) |> Optim.minimizer

@show optimize(f, ones(2), LBFGS(); autodiff=:forward) |> Optim.minimizer

g! = (dx, αs) -> autodiff(f, Duplicated(αs, dx))

@show optimize(f, g!, ones(2), LBFGS(); inplace=true) |> Optim.minimizer

function g_fix!(dx, as)
    dx .= 0
    autodiff(f, Duplicated(as, dx))
end

@show optimize(f, g_fix!, ones(2), LBFGS(); inplace=true) |> Optim.minimizer

g = let dx = zeros(2)
    αs -> (autodiff(f, Duplicated(αs, dx)); dx)
end

@show optimize(f, g, ones(2), LBFGS(); inplace=false) |> Optim.minimizer

function g_fix(as)
    dx = zeros(2)
    autodiff(f, Duplicated(as, dx))
    dx
end

@show optimize(f, g_fix, ones(2), LBFGS(); inplace=false) |> Optim.minimizer
wmoses@beast:~/git/Enzyme.jl (rand) $ ./julia-1.7.2/bin/julia --project what.jl
optimize(f, ones(2), NelderMead()) |> Optim.minimizer = [1.5707884242800443, 2.99892521883371]
optimize(f, ones(2), LBFGS()) |> Optim.minimizer = [1.5707963268056853, 3.000075121117243]
optimize(f, ones(2), LBFGS(); autodiff = :forward) |> Optim.minimizer = [1.5707963270758434, 3.0000354995684293]
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-unknown-linux-gnu' whereas 'text' is 'x86_64-pc-linux-gnu'

warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-unknown-linux-gnu' whereas 'text' is 'x86_64-pc-linux-gnu'

┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/XyxTy/src/utils.jl:35
optimize(f, g!, ones(2), LBFGS(); inplace = true) |> Optim.minimizer = [1.0, 1.0]
optimize(f, g_fix!, ones(2), LBFGS(); inplace = true) |> Optim.minimizer = [1.5707963270758434, 3.000035499568429]
┌ Warning: Failed to achieve finite new evaluation point, using alpha=0
└ @ LineSearches ~/.julia/packages/LineSearches/Ki4c5/src/hagerzhang.jl:148
optimize(f, g, ones(2), LBFGS(); inplace = false) |> Optim.minimizer = [1.1493167686587399e8, 4.689119509344906]
optimize(f, g_fix, ones(2), LBFGS(); inplace = false) |> Optim.minimizer = [1.5707963270758434, 3.000035499568429]

Namely, the g_fix versions I wrote appear to succeed. I'm not sure why the original versions fail; perhaps Optim runs multiple things in parallel and, as a result, there's a race?

In any case, this appears to be an Optim usage issue?
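The fixed versions differ from the originals in a concrete way that a small sketch can show without Enzyme: `g` captures one buffer that is never reset and returns the same array on every call, whereas `g_fix` returns a fresh, zeroed array each time. The sketch uses a hypothetical `accumulate_grad!` (a hand-derived gradient of the MWE's `f` that adds into `dx`, mimicking the reverse-mode `+=` behavior noted earlier in the thread). Whether Optim keeps the returned array by reference is an assumption; the sketch only shows what such a caller would observe.

```julia
# Hypothetical stand-in for autodiff(f, Duplicated(x, dx)) on the MWE's f:
# it adds the hand-derived gradient into dx instead of overwriting it.
function accumulate_grad!(dx, x)
    a = 3 - sin(x[1])
    b = 1 + (x[2] - 3)^4
    dx[1] += -6 * a * cos(x[1]) * b
    dx[2] += 12 * a^2 * (x[2] - 3)^3
    return nothing
end

# Like g: one captured buffer, never zeroed, returned on every call.
g_shared = let dx = zeros(2)
    x -> (accumulate_grad!(dx, x); dx)
end

# Like g_fix: a fresh zeroed buffer per call.
function g_fresh(x)
    dx = zeros(2)
    accumulate_grad!(dx, x)
    return dx
end

x = ones(2)
r1 = g_shared(x)
r2 = g_shared(x)   # same array as r1; it now holds twice the gradient
s1 = g_fresh(x)
s2 = g_fresh(x)    # distinct arrays, each holding the gradient once
```

A caller that stored `r1` as "the previous gradient" would see it silently change on the next call, on top of the doubled values.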

Moelf commented 2 years ago

so what could cause the in-place version to fail? I don't understand exactly what the warning message is saying; what is bcloader?

wsmoses commented 2 years ago

@Moelf try again on main. For some reason g_fix! now appears to work with the jll bump so there might've been a weird aliasing bug that was fixed.

Please reopen if you still see this.