EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Wrong gradient when using `bitcode_replacement!(false)` in neural ODE #1269

Closed ArnoStrouwen closed 7 months ago

ArnoStrouwen commented 7 months ago
using Lux, ComponentArrays, OrdinaryDiffEq, Optimization, OptimizationNLopt,
    OptimizationOptimisers, SciMLSensitivity, Zygote, Plots, Statistics, Random
using Enzyme; Enzyme.Compiler.bitcode_replacement!(false)
rng = Random.default_rng()
tspan = (0.0f0, 8.0f0)

ann = Chain(Dense(1, 32, tanh), Dense(32, 32, tanh), Dense(32, 1))
ps, st = Lux.setup(rng, ann)
p = ComponentArray(ps)

θ, ax = getdata(p), getaxes(p)

function dxdt_(dx, x, p, t)
    ps = ComponentArray(p, ax)
    x1, x2 = x
    dx[1] = x[2]
    dx[2] = first(ann([t], ps, st))[1]^3
end
x0 = [-4.0f0, 0.0f0]
ts = Float32.(collect(0.0:0.01:tspan[2]))
prob = ODEProblem(dxdt_, x0, tspan, θ)
solve(prob, Vern9(), abstol = 1e-10, reltol = 1e-10)

function predict_adjoint(θ)
    Array(solve(prob, Vern9(), p = θ, saveat = ts,
        sensealg = InterpolatingAdjoint(autojacvec = EnzymeVJP())))
end
function loss_adjoint(θ)
    x = predict_adjoint(θ)
    ps = ComponentArray(θ, ax)
    mean(abs2, 4.0 .- x[1, :]) + 2mean(abs2, x[2, :]) +
    mean(abs2, [first(first(ann([t], ps, st))) for t in ts]) / 10
end

l = loss_adjoint(θ)
Zygote.gradient(loss_adjoint,θ)

Prints warning: ** On entry to SGEMV parameter number 6 had an illegal value **

Adapted from https://docs.sciml.ai/SciMLSensitivity/stable/examples/optimal_control/optimal_control/

wsmoses commented 7 months ago

Can you post your Julia and Enzyme versions? If you needed to pass the bitcode flag, you were on an earlier version, from before this was marked non-experimental, so the bug may have been fixed since.

wsmoses commented 7 months ago

Can you also isolate this to just the Enzyme autodiff call, without the wrappers?

ArnoStrouwen commented 7 months ago

I think this is approximately what is going on inside SciMLSensitivity:

using Lux, ComponentArrays, OrdinaryDiffEq, SciMLSensitivity, Statistics, Random
using Enzyme; Enzyme.Compiler.bitcode_replacement!(false)
rng = Random.default_rng()
tspan = (0.0f0, 8.0f0)

ann = Chain(Dense(1, 32, tanh), Dense(32, 32, tanh), Dense(32, 1))
ps, st = Lux.setup(rng, ann)
p = ComponentArray(ps)

θ, ax = getdata(p), getaxes(p)

function dxdt_(dx, x, p, t)
    ps = ComponentArray(p, ax)
    x1, x2 = x
    dx[1] = x[2]
    dx[2] = first(ann([t], ps, st))[1]^3
end
x0 = [-4.0f0, 0.0f0]
ts = Float32.(collect(0.0:0.01:tspan[2]))

dx = zero(x0)
function adfunc(out, u, _p, t)
    dxdt_(out, u, _p, t)
    nothing
end
Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(dx, copy(x0)),
    Enzyme.Duplicated(copy(x0), zero(x0)), Enzyme.Duplicated(copy(θ), zero(θ)), Enzyme.Const(ts[1]))
(Enzyme) pkg> st
Status `~/SciML/SciMLSensitivity.jl/Enzyme/Project.toml`
  [b0b7db55] ComponentArrays v0.15.8
  [7da242da] Enzyme v0.11.14
  [b2108857] Lux v0.5.14
  [1dea7af3] OrdinaryDiffEq v6.70.1
  [1ed8b502] SciMLSensitivity v7.55.0
  [9a3f8284] Random
  [10745b16] Statistics v1.10.0

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
  Threads: 1 on 24 virtual cores
Environment:
  JULIA_PKG_DEVDIR = /home/arno/SciML/

wsmoses commented 7 months ago

And for sake of understanding, what is the expected result here, that is not being computed correctly?

ArnoStrouwen commented 7 months ago

For the complete example in the original post of this issue, the problem is that the gradient differs depending on the bitcode flag.
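
One way to decide which of the two gradients is the expected one is to compare both against a finite-difference reference. The following is a minimal sketch and not part of the original report: `fd_gradient` and `toy_loss` are hypothetical names, and `toy_loss` only stands in for `loss_adjoint` so that the snippet runs without any packages.

```julia
# Central finite-difference gradient of a scalar function `f` at `θ`.
# `fd_gradient` and its default step `h` are illustrative assumptions.
function fd_gradient(f, θ::AbstractVector; h = cbrt(eps(eltype(θ))))
    g = similar(θ)
    θp = copy(θ)
    for i in eachindex(θ)
        θi = θ[i]
        θp[i] = θi + h
        fplus = f(θp)
        θp[i] = θi - h
        fminus = f(θp)
        θp[i] = θi          # restore the perturbed entry
        g[i] = (fplus - fminus) / (2h)
    end
    return g
end

# Toy stand-in for `loss_adjoint`; its exact gradient is θ itself.
toy_loss(θ) = sum(abs2, θ) / 2

θ0 = [1.0, -2.0, 0.5]
g_fd = fd_gradient(toy_loss, θ0)
```

With the real problem one would call `fd_gradient(loss_adjoint, θ)` and compare the result with `Zygote.gradient(loss_adjoint, θ)` under each flag setting. Since `θ` is Float32 there, agreement to only a few digits should be expected.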

For the pared-down example, I don't know what the issue could be. Nothing in the Duplicated vectors seems immediately wrong to me, but I don't have much Enzyme experience.

I reduced the example so that it still produces the output: ** On entry to SGEMV parameter number 6 had an illegal value **.
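
For reference, the Fortran routine takes its arguments in the order SGEMV(TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY), so parameter number 6 is LDA, the leading dimension of the matrix. The sketch below mirrors the argument check of the reference BLAS implementation in Julia. `sgemv_info` and `SGEMV_ARGS` are hypothetical helpers written for illustration, not LinearAlgebra functions, and optimized BLAS libraries may implement their checks differently.

```julia
# Argument order of the Fortran SGEMV routine (1-based, as in the warning).
const SGEMV_ARGS = (:trans, :m, :n, :alpha, :a, :lda, :x, :incx, :beta, :y, :incy)

# Mirror of the reference BLAS argument check: returns 0 when all arguments
# are legal, otherwise the position of the first illegal one.
function sgemv_info(trans::Char, m, n, lda, incx, incy)
    uppercase(trans) in ('N', 'T', 'C') || return 1
    m < 0 && return 2
    n < 0 && return 3
    lda < max(1, m) && return 6   # leading dimension must cover all m rows
    incx == 0 && return 8
    incy == 0 && return 11
    return 0
end

info_ok  = sgemv_info('N', 30, 1, 30, 1, 1)  # lda == m, all arguments legal
info_bad = sgemv_info('N', 30, 1, 1, 1, 1)   # lda < m, flags parameter 6
```

Under this reading, the warning means that some SGEMV call received a leading dimension smaller than the row count it was given.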

Perhaps I pared it down too much. Besides this autodiff call, SciMLSensitivity also contains a more complicated one, where dxdt_ gets a different wrapper, https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/adjoint_common.jl#L430-L455 and this wrapper then gets Duplicated with make_zero: https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/adjoint_common.jl#L201C47-L201C100 https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/derivative_wrappers.jl#L696C63-L696C77

wsmoses commented 7 months ago

In order to debug this properly we'll need an example:

wsmoses commented 7 months ago

Reduced to:

using Enzyme; Enzyme.Compiler.bitcode_replacement!(false)

using LinearAlgebra
ps = zeros(Float32, 30, 1)
function adfunc(ps)
    out = Vector{Float32}(undef, 30)
    @inline LinearAlgebra.BLAS.gemv!('N', true, ps, [0.0f0], false, out)
    return out[1]
end

Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(deepcopy(ps), deepcopy(ps)))
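
For comparison, the derivative that a matrix-vector product of these shapes should yield can be written by hand. The sketch below uses only LinearAlgebra and the same shapes as the reproducer (a 30×1 matrix, a length-1 vector, a length-30 output). The random `A`, the nonzero `x`, and the names `dy`, `dA`, `dx` are assumptions for illustration. For y = A*x with loss y[1], the reverse pass is dA = dy*x' and dx = A'*dy.

```julia
using LinearAlgebra

A = rand(Float32, 30, 1)
x = Float32[2.0]
y = zeros(Float32, 30)

# Forward pass: y = 1*A*x + 0*y, the same call shape as in `adfunc`.
LinearAlgebra.BLAS.gemv!('N', 1.0f0, A, x, 0.0f0, y)

# Reverse pass by hand for loss = y[1]: seed dy with a one in the first slot.
dy = zeros(Float32, 30)
dy[1] = 1
dA = dy * x'   # 30×1 outer product; only dA[1, 1] can be nonzero
dx = A' * dy   # length-1 vector holding A[1, 1]
```

In the reproducer itself the vector is the constant `[0.0f0]`, so the derivative with respect to `ps` is zero everywhere, and a correct reverse pass would leave the all-zero shadow passed in `Duplicated` unchanged.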

wsmoses commented 7 months ago
julia> Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(deepcopy(ps), deepcopy(ps)))
after simplification :
; Function Attrs: mustprogress willreturn
define float @preprocess_julia_adfunc_10000({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0) local_unnamed_addr #12 !dbg !117 {
top:
  %1 = alloca i8, align 1
  %2 = alloca i64, align 16
  %3 = bitcast i64* %2 to i8*
  %4 = alloca i64, align 16
  %5 = bitcast i64* %4 to i8*
  %6 = alloca i32, align 8
  %7 = bitcast i32* %6 to i8*
  %8 = alloca i64, align 16
  %9 = bitcast i64* %8 to i8*
  %10 = alloca i64, align 16
  %11 = bitcast i64* %10 to i8*
  %12 = alloca i32, align 8
  %13 = bitcast i32* %12 to i8*
  %14 = alloca i64, align 16
  %15 = bitcast i64* %14 to i8*
  %16 = call {}*** @julia.get_pgcstack() #13
  %ptls_field124 = getelementptr inbounds {}**, {}*** %16, i64 2
  %17 = bitcast {}*** %ptls_field124 to i64***
  %ptls_load125126 = load i64**, i64*** %17, align 8, !tbaa !9
  %18 = getelementptr inbounds i64*, i64** %ptls_load125126, i64 2
  %safepoint = load i64*, i64** %18, align 8, !tbaa !13, !invariant.load !8
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #13, !dbg !118
  fence syncscope("singlethread") seq_cst
  %19 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 30) #14, !dbg !119
  %20 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 1) #14, !dbg !121
  %21 = addrspacecast {} addrspace(10)* %20 to {} addrspace(11)*, !dbg !124
  %22 = bitcast {} addrspace(10)* %20 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !124
  %23 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %22 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !124
  %24 = bitcast {} addrspace(10)* %20 to float addrspace(13)* addrspace(10)*, !dbg !124
  %25 = addrspacecast float addrspace(13)* addrspace(10)* %24 to float addrspace(13)* addrspace(11)*, !dbg !124
  %arrayptr127 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %25, align 8, !dbg !124, !tbaa !28, !alias.scope !126, !noalias !36, !nonnull !8
  store float 0.000000e+00, float addrspace(13)* %arrayptr127, align 4, !dbg !124, !tbaa !41, !alias.scope !44, !noalias !129
  %26 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !130
  %27 = bitcast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(10)*, !dbg !130
  %28 = addrspacecast {} addrspace(10)* addrspace(10)* %27 to {} addrspace(10)* addrspace(11)*, !dbg !130
  %arraysize_ptr = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %28, i64 3, !dbg !130
  %29 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr to i64 addrspace(11)*, !dbg !130
  %arraysize = load i64, i64 addrspace(11)* %29, align 8, !dbg !130, !tbaa !13, !range !51, !invariant.load !8, !alias.scope !52, !noalias !53
  %arraysize_ptr2 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %28, i64 4, !dbg !130
  %30 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr2 to i64 addrspace(11)*, !dbg !130
  %arraysize3 = load i64, i64 addrspace(11)* %30, align 16, !dbg !130, !tbaa !13, !range !51, !invariant.load !8, !alias.scope !52, !noalias !53
  %arraylen_ptr = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %23, i64 0, i32 1, !dbg !132
  %arraylen = load i64, i64 addrspace(11)* %arraylen_ptr, align 8, !dbg !132, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
  %31 = icmp eq i64 %arraylen, %arraysize3, !dbg !134
  br i1 %31, label %L17, label %top.L19_crit_edge, !dbg !133

top.L19_crit_edge:                                ; preds = %top
  %.phi.trans.insert = bitcast {} addrspace(10)* %19 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*
  %.phi.trans.insert143 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %.phi.trans.insert to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*
  %arraylen_ptr10.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %.phi.trans.insert143, i64 0, i32 1
  %arraylen11.pre = load i64, i64 addrspace(11)* %arraylen_ptr10.phi.trans.insert, align 8, !dbg !136, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
  br label %L19, !dbg !133

L17:                                              ; preds = %top
  %32 = bitcast {} addrspace(10)* %19 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !132
  %33 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %32 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !132
  %arraylen_ptr104 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %33, i64 0, i32 1, !dbg !132
  %arraylen105 = load i64, i64 addrspace(11)* %arraylen_ptr104, align 8, !dbg !132, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
  %34 = icmp eq i64 %arraylen105, %arraysize, !dbg !134
  br i1 %34, label %L29, label %L19, !dbg !133

L19:                                              ; preds = %L17, %top.L19_crit_edge
  %arraylen11 = phi i64 [ %arraylen11.pre, %top.L19_crit_edge ], [ %arraylen105, %L17 ], !dbg !136
  %current_task1123 = getelementptr inbounds {}**, {}*** %16, i64 -14
  %current_task1 = bitcast {}*** %current_task1123 to {}**
  %newstruct13 = call noalias nonnull dereferenceable(16) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803713277456 to {}*) to {} addrspace(10)*)) #15, !dbg !138
  %35 = bitcast {} addrspace(10)* %newstruct13 to {} addrspace(10)* addrspace(10)*, !dbg !138
  %36 = addrspacecast {} addrspace(10)* addrspace(10)* %35 to {} addrspace(10)* addrspace(11)*, !dbg !138
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %36, align 8, !dbg !138, !tbaa !72, !alias.scope !44, !noalias !129
  %37 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %36, i64 1, !dbg !138
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %37, align 8, !dbg !138, !tbaa !72, !alias.scope !44, !noalias !129
  %box = call noalias nonnull dereferenceable(56) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 56, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803908591440 to {}*) to {} addrspace(10)*)) #15, !dbg !138
  %38 = bitcast {} addrspace(10)* %box to { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !138
  %.repack = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 0, !dbg !138
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032336 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
  %.repack128.repack = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 1, i64 0, !dbg !138
  store i64 %arraysize, i64 addrspace(10)* %.repack128.repack, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
  %.repack128.repack138 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 1, i64 1, !dbg !138
  store i64 %arraysize3, i64 addrspace(10)* %.repack128.repack138, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
  %.repack130 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 2, !dbg !138
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032304 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack130, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
  %.repack132 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 3, !dbg !138
  store i64 %arraylen, i64 addrspace(10)* %.repack132, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
  %.repack134 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 4, !dbg !138
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032256 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack134, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
  %.repack136 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 5, !dbg !138
  store i64 %arraylen11, i64 addrspace(10)* %.repack136, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
  store atomic {} addrspace(10)* %box, {} addrspace(10)* addrspace(11)* %36 release, align 8, !dbg !138, !tbaa !72, !alias.scope !44, !noalias !129
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* nofree noundef nonnull %newstruct13, {} addrspace(10)* nofree nonnull %box) #16, !dbg !138
  %39 = bitcast {} addrspace(10)* %newstruct13 to i8 addrspace(10)*, !dbg !138
  %40 = addrspacecast i8 addrspace(10)* %39 to i8 addrspace(11)*, !dbg !138
  %41 = getelementptr inbounds i8, i8 addrspace(11)* %40, i64 8, !dbg !138
  %42 = bitcast i8 addrspace(11)* %41 to {} addrspace(10)* addrspace(11)*, !dbg !138
  store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803939037192 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %42 release, align 8, !dbg !138, !tbaa !72, !alias.scope !44, !noalias !129
  %box16 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803724611632 to {}*) to {} addrspace(10)*)) #15, !dbg !137
  %43 = bitcast {} addrspace(10)* %box16 to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !137
  %44 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %43, i64 0, i64 0, !dbg !137
  store {} addrspace(10)* %newstruct13, {} addrspace(10)* addrspace(10)* %44, align 8, !dbg !137, !tbaa !75, !alias.scope !44, !noalias !129
  %45 = addrspacecast {} addrspace(10)* %box16 to {} addrspace(12)*, !dbg !137
  call void @ijl_throw({} addrspace(12)* %45) #17, !dbg !137
  unreachable, !dbg !137

L29:                                              ; preds = %L17
  %46 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %21) #18, !dbg !139
  %47 = bitcast {}* %46 to i8**, !dbg !139
  %arrayptr20 = load i8*, i8** %47, align 8, !dbg !139, !tbaa !28, !alias.scope !60, !noalias !36, !nonnull !8
  %48 = ptrtoint i8* %arrayptr20 to i64, !dbg !139
  %49 = addrspacecast {} addrspace(10)* %19 to {} addrspace(11)*, !dbg !143
  %50 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %49) #18, !dbg !143
  %51 = bitcast {}* %50 to i8**, !dbg !143
  %arrayptr22 = load i8*, i8** %51, align 8, !dbg !143, !tbaa !28, !alias.scope !60, !noalias !36, !nonnull !8
  %52 = ptrtoint i8* %arrayptr22 to i64, !dbg !143
  %53 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %26) #18, !dbg !147
  %54 = bitcast {}* %53 to i8**, !dbg !147
  %arrayptr24 = load i8*, i8** %54, align 8, !dbg !147, !tbaa !13, !invariant.load !8, !alias.scope !52, !noalias !53, !nonnull !8
  %55 = ptrtoint i8* %arrayptr24 to i64, !dbg !147
  %.not = icmp eq i64 %arraysize, 0, !dbg !150
  %56 = select i1 %.not, i64 1, i64 %arraysize, !dbg !154
  %57 = call i64 @llvm.umax.i64(i64 %arraysize, i64 %56) #13, !dbg !154
  %58 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, {} addrspace(10)* nonnull %20, {} addrspace(10)* nonnull %19) #13, !dbg !155
  store i8 78, i8* %1, align 1, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
  store i64 %arraysize, i64* %2, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
  store i64 %arraysize3, i64* %4, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
  %memcpy_refined_dst48 = bitcast i32* %6 to float*, !dbg !156
  store float 1.000000e+00, float* %memcpy_refined_dst48, align 8, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
  store i64 %57, i64* %8, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
  store i64 1, i64* %10, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
  %memcpy_refined_dst55 = bitcast i32* %12 to float*, !dbg !156
  store float 0.000000e+00, float* %memcpy_refined_dst55, align 8, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
  store i64 1, i64* %14, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
  call void @sgemv_64_(i8* noundef nonnull %1, i8* noundef nonnull %3, i8* noundef nonnull %5, i8* noundef nonnull %7, i64 %55, i8* noundef nonnull %9, i64 %48, i8* noundef nonnull %11, i8* noundef nonnull %13, i64 %52, i8* noundef nonnull %15, i64 noundef 1) #13 [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !155
  call void @llvm.julia.gc_preserve_end(token %58) #13, !dbg !155
  %arraylen98 = load i64, i64 addrspace(11)* %arraylen_ptr104, align 8, !dbg !159, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
  %inbounds.not = icmp eq i64 %arraylen98, 0, !dbg !159
  br i1 %inbounds.not, label %oob, label %idxend, !dbg !159

oob:                                              ; preds = %L29
  %errorbox = alloca i64, align 8, !dbg !159
  store i64 1, i64* %errorbox, align 8, !dbg !159, !noalias !161
  %59 = addrspacecast {} addrspace(10)* %19 to {} addrspace(12)*, !dbg !159
  call void @llvm.lifetime.end.p0i8(i64 noundef 1, i8* noundef nonnull %1) #13
  call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %3) #13
  call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %5) #13
  call void @llvm.lifetime.end.p0i8(i64 noundef 4, i8* noundef nonnull %7) #13
  call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %9) #13
  call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %11) #13
  call void @llvm.lifetime.end.p0i8(i64 noundef 4, i8* noundef nonnull %13) #13
  call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %15) #13
  call void @ijl_bounds_error_ints({} addrspace(12)* %59, i64* noundef nonnull align 8 %errorbox, i64 noundef 1) #17, !dbg !159
  unreachable, !dbg !159

idxend:                                           ; preds = %L29
  %60 = bitcast {} addrspace(10)* %19 to float addrspace(13)* addrspace(10)*, !dbg !159
  %61 = addrspacecast float addrspace(13)* addrspace(10)* %60 to float addrspace(13)* addrspace(11)*, !dbg !159
  %arrayptr100141 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %61, align 8, !dbg !159, !tbaa !28, !alias.scope !126, !noalias !36, !nonnull !8
  %arrayref = load float, float addrspace(13)* %arrayptr100141, align 4, !dbg !159, !tbaa !41, !alias.scope !44, !noalias !116
  ret float %arrayref, !dbg !160
}

; Function Attrs: mustprogress willreturn
define internal void @diffejulia_adfunc_10000({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* align 16 %"'", float %differeturn) local_unnamed_addr #12 !dbg !162 {
top:
  %"arrayref'de" = alloca float, align 4
  %1 = getelementptr float, float* %"arrayref'de", i64 0
  store float 0.000000e+00, float* %1, align 4
  %byref. = alloca i64, align 8
  %ret = alloca float, align 4
  %byref.int.one = alloca i64, align 8
  %byref.transpose.transa = alloca i8, align 1
  %byref.constant.char.N = alloca i8, align 1
  %byref.constant.fp.1.0 = alloca float, align 4
  %2 = alloca i8, align 1
  %3 = alloca i64, align 16
  %4 = bitcast i64* %3 to i8*
  %5 = alloca i64, align 16
  %6 = bitcast i64* %5 to i8*
  %7 = alloca i32, align 8
  %8 = bitcast i32* %7 to i8*
  %9 = alloca i64, align 16
  %10 = bitcast i64* %9 to i8*
  %11 = alloca i64, align 16
  %12 = bitcast i64* %11 to i8*
  %13 = alloca i32, align 8
  %14 = bitcast i32* %13 to i8*
  %15 = alloca i64, align 16
  %16 = bitcast i64* %15 to i8*
  %17 = call {}*** @julia.get_pgcstack() #15
  %ptls_field124 = getelementptr inbounds {}**, {}*** %17, i64 2
  %18 = bitcast {}*** %ptls_field124 to i64***
  %ptls_load125126 = load i64**, i64*** %18, align 8, !tbaa !9, !alias.scope !163, !noalias !166
  %19 = getelementptr inbounds i64*, i64** %ptls_load125126, i64 2
  %safepoint = load i64*, i64** %19, align 8, !tbaa !13, !invariant.load !8, !alias.scope !168, !noalias !171
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #15, !dbg !173
  fence syncscope("singlethread") seq_cst
  %20 = call {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 30), !dbg !174
  %21 = bitcast {} addrspace(10)* %20 to i8 addrspace(13)* addrspace(10)*, !dbg !174
  %22 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %21, align 8, !dbg !174
  call void @llvm.memset.p13i8.i64(i8 addrspace(13)* align 4 %22, i8 0, i64 120, i1 false), !dbg !174
  %23 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 30) #16, !dbg !174
  %24 = call {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 1), !dbg !176
  %25 = bitcast {} addrspace(10)* %24 to i8 addrspace(13)* addrspace(10)*, !dbg !176
  %26 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %25, align 8, !dbg !176
  call void @llvm.memset.p13i8.i64(i8 addrspace(13)* align 4 %26, i8 0, i64 4, i1 false), !dbg !176
  %27 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 1) #16, !dbg !176
  %"'ipc15" = addrspacecast {} addrspace(10)* %24 to {} addrspace(11)*, !dbg !179
  %28 = addrspacecast {} addrspace(10)* %27 to {} addrspace(11)*, !dbg !179
  %29 = bitcast {} addrspace(10)* %27 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !179
  %30 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %29 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !179
  %"'ipc" = bitcast {} addrspace(10)* %24 to float addrspace(13)* addrspace(10)*, !dbg !179
  %31 = bitcast {} addrspace(10)* %27 to float addrspace(13)* addrspace(10)*, !dbg !179
  %"'ipc4" = addrspacecast float addrspace(13)* addrspace(10)* %"'ipc" to float addrspace(13)* addrspace(11)*, !dbg !179
  %32 = addrspacecast float addrspace(13)* addrspace(10)* %31 to float addrspace(13)* addrspace(11)*, !dbg !179
  %"arrayptr127'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc4", align 8, !dbg !179, !tbaa !28, !alias.scope !181, !noalias !184, !nonnull !8
  %arrayptr127 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %32, align 8, !dbg !179, !tbaa !28, !alias.scope !186, !noalias !187, !nonnull !8
  store float 0.000000e+00, float addrspace(13)* %arrayptr127, align 4, !dbg !179, !tbaa !41, !alias.scope !188, !noalias !191
  %"'ipc11" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !193
  %33 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !193
  %34 = bitcast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(10)*, !dbg !193
  %35 = addrspacecast {} addrspace(10)* addrspace(10)* %34 to {} addrspace(10)* addrspace(11)*, !dbg !193
  %arraysize_ptr = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %35, i64 3, !dbg !193
  %36 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr to i64 addrspace(11)*, !dbg !193
  %arraysize = load i64, i64 addrspace(11)* %36, align 8, !dbg !193, !tbaa !13, !range !51, !invariant.load !8, !alias.scope !195, !noalias !198
  %arraysize_ptr2 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %35, i64 4, !dbg !193
  %37 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr2 to i64 addrspace(11)*, !dbg !193
  %arraysize3 = load i64, i64 addrspace(11)* %37, align 16, !dbg !193, !tbaa !13, !range !51, !invariant.load !8, !alias.scope !195, !noalias !198
  %arraylen_ptr = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %30, i64 0, i32 1, !dbg !200
  %arraylen = load i64, i64 addrspace(11)* %arraylen_ptr, align 8, !dbg !200, !tbaa !58, !range !51, !alias.scope !202, !noalias !187
  %38 = icmp eq i64 %arraylen, %arraysize3, !dbg !203
  br i1 %38, label %L17, label %top.L19_crit_edge, !dbg !201

top.L19_crit_edge:                                ; preds = %top
  %.phi.trans.insert = bitcast {} addrspace(10)* %23 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*
  %.phi.trans.insert143 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %.phi.trans.insert to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*
  %arraylen_ptr10.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %.phi.trans.insert143, i64 0, i32 1
  %arraylen11.pre = load i64, i64 addrspace(11)* %arraylen_ptr10.phi.trans.insert, align 8, !dbg !205, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
  unreachable

L17:                                              ; preds = %top
  %39 = bitcast {} addrspace(10)* %23 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !200
  %40 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %39 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !200
  %arraylen_ptr104 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %40, i64 0, i32 1, !dbg !200
  %arraylen105 = load i64, i64 addrspace(11)* %arraylen_ptr104, align 8, !dbg !200, !tbaa !58, !range !51, !alias.scope !207, !noalias !210
  %41 = icmp eq i64 %arraylen105, %arraysize, !dbg !203
  br i1 %41, label %L29, label %L19, !dbg !201

L19:                                              ; preds = %L17
  %arraylen11 = phi i64 [ %arraylen105, %L17 ], !dbg !205
  %current_task1123 = getelementptr inbounds {}**, {}*** %17, i64 -14
  %current_task1 = bitcast {}*** %current_task1123 to {}**
  %newstruct13 = call noalias nonnull dereferenceable(16) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803713277456 to {}*) to {} addrspace(10)*)) #17, !dbg !212
  %42 = bitcast {} addrspace(10)* %newstruct13 to {} addrspace(10)* addrspace(10)*, !dbg !212
  %43 = addrspacecast {} addrspace(10)* addrspace(10)* %42 to {} addrspace(10)* addrspace(11)*, !dbg !212
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %43, align 8, !dbg !212, !tbaa !72, !alias.scope !44, !noalias !213
  %44 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %43, i64 1, !dbg !212
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %44, align 8, !dbg !212, !tbaa !72, !alias.scope !44, !noalias !213
  %box = call noalias nonnull dereferenceable(56) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 56, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803908591440 to {}*) to {} addrspace(10)*)) #17, !dbg !212
  %45 = bitcast {} addrspace(10)* %box to { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !212
  %.repack = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 0, !dbg !212
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032336 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
  %.repack128.repack = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 1, i64 0, !dbg !212
  store i64 %arraysize, i64 addrspace(10)* %.repack128.repack, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
  %.repack128.repack138 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 1, i64 1, !dbg !212
  store i64 %arraysize3, i64 addrspace(10)* %.repack128.repack138, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
  %.repack130 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 2, !dbg !212
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032304 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack130, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
  %.repack132 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 3, !dbg !212
  store i64 %arraylen, i64 addrspace(10)* %.repack132, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
  %.repack134 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 4, !dbg !212
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032256 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack134, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
  %.repack136 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 5, !dbg !212
  store i64 %arraylen11, i64 addrspace(10)* %.repack136, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
  store atomic {} addrspace(10)* %box, {} addrspace(10)* addrspace(11)* %43 release, align 8, !dbg !212, !tbaa !72, !alias.scope !44, !noalias !213
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* nofree noundef nonnull %newstruct13, {} addrspace(10)* nofree nonnull %box) #18, !dbg !212
  %46 = bitcast {} addrspace(10)* %newstruct13 to i8 addrspace(10)*, !dbg !212
  %47 = addrspacecast i8 addrspace(10)* %46 to i8 addrspace(11)*, !dbg !212
  %48 = getelementptr inbounds i8, i8 addrspace(11)* %47, i64 8, !dbg !212
  %49 = bitcast i8 addrspace(11)* %48 to {} addrspace(10)* addrspace(11)*, !dbg !212
  store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803939037192 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %49 release, align 8, !dbg !212, !tbaa !72, !alias.scope !44, !noalias !213
  %box16 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803724611632 to {}*) to {} addrspace(10)*)) #17, !dbg !206
  %50 = bitcast {} addrspace(10)* %box16 to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !206
  %51 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %50, i64 0, i64 0, !dbg !206
  store {} addrspace(10)* %newstruct13, {} addrspace(10)* addrspace(10)* %51, align 8, !dbg !206, !tbaa !75, !alias.scope !44, !noalias !213
  %52 = addrspacecast {} addrspace(10)* %box16 to {} addrspace(12)*, !dbg !206
  call void @ijl_throw({} addrspace(12)* %52) #19, !dbg !206
  unreachable

L29:                                              ; preds = %L17
  %53 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc15"), !dbg !216
  %54 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %28) #20, !dbg !216
  %"'ipc14" = bitcast {}* %53 to i8**, !dbg !216
  %55 = bitcast {}* %54 to i8**, !dbg !216
  %"arrayptr20'ipl" = load i8*, i8** %"'ipc14", align 8, !dbg !216, !tbaa !28, !alias.scope !220, !noalias !184, !nonnull !8
  %arrayptr20 = load i8*, i8** %55, align 8, !dbg !216, !tbaa !28, !alias.scope !202, !noalias !187, !nonnull !8
  %"'ipc6" = ptrtoint i8* %"arrayptr20'ipl" to i64, !dbg !216
  %56 = ptrtoint i8* %arrayptr20 to i64, !dbg !216
  %"'ipc13" = addrspacecast {} addrspace(10)* %20 to {} addrspace(11)*, !dbg !221
  %57 = addrspacecast {} addrspace(10)* %23 to {} addrspace(11)*, !dbg !221
  %58 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc13"), !dbg !221
  %59 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %57) #20, !dbg !221
  %"'ipc12" = bitcast {}* %58 to i8**, !dbg !221
  %60 = bitcast {}* %59 to i8**, !dbg !221
  %"arrayptr22'ipl" = load i8*, i8** %"'ipc12", align 8, !dbg !221, !tbaa !28, !alias.scope !225, !noalias !226, !nonnull !8
  %arrayptr22 = load i8*, i8** %60, align 8, !dbg !221, !tbaa !28, !alias.scope !207, !noalias !210, !nonnull !8
  %"'ipc7" = ptrtoint i8* %"arrayptr22'ipl" to i64, !dbg !221
  %61 = ptrtoint i8* %arrayptr22 to i64, !dbg !221
  %62 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc11"), !dbg !227
  %63 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %33) #20, !dbg !227
  %"'ipc10" = bitcast {}* %62 to i8**, !dbg !227
  %64 = bitcast {}* %63 to i8**, !dbg !227
  %"arrayptr24'ipl" = load i8*, i8** %"'ipc10", align 8, !dbg !227, !tbaa !13, !alias.scope !230, !noalias !231, !nonnull !8
  %arrayptr24 = load i8*, i8** %64, align 8, !dbg !227, !tbaa !13, !invariant.load !8, !alias.scope !195, !noalias !198, !nonnull !8
  %"'ipc5" = ptrtoint i8* %"arrayptr24'ipl" to i64, !dbg !227
  %65 = ptrtoint i8* %arrayptr24 to i64, !dbg !227
  %.not = icmp eq i64 %arraysize, 0, !dbg !232
  %66 = select i1 %.not, i64 1, i64 %arraysize, !dbg !236
  %67 = call i64 @llvm.umax.i64(i64 %arraysize, i64 %66) #15, !dbg !236
  %68 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %27, {} addrspace(10)* %24, {} addrspace(10)* %23, {} addrspace(10)* %20), !dbg !237
  store i8 78, i8* %2, align 1, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
  store i64 %arraysize, i64* %3, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
  store i64 %arraysize3, i64* %5, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
  %memcpy_refined_dst48 = bitcast i32* %7 to float*, !dbg !238
  store float 1.000000e+00, float* %memcpy_refined_dst48, align 8, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
  store i64 %67, i64* %9, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
  store i64 1, i64* %11, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
  %memcpy_refined_dst55 = bitcast i32* %13 to float*, !dbg !238
  store float 0.000000e+00, float* %memcpy_refined_dst55, align 8, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
  store i64 1, i64* %15, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
  %69 = bitcast i8* %4 to i64*, !dbg !237
  %70 = load i64, i64* %69, align 8, !dbg !237
  %71 = bitcast i8* %6 to i64*, !dbg !237
  %72 = load i64, i64* %71, align 8, !dbg !237
  %73 = mul i64 %70, %72, !dbg !237
  %74 = mul nuw i64 %73, 4, !dbg !237
  %75 = call noalias nonnull i8* @malloc(i64 %74), !dbg !237
  %cache.A = bitcast i8* %75 to float*, !dbg !237
  %76 = bitcast i8* %10 to i64*, !dbg !237
  %77 = load i64, i64* %76, align 8, !dbg !237
  %78 = inttoptr i64 %65 to float*, !dbg !237
  call void @__enzyme_memcpy_float_mat_64(float* %cache.A, float* %78, i64 %70, i64 %72, i64 %77) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
  %loaded.trans = load i8, i8* %2, align 1, !dbg !237
  %79 = icmp eq i8 %loaded.trans, 78, !dbg !237
  %80 = icmp eq i8 %loaded.trans, 110, !dbg !237
  %81 = or i1 %80, %79, !dbg !237
  %82 = select i1 %81, i8* %6, i8* %4, !dbg !237
  %83 = bitcast i8* %82 to i64*, !dbg !237
  %84 = load i64, i64* %83, align 8, !dbg !237
  %85 = mul nuw i64 %84, 4, !dbg !237
  %86 = call noalias nonnull i8* @malloc(i64 %85), !dbg !237
  %cache.x = bitcast i8* %86 to float*, !dbg !237
  store i64 1, i64* %byref., align 8, !dbg !237
  %intcast. = bitcast i64* %byref. to i8*, !dbg !237
  call void @scopy_64_(i8* %82, i64 %56, i8* %12, float* %cache.x, i8* %intcast.) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
  call void @sgemv_64_(i8* noundef nonnull %2, i8* noundef nonnull %4, i8* noundef nonnull %6, i8* noundef nonnull %8, i64 %65, i8* noundef nonnull %10, i64 %56, i8* noundef nonnull %12, i8* noundef nonnull %14, i64 %61, i8* noundef nonnull %16, i64 noundef 1) #15 [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
  call void @llvm.julia.gc_preserve_end(token %68) #15, !dbg !237
  %arraylen98 = load i64, i64 addrspace(11)* %arraylen_ptr104, align 8, !dbg !241, !tbaa !58, !range !51, !alias.scope !207, !noalias !210
  %inbounds.not = icmp eq i64 %arraylen98, 0, !dbg !241
  br i1 %inbounds.not, label %oob, label %idxend, !dbg !241

oob:                                              ; preds = %L29
  %errorbox = alloca i64, align 8, !dbg !241
  store i64 1, i64* %errorbox, align 8, !dbg !241, !noalias !243
  %87 = addrspacecast {} addrspace(10)* %23 to {} addrspace(12)*, !dbg !241
  call void @ijl_bounds_error_ints({} addrspace(12)* %87, i64* noundef nonnull align 8 %errorbox, i64 noundef 1) #19, !dbg !241
  unreachable

idxend:                                           ; preds = %L29
  %"'ipc16" = bitcast {} addrspace(10)* %20 to float addrspace(13)* addrspace(10)*, !dbg !241
  %"'ipc17" = addrspacecast float addrspace(13)* addrspace(10)* %"'ipc16" to float addrspace(13)* addrspace(11)*, !dbg !241
  %"arrayptr100141'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc17", align 8, !dbg !241, !tbaa !28, !alias.scope !244, !noalias !226, !nonnull !8
  br label %invertidxend, !dbg !242

inverttop:                                        ; preds = %invertL17
  store float 0.000000e+00, float addrspace(13)* %"arrayptr127'ipl", align 4, !dbg !179, !tbaa !41, !alias.scope !245, !noalias !246
  fence syncscope("singlethread") seq_cst
  fence syncscope("singlethread") seq_cst
  ret void

invertL17:                                        ; preds = %invertL29
  br label %inverttop

invertL29:                                        ; preds = %invertidxend
  %88 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %27, {} addrspace(10)* %24, {} addrspace(10)* %23, {} addrspace(10)* %20), !dbg !237
  %89 = ptrtoint float* %cache.A to i64, !dbg !237
  %90 = ptrtoint float* %cache.x to i64, !dbg !237
  store i64 1, i64* %byref.int.one, align 8, !dbg !237
  %intcast.int.one = bitcast i64* %byref.int.one to i8*, !dbg !237
  %ld.row.trans = load i8, i8* %2, align 1, !dbg !237
  %91 = icmp eq i8 %ld.row.trans, 110, !dbg !237
  %92 = icmp eq i8 %ld.row.trans, 78, !dbg !237
  %93 = or i1 %92, %91, !dbg !237
  %94 = select i1 %93, i64 %"'ipc7", i64 %90, !dbg !237
  %95 = select i1 %93, i8* %16, i8* %intcast.int.one, !dbg !237
  %96 = select i1 %93, i64 %90, i64 %"'ipc7", !dbg !237
  %97 = select i1 %93, i8* %intcast.int.one, i8* %16, !dbg !237
  call void @sger_64_(i8* %4, i8* %6, i8* %8, i64 %94, i8* %95, i64 %96, i8* %97, i64 %"'ipc5", i8* %10) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
  %ld.transa = load i8, i8* %2, align 1, !dbg !237
  %98 = icmp eq i8 %ld.transa, 110, !dbg !237
  %99 = select i1 %98, i8 116, i8 0, !dbg !237
  %100 = icmp eq i8 %ld.transa, 78, !dbg !237
  %101 = select i1 %100, i8 84, i8 %99, !dbg !237
  %102 = icmp eq i8 %ld.transa, 116, !dbg !237
  %103 = select i1 %102, i8 110, i8 %101, !dbg !237
  %104 = icmp eq i8 %ld.transa, 84, !dbg !237
  %105 = select i1 %104, i8 78, i8 %103, !dbg !237
  store i8 %105, i8* %byref.transpose.transa, align 1, !dbg !237
  store i8 78, i8* %byref.constant.char.N, align 1, !dbg !237
  %loaded.trans8 = load i8, i8* %byref.constant.char.N, align 1, !dbg !237
  %106 = icmp eq i8 %loaded.trans8, 78, !dbg !237
  %107 = icmp eq i8 %loaded.trans8, 110, !dbg !237
  %108 = or i1 %107, %106, !dbg !237
  %109 = select i1 %108, i8* %6, i8* %4, !dbg !237
  store float 1.000000e+00, float* %byref.constant.fp.1.0, align 4, !dbg !237
  %fpcast.constant.fp.1.0 = bitcast float* %byref.constant.fp.1.0 to i8*, !dbg !237
  call void @sgemv_64_(i8* %byref.transpose.transa, i8* %4, i8* %6, i8* %8, i64 %89, i8* %109, i64 %"'ipc7", i8* %16, i8* %fpcast.constant.fp.1.0, i64 %"'ipc6", i8* %12, i64 1) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
  %ld.row.trans9 = load i8, i8* %2, align 1, !dbg !237
  %110 = icmp eq i8 %ld.row.trans9, 110, !dbg !237
  %111 = icmp eq i8 %ld.row.trans9, 78, !dbg !237
  %112 = or i1 %111, %110, !dbg !237
  %113 = select i1 %112, i8* %4, i8* %6, !dbg !237
  call void @sscal_64_(i8* %113, i8* %14, i64 %"'ipc7", i8* %16) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
  %114 = bitcast float* %cache.A to i8*, !dbg !237
  call void @free(i8* nonnull %114), !dbg !237
  %115 = bitcast float* %cache.x to i8*, !dbg !237
  call void @free(i8* nonnull %115), !dbg !237
  call void @llvm.julia.gc_preserve_end(token %88), !dbg !237
  br label %invertL17

invertidxend:                                     ; preds = %idxend
  store float %differeturn, float* %"arrayref'de", align 4
  %116 = load float, float* %"arrayref'de", align 4, !dbg !241
  store float 0.000000e+00, float* %"arrayref'de", align 4, !dbg !241
  %117 = load float, float addrspace(13)* %"arrayptr100141'ipl", align 4, !dbg !241, !tbaa !41, !alias.scope !247, !noalias !250
  %118 = fadd fast float %117, %116, !dbg !241
  store float %118, float addrspace(13)* %"arrayptr100141'ipl", align 4, !dbg !241, !tbaa !41, !alias.scope !247, !noalias !250
  br label %invertL29
}

 ** On entry to SGEMV  parameter number  6 had an illegal value
((nothing,),)

@ZuseZ4 anything pop out?
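
For reading the warning: in the reference BLAS signature `SGEMV(TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)`, parameter 6 is `LDA`, and the routine rejects it when `LDA < max(1, M)`. Below is a minimal plain-Julia sketch of those argument checks. `sgemv_info` is a made-up helper name (not part of LinearAlgebra or Enzyme), and the sizes are illustrative only.

```julia
# Hypothetical helper mirroring the argument checks at the top of reference
# BLAS SGEMV. Returns the 1-based position of the first illegal argument,
# or 0 when all checked arguments are acceptable.
function sgemv_info(trans::Char, m::Integer, n::Integer, lda::Integer,
                    incx::Integer, incy::Integer)
    trans in ('N', 'n', 'T', 't', 'C', 'c') || return 1
    m < 0 && return 2
    n < 0 && return 3
    lda < max(1, m) && return 6   # the condition behind "parameter number 6"
    incx == 0 && return 8
    incy == 0 && return 11
    return 0
end

ok  = sgemv_info('N', 30, 1, 30, 1, 1)  # lda == m  → 0
bad = sgemv_info('T', 30, 1, 1, 1, 1)   # lda == n < m  → 6
```

In the dump above, the `sgemv_64_` call in `invertL29` takes its sixth argument from `%109`, which selects `%6` (the slot holding n) because the tested flag is the constant `'N'`. With m > n that would trip exactly this check, though this reading of the IR is not a confirmed root cause.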

wsmoses commented 7 months ago
using Enzyme; Enzyme.Compiler.bitcode_replacement!(false)

using LinearAlgebra
ps = zeros(Float32, 30, 1)
function adfunc(A)
    Y = Vector{Float32}(undef, 30)
    X = [1.0f0]

    trans = 'N'
    m, n = 30, 1
    pX, sX = pointer(X), 1
    pY, sY = pointer(Y), 1
    pA = pointer(A)
    lda = 30
    alpha = 1.0f0
    beta = 0.0f0
    GC.@preserve A X Y ccall((:sgemv_64_, LinearAlgebra.BLAS.libblastrampoline), Cvoid,
        (Ref{UInt8}, Ref{LinearAlgebra.BLAS.BlasInt}, Ref{LinearAlgebra.BLAS.BlasInt}, Ref{Float32},
         Ptr{Float32}, Ref{LinearAlgebra.BLAS.BlasInt}, Ptr{Float32}, Ref{LinearAlgebra.BLAS.BlasInt},
         Ref{Float32}, Ptr{Float32}, Ref{LinearAlgebra.BLAS.BlasInt}, Clong),
         trans, m, n, alpha,
         pA, lda, pX, sX,
         beta, pY, sY, 1)

    return @inbounds Y[1]
end

adfunc(ps)
Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(deepcopy(ps), deepcopy(ps)))
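
For comparison, the expected gradient of this reproducer can be worked out without Enzyme. With `trans = 'N'`, `alpha = 1`, and `beta = 0`, the call computes `Y = A * X`, so `adfunc(A)` returns `A[1, 1]` when `X = [1]`. A minimal sketch, assuming that reading of the call; `ref` and `fd_gradient` are hypothetical names introduced here for illustration:

```julia
using LinearAlgebra

# Plain-Julia stand-in for the reproducer: Y = A * X with X = [1], return Y[1].
ref(A) = (A * Float32[1.0])[1]

# Hypothetical helper: forward differences, perturbing one entry of A at a time.
# ref is linear in A, so a step of 1 gives the exact derivative here.
function fd_gradient(f, A; h = 1.0f0)
    g = similar(A)
    f0 = f(A)
    for i in eachindex(A)
        Ah = copy(A)
        Ah[i] += h
        g[i] = (f(Ah) - f0) / h
    end
    return g
end

A = zeros(Float32, 30, 1)
g = fd_gradient(ref, A)   # 1 in the first entry, 0 everywhere else
```

Since reverse mode accumulates into the shadow of a `Duplicated` argument, and the shadow here starts as a copy of the all-zero `ps`, a correct run should leave the shadow equal to `g`. Checking that requires keeping a reference to the shadow array instead of passing `deepcopy(ps)` inline.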

wsmoses commented 7 months ago
julia> Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(deepcopy(ps), deepcopy(ps)))
after simplification :
; Function Attrs: mustprogress willreturn
define float @preprocess_julia_adfunc_10183({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0) local_unnamed_addr #8 !dbg !78 {
top:
  %1 = alloca i8, align 1
  %2 = alloca i64, align 16
  %3 = bitcast i64* %2 to i8*
  %4 = alloca i64, align 16
  %5 = bitcast i64* %4 to i8*
  %6 = alloca i32, align 8
  %7 = bitcast i32* %6 to i8*
  %8 = alloca i64, align 16
  %9 = bitcast i64* %8 to i8*
  %10 = alloca i64, align 16
  %11 = bitcast i64* %10 to i8*
  %12 = alloca i32, align 8
  %13 = bitcast i32* %12 to i8*
  %14 = alloca i64, align 16
  %15 = bitcast i64* %14 to i8*
  %16 = call {}*** @julia.get_pgcstack() #9
  %ptls_field71 = getelementptr inbounds {}**, {}*** %16, i64 2
  %17 = bitcast {}*** %ptls_field71 to i64***
  %ptls_load7273 = load i64**, i64*** %17, align 8, !tbaa !8
  %18 = getelementptr inbounds i64*, i64** %ptls_load7273, i64 2
  %safepoint = load i64*, i64** %18, align 8, !tbaa !12, !invariant.load !7
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #9, !dbg !79
  fence syncscope("singlethread") seq_cst
  %19 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 30) #10, !dbg !80
  %20 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 1) #10, !dbg !82
  %21 = addrspacecast {} addrspace(10)* %20 to {} addrspace(11)*, !dbg !85
  %22 = bitcast {} addrspace(10)* %20 to float addrspace(13)* addrspace(10)*, !dbg !85
  %23 = addrspacecast float addrspace(13)* addrspace(10)* %22 to float addrspace(13)* addrspace(11)*, !dbg !85
  %arrayptr74 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %23, align 8, !dbg !85, !tbaa !27, !alias.scope !87, !noalias !35, !nonnull !7
  store float 1.000000e+00, float addrspace(13)* %arrayptr74, align 4, !dbg !85, !tbaa !40, !alias.scope !43, !noalias !90
  %24 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %21) #11, !dbg !91
  %25 = bitcast {}* %24 to i8**, !dbg !91
  %arrayptr3 = load i8*, i8** %25, align 8, !dbg !91, !tbaa !27, !alias.scope !52, !noalias !35, !nonnull !7
  %26 = addrspacecast {} addrspace(10)* %19 to {} addrspace(11)*, !dbg !94
  %27 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %26) #11, !dbg !94
  %28 = bitcast {}* %27 to i8**, !dbg !94
  %arrayptr5 = load i8*, i8** %28, align 8, !dbg !94, !tbaa !27, !alias.scope !52, !noalias !35, !nonnull !7
  %29 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !97
  %30 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %29) #11, !dbg !97
  %31 = bitcast {}* %30 to i8**, !dbg !97
  %arrayptr7 = load i8*, i8** %31, align 8, !dbg !97, !tbaa !12, !invariant.load !7, !alias.scope !59, !noalias !60, !nonnull !7
  %32 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nofree nonnull %0, {} addrspace(10)* nonnull %20, {} addrspace(10)* nonnull %19) #9, !dbg !100
  store i8 78, i8* %1, align 1, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
  store i64 30, i64* %2, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
  store i64 1, i64* %4, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
  %memcpy_refined_dst18 = bitcast i32* %6 to float*, !dbg !101
  store float 1.000000e+00, float* %memcpy_refined_dst18, align 8, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
  store i64 30, i64* %8, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
  store i64 1, i64* %10, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
  %memcpy_refined_dst27 = bitcast i32* %12 to float*, !dbg !101
  store float 0.000000e+00, float* %memcpy_refined_dst27, align 8, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
  store i64 1, i64* %14, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
  %33 = ptrtoint i8* %arrayptr7 to i64, !dbg !97
  %34 = ptrtoint i8* %arrayptr5 to i64, !dbg !94
  %35 = ptrtoint i8* %arrayptr3 to i64, !dbg !91
  call void @sgemv_64_(i8* noundef nonnull %1, i8* noundef nonnull %3, i8* noundef nonnull %5, i8* noundef nonnull %7, i64 %33, i8* noundef nonnull %9, i64 %35, i8* noundef nonnull %11, i8* noundef nonnull %13, i64 %34, i8* noundef nonnull %15, i64 noundef 1) #9 [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !100
  call void @llvm.julia.gc_preserve_end(token %32) #9, !dbg !100
  %36 = bitcast {} addrspace(10)* %19 to float addrspace(13)* addrspace(10)*, !dbg !104
  %37 = addrspacecast float addrspace(13)* addrspace(10)* %36 to float addrspace(13)* addrspace(11)*, !dbg !104
  %arrayptr6875 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %37, align 8, !dbg !104, !tbaa !27, !alias.scope !87, !noalias !35, !nonnull !7
  %arrayref = load float, float addrspace(13)* %arrayptr6875, align 4, !dbg !104, !tbaa !40, !alias.scope !43, !noalias !77
  ret float %arrayref, !dbg !105
}

; Function Attrs: mustprogress willreturn
define internal void @diffejulia_adfunc_10183({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* align 16 %"'", float %differeturn) local_unnamed_addr #8 !dbg !106 {
top:
  %"arrayref'de" = alloca float, align 4
  %1 = getelementptr float, float* %"arrayref'de", i64 0
  store float 0.000000e+00, float* %1, align 4
  %byref. = alloca i64, align 8
  %ret = alloca float, align 4
  %byref.int.one = alloca i64, align 8
  %byref.transpose.transa = alloca i8, align 1
  %byref.constant.char.N = alloca i8, align 1
  %byref.constant.fp.1.0 = alloca float, align 4
  %2 = alloca i8, align 1
  %3 = alloca i64, align 16
  %4 = bitcast i64* %3 to i8*
  %5 = alloca i64, align 16
  %6 = bitcast i64* %5 to i8*
  %7 = alloca i32, align 8
  %8 = bitcast i32* %7 to i8*
  %9 = alloca i64, align 16
  %10 = bitcast i64* %9 to i8*
  %11 = alloca i64, align 16
  %12 = bitcast i64* %11 to i8*
  %13 = alloca i32, align 8
  %14 = bitcast i32* %13 to i8*
  %15 = alloca i64, align 16
  %16 = bitcast i64* %15 to i8*
  %17 = call {}*** @julia.get_pgcstack() #11
  %ptls_field71 = getelementptr inbounds {}**, {}*** %17, i64 2
  %18 = bitcast {}*** %ptls_field71 to i64***
  %ptls_load7273 = load i64**, i64*** %18, align 8, !tbaa !8, !alias.scope !107, !noalias !110
  %19 = getelementptr inbounds i64*, i64** %ptls_load7273, i64 2
  %safepoint = load i64*, i64** %19, align 8, !tbaa !12, !invariant.load !7, !alias.scope !112, !noalias !115
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #11, !dbg !117
  fence syncscope("singlethread") seq_cst
  %20 = call {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 30), !dbg !118
  %21 = bitcast {} addrspace(10)* %20 to i8 addrspace(13)* addrspace(10)*, !dbg !118
  %22 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %21, align 8, !dbg !118
  call void @llvm.memset.p13i8.i64(i8 addrspace(13)* align 4 %22, i8 0, i64 120, i1 false), !dbg !118
  %23 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 30) #12, !dbg !118
  %24 = call {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 1), !dbg !120
  %25 = bitcast {} addrspace(10)* %24 to i8 addrspace(13)* addrspace(10)*, !dbg !120
  %26 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %25, align 8, !dbg !120
  call void @llvm.memset.p13i8.i64(i8 addrspace(13)* align 4 %26, i8 0, i64 4, i1 false), !dbg !120
  %27 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 1) #12, !dbg !120
  %"'ipc16" = addrspacecast {} addrspace(10)* %24 to {} addrspace(11)*, !dbg !123
  %28 = addrspacecast {} addrspace(10)* %27 to {} addrspace(11)*, !dbg !123
  %"'ipc17" = bitcast {} addrspace(10)* %24 to float addrspace(13)* addrspace(10)*, !dbg !123
  %29 = bitcast {} addrspace(10)* %27 to float addrspace(13)* addrspace(10)*, !dbg !123
  %"'ipc18" = addrspacecast float addrspace(13)* addrspace(10)* %"'ipc17" to float addrspace(13)* addrspace(11)*, !dbg !123
  %30 = addrspacecast float addrspace(13)* addrspace(10)* %29 to float addrspace(13)* addrspace(11)*, !dbg !123
  %"arrayptr74'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc18", align 8, !dbg !123, !tbaa !27, !alias.scope !125, !noalias !128, !nonnull !7
  %arrayptr74 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %30, align 8, !dbg !123, !tbaa !27, !alias.scope !130, !noalias !131, !nonnull !7
  store float 1.000000e+00, float addrspace(13)* %arrayptr74, align 4, !dbg !123, !tbaa !40, !alias.scope !132, !noalias !135
  %31 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc16"), !dbg !137
  %32 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %28) #13, !dbg !137
  %"'ipc15" = bitcast {}* %31 to i8**, !dbg !137
  %33 = bitcast {}* %32 to i8**, !dbg !137
  %"arrayptr3'ipl" = load i8*, i8** %"'ipc15", align 8, !dbg !137, !tbaa !27, !alias.scope !140, !noalias !128, !nonnull !7
  %arrayptr3 = load i8*, i8** %33, align 8, !dbg !137, !tbaa !27, !alias.scope !141, !noalias !131, !nonnull !7
  %"'ipc14" = addrspacecast {} addrspace(10)* %20 to {} addrspace(11)*, !dbg !142
  %34 = addrspacecast {} addrspace(10)* %23 to {} addrspace(11)*, !dbg !142
  %35 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc14"), !dbg !142
  %36 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %34) #13, !dbg !142
  %"'ipc13" = bitcast {}* %35 to i8**, !dbg !142
  %37 = bitcast {}* %36 to i8**, !dbg !142
  %"arrayptr5'ipl" = load i8*, i8** %"'ipc13", align 8, !dbg !142, !tbaa !27, !alias.scope !145, !noalias !148, !nonnull !7
  %arrayptr5 = load i8*, i8** %37, align 8, !dbg !142, !tbaa !27, !alias.scope !150, !noalias !151, !nonnull !7
  %"'ipc12" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !152
  %38 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !152
  %39 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc12"), !dbg !152
  %40 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %38) #13, !dbg !152
  %"'ipc11" = bitcast {}* %39 to i8**, !dbg !152
  %41 = bitcast {}* %40 to i8**, !dbg !152
  %"arrayptr7'ipl" = load i8*, i8** %"'ipc11", align 8, !dbg !152, !tbaa !12, !alias.scope !155, !noalias !158, !nonnull !7
  %arrayptr7 = load i8*, i8** %41, align 8, !dbg !152, !tbaa !12, !invariant.load !7, !alias.scope !160, !noalias !161, !nonnull !7
  %42 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %27, {} addrspace(10)* %24, {} addrspace(10)* %23, {} addrspace(10)* %20), !dbg !162
  store i8 78, i8* %2, align 1, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
  store i64 30, i64* %3, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
  store i64 1, i64* %5, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
  %memcpy_refined_dst18 = bitcast i32* %7 to float*, !dbg !163
  store float 1.000000e+00, float* %memcpy_refined_dst18, align 8, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
  store i64 30, i64* %9, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
  store i64 1, i64* %11, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
  %memcpy_refined_dst27 = bitcast i32* %13 to float*, !dbg !163
  store float 0.000000e+00, float* %memcpy_refined_dst27, align 8, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
  store i64 1, i64* %15, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
  %"'ipc6" = ptrtoint i8* %"arrayptr7'ipl" to i64, !dbg !152
  %43 = ptrtoint i8* %arrayptr7 to i64, !dbg !152
  %"'ipc8" = ptrtoint i8* %"arrayptr5'ipl" to i64, !dbg !142
  %44 = ptrtoint i8* %arrayptr5 to i64, !dbg !142
  %"'ipc7" = ptrtoint i8* %"arrayptr3'ipl" to i64, !dbg !137
  %45 = ptrtoint i8* %arrayptr3 to i64, !dbg !137
  %46 = bitcast i8* %4 to i64*, !dbg !162
  %47 = load i64, i64* %46, align 8, !dbg !162
  %48 = bitcast i8* %6 to i64*, !dbg !162
  %49 = load i64, i64* %48, align 8, !dbg !162
  %50 = mul i64 %47, %49, !dbg !162
  %51 = mul nuw i64 %50, 4, !dbg !162
  %52 = call noalias nonnull i8* @malloc(i64 %51), !dbg !162
  %cache.A = bitcast i8* %52 to float*, !dbg !162
  %53 = bitcast i8* %10 to i64*, !dbg !162
  %54 = load i64, i64* %53, align 8, !dbg !162
  %55 = inttoptr i64 %43 to float*, !dbg !162
  call void @__enzyme_memcpy_float_mat_64(float* %cache.A, float* %55, i64 %47, i64 %49, i64 %54) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
  %loaded.trans = load i8, i8* %2, align 1, !dbg !162
  %56 = icmp eq i8 %loaded.trans, 78, !dbg !162
  %57 = icmp eq i8 %loaded.trans, 110, !dbg !162
  %58 = or i1 %57, %56, !dbg !162
  %59 = select i1 %58, i8* %6, i8* %4, !dbg !162
  %60 = bitcast i8* %59 to i64*, !dbg !162
  %61 = load i64, i64* %60, align 8, !dbg !162
  %62 = mul nuw i64 %61, 4, !dbg !162
  %63 = call noalias nonnull i8* @malloc(i64 %62), !dbg !162
  %cache.x = bitcast i8* %63 to float*, !dbg !162
  store i64 1, i64* %byref., align 8, !dbg !162
  %intcast. = bitcast i64* %byref. to i8*, !dbg !162
  call void @scopy_64_(i8* %59, i64 %45, i8* %12, float* %cache.x, i8* %intcast.) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
  call void @sgemv_64_(i8* noundef nonnull %2, i8* noundef nonnull %4, i8* noundef nonnull %6, i8* noundef nonnull %8, i64 %43, i8* noundef nonnull %10, i64 %45, i8* noundef nonnull %12, i8* noundef nonnull %14, i64 %44, i8* noundef nonnull %16, i64 noundef 1) #11 [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
  call void @llvm.julia.gc_preserve_end(token %42) #11, !dbg !162
  %"'ipc" = bitcast {} addrspace(10)* %20 to float addrspace(13)* addrspace(10)*, !dbg !169
  %"'ipc4" = addrspacecast float addrspace(13)* addrspace(10)* %"'ipc" to float addrspace(13)* addrspace(11)*, !dbg !169
  %"arrayptr6875'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc4", align 8, !dbg !169, !tbaa !27, !alias.scope !171, !noalias !148, !nonnull !7
  br label %inverttop, !dbg !170

inverttop:                                        ; preds = %top
  store float %differeturn, float* %"arrayref'de", align 4
  %64 = load float, float* %"arrayref'de", align 4, !dbg !169
  store float 0.000000e+00, float* %"arrayref'de", align 4, !dbg !169
  %65 = load float, float addrspace(13)* %"arrayptr6875'ipl", align 4, !dbg !169, !tbaa !40, !alias.scope !172, !noalias !175
  %66 = fadd fast float %65, %64, !dbg !169
  store float %66, float addrspace(13)* %"arrayptr6875'ipl", align 4, !dbg !169, !tbaa !40, !alias.scope !172, !noalias !175
  %67 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %27, {} addrspace(10)* %24, {} addrspace(10)* %23, {} addrspace(10)* %20), !dbg !162
  %68 = ptrtoint float* %cache.A to i64, !dbg !162
  %69 = ptrtoint float* %cache.x to i64, !dbg !162
  store i64 1, i64* %byref.int.one, align 8, !dbg !162
  %intcast.int.one = bitcast i64* %byref.int.one to i8*, !dbg !162
  %ld.row.trans = load i8, i8* %2, align 1, !dbg !162
  %70 = icmp eq i8 %ld.row.trans, 110, !dbg !162
  %71 = icmp eq i8 %ld.row.trans, 78, !dbg !162
  %72 = or i1 %71, %70, !dbg !162
  %73 = select i1 %72, i64 %"'ipc8", i64 %69, !dbg !162
  %74 = select i1 %72, i8* %16, i8* %intcast.int.one, !dbg !162
  %75 = select i1 %72, i64 %69, i64 %"'ipc8", !dbg !162
  %76 = select i1 %72, i8* %intcast.int.one, i8* %16, !dbg !162
  call void @sger_64_(i8* %4, i8* %6, i8* %8, i64 %73, i8* %74, i64 %75, i8* %76, i64 %"'ipc6", i8* %10) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
  %ld.transa = load i8, i8* %2, align 1, !dbg !162
  %77 = icmp eq i8 %ld.transa, 110, !dbg !162
  %78 = select i1 %77, i8 116, i8 0, !dbg !162
  %79 = icmp eq i8 %ld.transa, 78, !dbg !162
  %80 = select i1 %79, i8 84, i8 %78, !dbg !162
  %81 = icmp eq i8 %ld.transa, 116, !dbg !162
  %82 = select i1 %81, i8 110, i8 %80, !dbg !162
  %83 = icmp eq i8 %ld.transa, 84, !dbg !162
  %84 = select i1 %83, i8 78, i8 %82, !dbg !162
  store i8 %84, i8* %byref.transpose.transa, align 1, !dbg !162
  store i8 78, i8* %byref.constant.char.N, align 1, !dbg !162
  %loaded.trans9 = load i8, i8* %byref.constant.char.N, align 1, !dbg !162
  %85 = icmp eq i8 %loaded.trans9, 78, !dbg !162
  %86 = icmp eq i8 %loaded.trans9, 110, !dbg !162
  %87 = or i1 %86, %85, !dbg !162
  %88 = select i1 %87, i8* %6, i8* %4, !dbg !162
  store float 1.000000e+00, float* %byref.constant.fp.1.0, align 4, !dbg !162
  %fpcast.constant.fp.1.0 = bitcast float* %byref.constant.fp.1.0 to i8*, !dbg !162
  call void @sgemv_64_(i8* %byref.transpose.transa, i8* %4, i8* %6, i8* %8, i64 %68, i8* %88, i64 %"'ipc8", i8* %16, i8* %fpcast.constant.fp.1.0, i64 %"'ipc7", i8* %12, i64 1) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
  %ld.row.trans10 = load i8, i8* %2, align 1, !dbg !162
  %89 = icmp eq i8 %ld.row.trans10, 110, !dbg !162
  %90 = icmp eq i8 %ld.row.trans10, 78, !dbg !162
  %91 = or i1 %90, %89, !dbg !162
  %92 = select i1 %91, i8* %4, i8* %6, !dbg !162
  call void @sscal_64_(i8* %92, i8* %14, i64 %"'ipc8", i8* %16) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
  %93 = bitcast float* %cache.A to i8*, !dbg !162
  call void @free(i8* nonnull %93), !dbg !162
  %94 = bitcast float* %cache.x to i8*, !dbg !162
  call void @free(i8* nonnull %94), !dbg !162
  call void @llvm.julia.gc_preserve_end(token %67), !dbg !162
  store float 0.000000e+00, float addrspace(13)* %"arrayptr74'ipl", align 4, !dbg !123, !tbaa !40, !alias.scope !177, !noalias !178
  fence syncscope("singlethread") seq_cst
  fence syncscope("singlethread") seq_cst
  ret void
}

 ** On entry to SGEMV  parameter number  6 had an illegal value
((nothing,),)

wsmoses commented 7 months ago

This should be fixed by https://github.com/EnzymeAD/Enzyme.jl/pull/1281

Please reopen if not.

ArnoStrouwen commented 7 months ago

It now produces the correct result with bitcode replacement on and off. However, I am a bit surprised that the allocations are exactly the same in both versions.

on:

julia> @btime Zygote.gradient(loss_adjoint,θ)
┌ Warning: Using fallback BLAS replacements for (["ssymv_64_"]), performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/U36Ed/src/utils.jl:59
  1.089 s (5063232 allocations: 674.09 MiB)
(Float32[-48.12045, 96.89185, 5.4492106, -136.30328, -277.6249, -2.9152653, 159.34677, -252.21376, -168.57451, 95.22521  …  28.876875, 58.53126, -94.83481, 123.85488, 202.57362, 72.3266, -231.3183, -164.42274, -63.517776, -324.779],)

off:

julia> @btime Zygote.gradient(loss_adjoint,θ)
  1.113 s (5063232 allocations: 674.09 MiB)
(Float32[-48.12045, 96.89185, 5.4492106, -136.30328, -277.6249, -2.9152653, 159.34677, -252.21376, -168.57451, 95.22521  …  28.876875, 58.53126, -94.83481, 123.85488, 202.57362, 72.3266, -231.3183, -164.42274, -63.517776, -324.779],)

If different BLAS code is used, would you not expect at least some difference in allocations?

ZuseZ4 commented 7 months ago

As far as I know, Julia's tooling measures allocations at a higher level, so low-level allocations like these won't be counted when using the rules. I assume the same holds for the fallback.
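A minimal sketch of that point, assuming only Base Julia: `@allocated` reports memory handed out by Julia's GC, while a buffer obtained through `Libc.malloc` (similar in spirit to the `cache.A` and `cache.x` buffers that are `free`d in the IR above) is not expected to show up in that count. The helper names `gc_buffer` and `malloc_buffer` are made up for illustration and are not part of Enzyme.

```julia
# Compare what `@allocated` reports for a GC-managed buffer and a raw malloc.

# Allocates through Julia's GC, so the bytes are visible to `@allocated`.
function gc_buffer(n)
    v = Vector{Float32}(undef, n)
    v[1] = 1.0f0
    return v
end

# Allocates through libc's malloc, which bypasses Julia's GC accounting.
function malloc_buffer(n)
    p = Ptr{Float32}(Libc.malloc(n * sizeof(Float32)))
    unsafe_store!(p, 1.0f0)
    x = unsafe_load(p)
    Libc.free(p)
    return x
end

# Warm up first so compilation is not part of the measurement.
gc_buffer(1024)
malloc_buffer(1024)

gc_bytes = @allocated gc_buffer(1024)
malloc_bytes = @allocated malloc_buffer(1024)

println("GC-managed buffer: ", gc_bytes, " bytes reported")
println("malloc'd buffer:   ", malloc_bytes, " bytes reported")
```

Both functions request the same amount of memory, but only the first request is made through the GC, so the two reported numbers should differ. The same reasoning would explain why `@btime` shows identical allocation counts for two code paths that differ only in their low-level BLAS buffers.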

wsmoses commented 7 months ago

As of the latest release, for reverse mode, bitcode replacement has no effect for dot/gemm/gemv/etc. All of these will use tablegen rather than the fallback BLAS.