mkitti commented 11 months ago

Parameterize ThinPlateSpline
Use hcat rather than cat(dims = 2, ...)
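
As a minimal sketch of the second change, assuming a small Float64 matrix like the ones used later in this thread: both calls build the same matrix, but `hcat` takes no keyword argument. The variable names are illustrative and not taken from the package source.

```julia
# Illustrative data; names are not taken from the package source.
x2 = [0.0 1.0; 1.1 0.0; 1.2 1.5]
col = ones(Float64, size(x2, 1))   # column of ones, one entry per row of x2

via_cat  = cat(col, x2; dims = 2)  # keyword form
via_hcat = hcat(col, x2)           # positional form, no keyword argument

same_result = via_cat == via_hcat

# Return type the compiler infers for hcat with these argument types.
hcat_rt = only(Base.return_types(hcat, (Vector{Float64}, Matrix{Float64})))
```

Checking `same_result` and `hcat_rt` confirms that the two forms agree in value and shows what the compiler can infer for the `hcat` call.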

anj1 commented 11 months ago

I'm not sure what these extra parameters are doing. What kind of usage are you envisioning where these parameters would help avoid bugs? A concrete example would be good.

mkitti commented 11 months ago

Currently, the field types of ThinPlateSpline are all Any:

julia> fieldtypes(ThinPlateSpline)
(Any, Any, Any, Any, Any, Any)

This leads to type instability, which harms performance: https://m3g.github.io/JuliaNotes.jl/stable/instability/
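
The effect can be reproduced in isolation. The following is a minimal sketch using two hypothetical structs (`LooseFields` and `TypedFields` are made up for illustration and are not the package's definitions). It compares what the compiler can infer when a field is untyped versus when its type is a parameter.

```julia
# Hypothetical structs for illustration; not the package's definitions.
struct LooseFields
    c                      # no annotation, so the field type is Any
end

struct TypedFields{M<:AbstractMatrix}
    c::M                   # concrete once M is fixed, e.g. Matrix{Float64}
end

total(s) = sum(s.c)        # the same generic function is used with both structs

loose_fields = fieldtypes(LooseFields)
typed_fields = fieldtypes(TypedFields{Matrix{Float64}})

# Return types the compiler can infer from the argument type alone.
loose_rt = only(Base.return_types(total, (LooseFields,)))
typed_rt = only(Base.return_types(total, (TypedFields{Matrix{Float64}},)))
```

Note that `total` itself is unannotated in both cases; only the struct definition differs.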

Type instability makes it very difficult for Julia to compile this code efficiently. Looking at tps_deform, we see these Any types propagate throughout the function lowering.

julia> @code_warntype tps_deform(x1, tps)
MethodInstance for ThinPlateSplines.tps_deform(::Matrix{Float64}, ::ThinPlateSpline)
  from tps_deform(x2::AbstractArray{T, D}, tps::ThinPlateSpline) where {T, D} @ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:107
Static Parameters
  T = Float64
  D = 2
Arguments
  #self#::Core.Const(ThinPlateSplines.tps_deform)
  x2::Matrix{Float64}
  tps::ThinPlateSpline
Locals
  yt::Any
  sumsqr::Any
  all_homo_z::Any
  c::Any
  d::Any
  x1::Any
  𝒜𝒸𝓉!@_10   ::ThinPlateSplines.var"#𝒜𝒸𝓉!#4"
  ℳ𝒶𝓀ℯ@_11  ::ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5"{ThinPlateSplines.var"#𝒜𝒸𝓉!#4"}
  𝒜𝒸𝓉!@_12   ::ThinPlateSplines.var"#𝒜𝒸𝓉!#4"
  ≪1:D≫::UnitRange{Int64}
  𝒜𝒸𝓉!@_14   ::ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7"
  ℳ𝒶𝓀ℯ@_15  ::ThinPlateSplines.var"#20#ℳ𝒶𝓀ℯ#8"{ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7"}
  𝒜𝒸𝓉!@_16   ::ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7"
  𝒶𝓍i  ::Any
  𝒶𝓍j  ::Any
  𝒶𝓍l  ::Any
  𝒜𝒸𝓉!@_20   ::ThinPlateSplines.var"#23#𝒜𝒸𝓉!#10"
Body::Any
1 ──        Core.NewvarNode(:(yt))
│           Core.NewvarNode(:(sumsqr))
│           Core.NewvarNode(:(all_homo_z))
│    %4   = Base.getproperty(tps, :x1)::Any
│    %5   = Base.getproperty(tps, :d)::Any
│    %6   = Base.getproperty(tps, :c)::Any
│           (x1 = %4)
│           (d = %5)
│           (c = %6)
│    %10  = d::Any
│    %11  = Base.vect()::Vector{Any}
│    %12  = (%10 == %11)::Any
└───        goto #3 if not %12
2 ── %14  = ThinPlateSplines.ArgumentError("Affine component not available; run tps_solve with compute_affine=true.")::Any
│           ThinPlateSplines.throw(%14)
└───        Core.Const(:(goto %17))
3 ┄─ %17  = $(Expr(:static_parameter, 1))::Core.Const(Float64)
│    %18  = ThinPlateSplines.size(x2, 1)::Int64
│    %19  = ThinPlateSplines.ones(%17, %18)::Vector{Float64}
│    %20  = (:dims,)::Core.Const((:dims,))
│    %21  = Core.apply_type(Core.NamedTuple, %20)::Core.Const(NamedTuple{(:dims,)})
│    %22  = Core.tuple(2)::Core.Const((2,))
│    %23  = (%21)(%22)::Core.Const((dims = 2,))
│           (all_homo_z = Core.kwcall(%23, ThinPlateSplines.cat, %19, x2))
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_10   ))
│    %26  = (ndims)(all_homo_z)::Any
│    %27  = (%26 == 2)::Any
└───        goto #5 if not %27
4 ──        goto #6
5 ──        (throw)("expected a 2-array all_homo_z")
6 ┄─ %31  = (ndims)(x1)::Any
│    %32  = (%31 == 2)::Any
└───        goto #8 if not %32
7 ──        goto #9
8 ──        (throw)("expected a 2-array x1")
9 ┄─        (𝒜𝒸𝓉!@_10    = %new(ThinPlateSplines.:(var"#𝒜𝒸𝓉!#4"   )))
│    %37  = 𝒜𝒸𝓉!@_10   ::Core.Const(ThinPlateSplines.var"#𝒜𝒸𝓉!#4"())
│           (𝒜𝒸𝓉!@_12    = %37)
│    %39  = ThinPlateSplines.:(var"#ℳ𝒶𝓀ℯ#5"  )::Core.Const(ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5")
│    %40  = Core.typeof(𝒜𝒸𝓉!@_12   )::Core.Const(ThinPlateSplines.var"#𝒜𝒸𝓉!#4")
│    %41  = Core.apply_type(%39, %40)::Core.Const(ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5"{ThinPlateSplines.var"#𝒜𝒸𝓉!#4"})
│           (ℳ𝒶𝓀ℯ@_11   = %new(%41, 𝒜𝒸𝓉!@_12   ))
│    %43  = (Tullio.Eval)(ℳ𝒶𝓀ℯ@_11  , nothing)::Core.Const(Tullio.Eval{ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5"{ThinPlateSplines.var"#𝒜𝒸𝓉!#4"}, Nothing}(ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5"{ThinPlateSplines.var"#𝒜𝒸𝓉!#4"}(ThinPlateSplines.var"#𝒜𝒸𝓉!#4"()), nothing))
│    %44  = all_homo_z::Any
│    %45  = (%43)(%44, x1)::Any
│           (sumsqr = %45)
│           Core.NewvarNode(:(≪1:D≫))
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_14   ))
│    %49  = (ndims)(sumsqr)::Any
│    %50  = (%49 == 2)::Any
└───        goto #11 if not %50
10 ─        goto #12
11 ─        (throw)("expected a 2-array sumsqr")
12 ┄ %54  = (ndims)(c)::Any
│    %55  = (%54 == 2)::Any
└───        goto #14 if not %55
13 ─        goto #15
14 ─        (throw)("expected a 2-array c")
15 ┄        (≪1:D≫ = 1:$(Expr(:static_parameter, 2)))
│    %60  = (≪1:D≫::Core.Const(1:2) isa ThinPlateSplines.AbstractRange)::Core.Const(true)
└───        goto #17 if not %60
16 ─        goto #18
17 ─        Core.Const(:(1:$(Expr(:static_parameter, 2))))
│           Core.Const(:(ThinPlateSplines.string(%63)))
│           Core.Const(:("expected a range for (j in 1:D), got " * %64))
└───        Core.Const(:(ThinPlateSplines.throw(%65)))
18 ┄        (𝒜𝒸𝓉!@_14    = %new(ThinPlateSplines.:(var"#19#𝒜𝒸𝓉!#7"   )))
│    %68  = 𝒜𝒸𝓉!@_14   ::Core.Const(ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7"())
│           (𝒜𝒸𝓉!@_16    = %68)
│    %70  = ThinPlateSplines.:(var"#20#ℳ𝒶𝓀ℯ#8"  )::Core.Const(ThinPlateSplines.var"#20#ℳ𝒶𝓀ℯ#8")
│    %71  = Core.typeof(𝒜𝒸𝓉!@_16   )::Core.Const(ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7")
│    %72  = Core.apply_type(%70, %71)::Core.Const(ThinPlateSplines.var"#20#ℳ𝒶𝓀ℯ#8"{ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7"})
│           (ℳ𝒶𝓀ℯ@_15   = %new(%72, 𝒜𝒸𝓉!@_16   ))
│    %74  = (Tullio.Eval)(ℳ𝒶𝓀ℯ@_15  , nothing)::Core.Const(Tullio.Eval{ThinPlateSplines.var"#20#ℳ𝒶𝓀ℯ#8"{ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7"}, Nothing}(ThinPlateSplines.var"#20#ℳ𝒶𝓀ℯ#8"{ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7"}(ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7"()), nothing))
│    %75  = sumsqr::Any
│    %76  = c::Any
│    %77  = (%74)(%75, %76, ≪1:D≫::Core.Const(1:2))::Any
│           (yt = %77)
│           Core.NewvarNode(:(𝒶𝓍i  ))
│           Core.NewvarNode(:(𝒶𝓍j  ))
│           Core.NewvarNode(:(𝒶𝓍l  ))
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_20   ))
│    %83  = (ndims)(yt)::Any
│    %84  = (%83 == 2)::Any
└───        goto #20 if not %84
19 ─        goto #21
20 ─        (throw)("expected a 2-array yt")
21 ┄ %88  = (ndims)(d)::Any
│    %89  = (%88 == 2)::Any
└───        goto #23 if not %89
22 ─        goto #24
23 ─        (throw)("expected a 2-array d")
24 ┄ %93  = (ndims)(all_homo_z)::Any
│    %94  = (%93 == 2)::Any
└───        goto #26 if not %94
25 ─        goto #27
26 ─        (throw)("expected a 2-array all_homo_z")
27 ┄        (𝒜𝒸𝓉!@_20    = %new(ThinPlateSplines.:(var"#23#𝒜𝒸𝓉!#10"   )))
│           (𝒶𝓍l   = (axes)(d, 1))
│    %100 = (axes)(all_homo_z, 2)::Any
│    %101 = (axes)(d, 1)::Any
│    %102 = (%100 == %101)::Any
└───        goto #29 if not %102
28 ─        goto #30
29 ─        (throw)("range of index l must agree")
30 ┄ %106 = (axes)(yt, 2)::Any
│    %107 = ThinPlateSplines.:-::Core.Const(-)
│    %108 = (axes)(d, 2)::Any
│    %109 = Base.broadcasted(%107, %108, 1)::Any
│    %110 = Base.materialize(%109)::Any
│           (𝒶𝓍j   = ThinPlateSplines.intersect(%106, %110))
│           (𝒶𝓍i   = (axes)(yt, 1))
│    %113 = (axes)(all_homo_z, 1)::Any
│    %114 = (axes)(yt, 1)::Any
│    %115 = (%113 == %114)::Any
└───        goto #32 if not %115
31 ─        goto #33
32 ─        (throw)("range of index i must agree")
33 ┄ %119 = 𝒜𝒸𝓉!@_20   ::Core.Const(ThinPlateSplines.var"#23#𝒜𝒸𝓉!#10"())
│    %120 = (Tullio.storage_type)(yt, d, all_homo_z)::Any
│    %121 = yt::Any
│    %122 = ThinPlateSplines.tuple(d, all_homo_z)::Tuple{Any, Any}
│    %123 = ThinPlateSplines.tuple(𝒶𝓍i  , 𝒶𝓍j  )::Tuple{Any, Any}
│    %124 = ThinPlateSplines.tuple(𝒶𝓍l  )::Tuple{Any}
│    %125 = ThinPlateSplines.:+::Core.Const(+)
│           (Tullio.threader)(%119, %120, %121, %122, %123, %124, %125, 262144, true)
│           yt
└───        return yt

After this pull request, all the types are known concretely (Float64 and Matrix{Float64}). Julia can produce efficient machine code now that it knows the machine types.

julia> @code_warntype tps_deform(x1, tps)
MethodInstance for ThinPlateSplines.tps_deform(::Matrix{Float64}, ::ThinPlateSpline{Float64, Matrix{Float64}, Float64})
  from tps_deform(x2::AbstractMatrix{T}, tps::ThinPlateSpline) where T @ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:107
Static Parameters
  T = Float64
Arguments
  #self#::Core.Const(ThinPlateSplines.tps_deform)
  x2::Matrix{Float64}
  tps::ThinPlateSpline{Float64, Matrix{Float64}, Float64}
Locals
  yt::Matrix{Float64}
  sumsqr::Matrix{Float64}
  all_homo_z::Matrix{Float64}
  D::Int64
  c::Matrix{Float64}
  d::Matrix{Float64}
  x1::Matrix{Float64}
  𝒜𝒸𝓉!@_11   ::ThinPlateSplines.var"#𝒜𝒸𝓉!#4"
  ℳ𝒶𝓀ℯ@_12  ::ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5"{ThinPlateSplines.var"#𝒜𝒸𝓉!#4"}
  𝒜𝒸𝓉!@_13   ::ThinPlateSplines.var"#𝒜𝒸𝓉!#4"
  ≪1:D≫::UnitRange{Int64}
  𝒜𝒸𝓉!@_15   ::ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7"
  ℳ𝒶𝓀ℯ@_16  ::ThinPlateSplines.var"#23#ℳ𝒶𝓀ℯ#8"{ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7"}
  𝒜𝒸𝓉!@_17   ::ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7"
  𝒶𝓍i  ::Base.OneTo{Int64}
  𝒶𝓍j  ::UnitRange{Int64}
  𝒶𝓍l  ::Base.OneTo{Int64}
  𝒜𝒸𝓉!@_21   ::ThinPlateSplines.var"#26#𝒜𝒸𝓉!#10"
Body::Matrix{Float64}
1 ──        Core.NewvarNode(:(yt))
│           Core.NewvarNode(:(sumsqr))
│           Core.NewvarNode(:(all_homo_z))
│           Core.NewvarNode(:(D))
│    %5   = Base.getproperty(tps, :x1)::Matrix{Float64}
│    %6   = Base.getproperty(tps, :d)::Matrix{Float64}
│    %7   = Base.getproperty(tps, :c)::Matrix{Float64}
│           (x1 = %5)
│           (d = %6)
│           (c = %7)
│    %11  = d::Matrix{Float64}
│    %12  = Base.vect()::Vector{Any}
│    %13  = (%11 == %12)::Core.Const(false)
└───        goto #3 if not %13
2 ──        Core.Const(:(ThinPlateSplines.ArgumentError("Affine component not available; run tps_solve with compute_affine=true.")))
│           Core.Const(:(ThinPlateSplines.throw(%15)))
└───        Core.Const(:(goto %18))
3 ┄─        (D = ThinPlateSplines.size(x2, 2))
│    %19  = $(Expr(:static_parameter, 1))::Core.Const(Float64)
│    %20  = ThinPlateSplines.size(x2, 1)::Int64
│    %21  = ThinPlateSplines.ones(%19, %20)::Vector{Float64}
│           (all_homo_z = ThinPlateSplines.hcat(%21, x2))
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_11   ))
│    %24  = (ndims)(all_homo_z)::Core.Const(2)
│    %25  = (%24 == 2)::Core.Const(true)
└───        goto #5 if not %25
4 ──        goto #6
5 ──        Core.Const(:((throw)("expected a 2-array all_homo_z")))
6 ┄─ %29  = (ndims)(x1)::Core.Const(2)
│    %30  = (%29 == 2)::Core.Const(true)
└───        goto #8 if not %30
7 ──        goto #9
8 ──        Core.Const(:((throw)("expected a 2-array x1")))
9 ┄─        (𝒜𝒸𝓉!@_11    = %new(ThinPlateSplines.:(var"#𝒜𝒸𝓉!#4"   )))
│    %35  = 𝒜𝒸𝓉!@_11   ::Core.Const(ThinPlateSplines.var"#𝒜𝒸𝓉!#4"())
│           (𝒜𝒸𝓉!@_13    = %35)
│    %37  = ThinPlateSplines.:(var"#ℳ𝒶𝓀ℯ#5"  )::Core.Const(ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5")
│    %38  = Core.typeof(𝒜𝒸𝓉!@_13   )::Core.Const(ThinPlateSplines.var"#𝒜𝒸𝓉!#4")
│    %39  = Core.apply_type(%37, %38)::Core.Const(ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5"{ThinPlateSplines.var"#𝒜𝒸𝓉!#4"})
│           (ℳ𝒶𝓀ℯ@_12   = %new(%39, 𝒜𝒸𝓉!@_13   ))
│    %41  = (Tullio.Eval)(ℳ𝒶𝓀ℯ@_12  , nothing)::Core.Const(Tullio.Eval{ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5"{ThinPlateSplines.var"#𝒜𝒸𝓉!#4"}, Nothing}(ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5"{ThinPlateSplines.var"#𝒜𝒸𝓉!#4"}(ThinPlateSplines.var"#𝒜𝒸𝓉!#4"()), nothing))
│    %42  = all_homo_z::Matrix{Float64}
│    %43  = (%41)(%42, x1)::Matrix{Float64}
│           (sumsqr = %43)
│           Core.NewvarNode(:(≪1:D≫))
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_15   ))
│    %47  = (ndims)(sumsqr)::Core.Const(2)
│    %48  = (%47 == 2)::Core.Const(true)
└───        goto #11 if not %48
10 ─        goto #12
11 ─        Core.Const(:((throw)("expected a 2-array sumsqr")))
12 ┄ %52  = (ndims)(c)::Core.Const(2)
│    %53  = (%52 == 2)::Core.Const(true)
└───        goto #14 if not %53
13 ─        goto #15
14 ─        Core.Const(:((throw)("expected a 2-array c")))
15 ┄        (≪1:D≫ = 1:D)
│    %58  = (≪1:D≫::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]) isa ThinPlateSplines.AbstractRange)::Core.Const(true)
└───        goto #17 if not %58
16 ─        goto #18
17 ─        Core.Const(:(1:D))
│           Core.Const(:(ThinPlateSplines.string(%61)))
│           Core.Const(:("expected a range for (j in 1:D), got " * %62))
└───        Core.Const(:(ThinPlateSplines.throw(%63)))
18 ┄        (𝒜𝒸𝓉!@_15    = %new(ThinPlateSplines.:(var"#22#𝒜𝒸𝓉!#7"   )))
│    %66  = 𝒜𝒸𝓉!@_15   ::Core.Const(ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7"())
│           (𝒜𝒸𝓉!@_17    = %66)
│    %68  = ThinPlateSplines.:(var"#23#ℳ𝒶𝓀ℯ#8"  )::Core.Const(ThinPlateSplines.var"#23#ℳ𝒶𝓀ℯ#8")
│    %69  = Core.typeof(𝒜𝒸𝓉!@_17   )::Core.Const(ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7")
│    %70  = Core.apply_type(%68, %69)::Core.Const(ThinPlateSplines.var"#23#ℳ𝒶𝓀ℯ#8"{ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7"})
│           (ℳ𝒶𝓀ℯ@_16   = %new(%70, 𝒜𝒸𝓉!@_17   ))
│    %72  = (Tullio.Eval)(ℳ𝒶𝓀ℯ@_16  , nothing)::Core.Const(Tullio.Eval{ThinPlateSplines.var"#23#ℳ𝒶𝓀ℯ#8"{ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7"}, Nothing}(ThinPlateSplines.var"#23#ℳ𝒶𝓀ℯ#8"{ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7"}(ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7"()), nothing))
│    %73  = sumsqr::Matrix{Float64}
│    %74  = c::Matrix{Float64}
│    %75  = (%72)(%73, %74, ≪1:D≫::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]))::Matrix{Float64}
│           (yt = %75)
│           Core.NewvarNode(:(𝒶𝓍i  ))
│           Core.NewvarNode(:(𝒶𝓍j  ))
│           Core.NewvarNode(:(𝒶𝓍l  ))
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_21   ))
│    %81  = (ndims)(yt)::Core.Const(2)
│    %82  = (%81 == 2)::Core.Const(true)
└───        goto #20 if not %82
19 ─        goto #21
20 ─        Core.Const(:((throw)("expected a 2-array yt")))
21 ┄ %86  = (ndims)(d)::Core.Const(2)
│    %87  = (%86 == 2)::Core.Const(true)
└───        goto #23 if not %87
22 ─        goto #24
23 ─        Core.Const(:((throw)("expected a 2-array d")))
24 ┄ %91  = (ndims)(all_homo_z)::Core.Const(2)
│    %92  = (%91 == 2)::Core.Const(true)
└───        goto #26 if not %92
25 ─        goto #27
26 ─        Core.Const(:((throw)("expected a 2-array all_homo_z")))
27 ┄        (𝒜𝒸𝓉!@_21    = %new(ThinPlateSplines.:(var"#26#𝒜𝒸𝓉!#10"   )))
│           (𝒶𝓍l   = (axes)(d, 1))
│    %98  = (axes)(all_homo_z, 2)::Base.OneTo{Int64}
│    %99  = (axes)(d, 1)::Base.OneTo{Int64}
│    %100 = (%98 == %99)::Bool
└───        goto #29 if not %100
28 ─        goto #30
29 ─        (throw)("range of index l must agree")
30 ┄ %104 = (axes)(yt, 2)::Base.OneTo{Int64}
│    %105 = ThinPlateSplines.:-::Core.Const(-)
│    %106 = (axes)(d, 2)::Base.OneTo{Int64}
│    %107 = Base.broadcasted(%105, %106, 1)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(0), Int64])
│    %108 = Base.materialize(%107)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(0), Int64])
│           (𝒶𝓍j   = ThinPlateSplines.intersect(%104, %108))
│           (𝒶𝓍i   = (axes)(yt, 1))
│    %111 = (axes)(all_homo_z, 1)::Base.OneTo{Int64}
│    %112 = (axes)(yt, 1)::Base.OneTo{Int64}
│    %113 = (%111 == %112)::Bool
└───        goto #32 if not %113
31 ─        goto #33
32 ─        (throw)("range of index i must agree")
33 ┄ %117 = 𝒜𝒸𝓉!@_21   ::Core.Const(ThinPlateSplines.var"#26#𝒜𝒸𝓉!#10"())
│    %118 = (Tullio.storage_type)(yt, d, all_homo_z)::Core.Const(Matrix{Float64})
│    %119 = yt::Matrix{Float64}
│    %120 = ThinPlateSplines.tuple(d, all_homo_z)::Tuple{Matrix{Float64}, Matrix{Float64}}
│    %121 = ThinPlateSplines.tuple(𝒶𝓍i  , 𝒶𝓍j  ::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]))::Core.PartialStruct(Tuple{Base.OneTo{Int64}, UnitRange{Int64}}, Any[Base.OneTo{Int64}, Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])])
│    %122 = ThinPlateSplines.tuple(𝒶𝓍l  )::Tuple{Base.OneTo{Int64}}
│    %123 = ThinPlateSplines.:+::Core.Const(+)
│           (Tullio.threader)(%117, %118, %119, %120, %121, %122, %123, 262144, true)
│           yt
└───        return yt

anj1 commented 11 months ago

You're mentioning "efficient machine code" and also "lowered code" (which are different concepts) but that is just the typed code, not the machine code or the lowered code. Can you show an example benchmark showing this changing the performance of the code?

Also please read https://www.oxinabox.net/2020/04/19/Julia-Antipatterns.html#over-constraining-argument-types

I think some type constraints could be beneficial here, but I'm not fully convinced yet that having three type parameters is necessarily the right way to do it. I'm open to suggestions however.

mkitti commented 11 months ago

Type stability in Julia has enormous performance benefits.

julia> x1 = [0.0 1.0
             1.0 0.0
             1.0 1.0]
3×2 Matrix{Float64}:
 0.0  1.0
 1.0  0.0
 1.0  1.0

julia> x2 = [0.0 1.0
             1.1 0.0
             1.2 1.5]
3×2 Matrix{Float64}:
 0.0  1.0
 1.1  0.0
 1.2  1.5

julia> tps = tps_solve(x1, x2, 1.0)
ThinPlateSpline(1.0, [0.0 1.0; 1.0 0.0; 1.0 1.0], [1.0 0.0 1.0; 1.0 1.1 0.0; 1.0 1.2 1.5], [0.0 0.6931471805599455 0.0; 0.6931471805599455 0.0 0.0; 0.0 0.0 0.0], [1.0 -0.09999999999999976 -0.5; 3.294173637189667e-16 1.1999999999999997 0.49999999999999994; -1.5700924586837752e-16 0.09999999999999981 1.5], [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0])

Master branch

julia> @benchmark tps_deform($x1, $tps)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.944 μs … 472.878 μs  ┊ GC (min … max): 0.00% … 98.75%
 Time  (median):     3.167 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.506 μs ±   6.657 μs  ┊ GC (mean ± σ):  2.65% ±  1.39%

  ██▅▂▆▆▅▄▄▄▃▄▃▃▃▃▃▂▂▁▁▃▂▁    ▃▂▁▁▃▂▁▁▂▂ ▁▂ ▁ ▁               ▂
  ████████████████████████▇▇███████████████████▇██▇▇▇▇▆▇▆▆▅▆▆ █
  2.94 μs      Histogram: log(frequency) by time      5.38 μs <

 Memory estimate: 1.27 KiB, allocs estimate: 37.

This pull request

julia> @benchmark tps_deform($x1, $tps)
BenchmarkTools.Trial: 10000 samples with 198 evaluations.
 Range (min … max):  439.394 ns …  35.221 μs  ┊ GC (min … max): 0.00% … 94.01%
 Time  (median):     454.040 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   525.449 ns ± 678.367 ns  ┊ GC (mean ± σ):  5.28% ±  4.25%

  █▆▆▃▂▁▂▃▃▃▂ ▄▃▁       ▁                                       ▂
  █████████████████▇▇█▇▇██▇▅▅▄▄▅▄▄▃▁▁▁▃▄▄▃▄▅▃▃▁▄▄▅▃▁▁▃▃▁▁▁▁▁▃▁▆ █
  439 ns        Histogram: log(frequency) by time       1.26 μs <

 Memory estimate: 480 bytes, allocs estimate: 5.

In summary, the median time drops from 3.167 μs to 454 ns (roughly 7× faster), allocations fall from 37 to 5, and the memory estimate falls from 1.27 KiB to 480 bytes.

mkitti commented 11 months ago

> You're mentioning "efficient machine code" and also "lowered code" (which are different concepts) but that is just the typed code, not the machine code or the lowered code. Can you show an example benchmark showing this changing the performance of the code?

See the prior post for benchmarks.

The typed code has actually already been lowered. I will elaborate on this in the next comment.
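
For readers unfamiliar with the stages being discussed, here is a minimal sketch on a toy function `f` (hypothetical, unrelated to the package). Each reflection call shows the same method at a later stage: lowered code, typed code after inference, LLVM IR, and native assembly.

```julia
using InteractiveUtils  # provides the @code_* macros outside the REPL

f(x) = 2x + 1           # toy function, unrelated to the package

lowered = @code_lowered f(1.0)                # lowered code, no type information yet
typed   = @code_typed f(1.0)                  # lowered code after type inference
llvm    = sprint(code_llvm, f, (Float64,))    # LLVM IR as a string
native  = sprint(code_native, f, (Float64,))  # machine assembly as a string
```

`@code_typed` returns a pair whose second element is the inferred return type, which is the quantity that shows up as `Body::...` in `@code_warntype` output.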

> Also please read https://www.oxinabox.net/2020/04/19/Julia-Antipatterns.html#over-constraining-argument-types

Frames (@oxinabox) is talking about method arguments there. We are mostly talking about struct fields here. Without these type parameters, Julia cannot know the types of the fields when it accesses them. For tps_deform, the consequence is that the return type cannot be inferred: @code_warntype reports Body::Any.
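
The distinction between arguments and fields can be sketched with hypothetical definitions (`double`, `AnyBox`, `TypedBox`, and `double_field` are made up for illustration). An unconstrained method argument is still specialized on the concrete type it is called with, whereas an untyped field hides its type from the compiler.

```julia
# Hypothetical definitions for illustration only.
double(x) = 2x               # unconstrained argument, specialized per call type

struct AnyBox
    v                        # untyped field, stored as Any
end

struct TypedBox{T}
    v::T                     # field type is part of the struct's type
end

double_field(b) = 2 * b.v    # reads the field, then does the same arithmetic

arg_rt    = only(Base.return_types(double, (Float64,)))
anybox_rt = only(Base.return_types(double_field, (AnyBox,)))
box_rt    = only(Base.return_types(double_field, (TypedBox{Float64},)))
```

Loosening the signature of `double` costs nothing here, which is the point of the linked article; the untyped field in `AnyBox` is a different situation.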

> I think some type constraints could be beneficial here, but I'm not fully convinced yet that having three type parameters is necessarily the right way to do it. I'm open to suggestions however.

We could reduce the type parameters to a single one. However, this would require λ to have the same type as the elements of x1 and of every other matrix field.

struct ThinPlateSpline{T}
    λ::T          # Stiffness.
    x1::Matrix{T} # control points
    Y::Matrix{T}  # Homogeneous control point coordinates
    Φ::Matrix{T}  # TPS kernel
    d::Matrix{T}  # Affine component
    c::Matrix{T}  # Non-affine component
end

However, this will fail the tests as I have currently written them.

julia> x1 = [0 1
             1 0
             1 1]
3×2 Matrix{Int64}:
 0  1
 1  0
 1  1

julia> x2 = [0 1
             1 0
             1.2 1.5]
3×2 Matrix{Float64}:
 0.0  1.0
 1.0  0.0
 1.2  1.5

julia> using ThinPlateSplines
[ Info: Precompiling ThinPlateSplines [1d861738-f48e-4029-b1d3-81ce6bc7f5ab]

julia> tps = tps_solve(x1, x2, 1.0)
ERROR: MethodError: no method matching ThinPlateSpline(::Float64, ::Matrix{Int64}, ::Matrix{Float64}, ::Matrix{Float64}, ::Matrix{Float64}, ::Matrix{Float64})

Closest candidates are:
  ThinPlateSpline(::T, ::Matrix{T}, ::Matrix{T}, ::Matrix{T}, ::Matrix{T}, ::Matrix{T}) where T
   @ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:24

Stacktrace:
 [1] tps_solve(x::Matrix{Int64}, y::Matrix{Float64}, λ::Float64; compute_affine::Bool)
   @ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:83
 [2] tps_solve(x::Matrix{Int64}, y::Matrix{Float64}, λ::Float64)
   @ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:61
 [3] top-level scope
   @ REPL[5]:1

julia> typeof(x1)
Matrix{Int64} (alias for Array{Int64, 2})

The reason for this is that x1 is a Matrix{Int64}, whereas the other components use Float64. To maintain flexibility and compatibility, I let the input parameters vary independently from the calculated fields. x1, λ, and Y can be of different (element) types.
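
The trade-off can be sketched with two reduced, hypothetical structs. `OneParam` and `ThreeParam` are illustrative only: they keep three fields, and the parameter order in `ThreeParam` is an assumption, not necessarily the layout used in this pull request.

```julia
# Hypothetical, reduced structs; the PR's actual parameters may differ.
struct OneParam{T}
    λ::T
    x1::Matrix{T}
    c::Matrix{T}
end

struct ThreeParam{Tλ, M<:AbstractMatrix, T}
    λ::Tλ            # stiffness, any type
    x1::M            # control points, any matrix type
    c::Matrix{T}     # calculated field
end

x1 = [0 1; 1 0; 1 1]     # Matrix{Int}
c = zeros(3, 2)          # Matrix{Float64}

# The default constructor of OneParam needs one T for all fields.
single_failed = try
    OneParam(1.0, x1, c)
    false
catch err
    err isa MethodError
end

s = ThreeParam(1.0, x1, c)   # each parameter is inferred independently
```

Both structs have fully concrete field types once constructed, so either is type stable; the difference is only in which input combinations the default constructor accepts.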

mkitti commented 11 months ago

master branch

@code_lowered debuginfo=:none tps_deform(x1, tps)

```julia
julia> @code_lowered debuginfo=:none tps_deform(x1, tps)
CodeInfo(
1 ──        Core.NewvarNode(:(yt))
│           Core.NewvarNode(:(sumsqr))
│           Core.NewvarNode(:(all_homo_z))
│           Core.NewvarNode(:(D))
│    %5   = Base.getproperty(tps, :x1)
│    %6   = Base.getproperty(tps, :d)
│    %7   = Base.getproperty(tps, :c)
│           x1 = %5
│           d = %6
│           c = %7
│    %11  = d
│    %12  = Base.vect()
│    %13  = %11 == %12
└───        goto #3 if not %13
2 ── %15  = ThinPlateSplines.ArgumentError("Affine component not available; run tps_solve with compute_affine=true.")
│           ThinPlateSplines.throw(%15)
└───        goto #3
3 ┄─        D = ThinPlateSplines.size(x2, 2)
│    %19  = $(Expr(:static_parameter, 1))
│    %20  = ThinPlateSplines.size(x2, 1)
│    %21  = ThinPlateSplines.ones(%19, %20)
│    %22  = (:dims,)
│    %23  = Core.apply_type(Core.NamedTuple, %22)
│    %24  = Core.tuple(2)
│    %25  = (%23)(%24)
│           all_homo_z = Core.kwcall(%25, ThinPlateSplines.cat, %21, x2)
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_11 ))
│    %28  = (ndims)(all_homo_z)
│    %29  = %28 == 2
└───        goto #5 if not %29
4 ──        goto #6
5 ──        (throw)("expected a 2-array all_homo_z")
6 ┄─ %33  = (ndims)(x1)
│    %34  = %33 == 2
└───        goto #8 if not %34
7 ──        goto #9
8 ──        (throw)("expected a 2-array x1")
9 ┄─        𝒜𝒸𝓉!@_11 = %new(ThinPlateSplines.:(var"#𝒜𝒸𝓉!#4" ))
│    %39  = 𝒜𝒸𝓉!@_11
│           𝒜𝒸𝓉!@_13 = %39
│    %41  = ThinPlateSplines.:(var"#ℳ𝒶𝓀ℯ#5" )
│    %42  = Core.typeof(𝒜𝒸𝓉!@_13 )
│    %43  = Core.apply_type(%41, %42)
│           ℳ𝒶𝓀ℯ@_12 = %new(%43, 𝒜𝒸𝓉!@_13 )
│    %45  = (Tullio.Eval)(ℳ𝒶𝓀ℯ@_12 , nothing)
│    %46  = all_homo_z
│    %47  = (%45)(%46, x1)
│           sumsqr = %47
│           Core.NewvarNode(:(≪1:D≫))
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_15 ))
│    %51  = (ndims)(sumsqr)
│    %52  = %51 == 2
└───        goto #11 if not %52
10 ─        goto #12
11 ─        (throw)("expected a 2-array sumsqr")
12 ┄ %56  = (ndims)(c)
│    %57  = %56 == 2
└───        goto #14 if not %57
13 ─        goto #15
14 ─        (throw)("expected a 2-array c")
15 ┄        ≪1:D≫ = 1:D
│    %62  = ≪1:D≫ isa ThinPlateSplines.AbstractRange
└───        goto #17 if not %62
16 ─        goto #18
17 ─ %65  = 1:D
│    %66  = ThinPlateSplines.string(%65)
│    %67  = "expected a range for (j in 1:D), got " * %66
└───        ThinPlateSplines.throw(%67)
18 ┄        𝒜𝒸𝓉!@_15 = %new(ThinPlateSplines.:(var"#19#𝒜𝒸𝓉!#7" ))
│    %70  = 𝒜𝒸𝓉!@_15
│           𝒜𝒸𝓉!@_17 = %70
│    %72  = ThinPlateSplines.:(var"#20#ℳ𝒶𝓀ℯ#8" )
│    %73  = Core.typeof(𝒜𝒸𝓉!@_17 )
│    %74  = Core.apply_type(%72, %73)
│           ℳ𝒶𝓀ℯ@_16 = %new(%74, 𝒜𝒸𝓉!@_17 )
│    %76  = (Tullio.Eval)(ℳ𝒶𝓀ℯ@_16 , nothing)
│    %77  = sumsqr
│    %78  = c
│    %79  = (%76)(%77, %78, ≪1:D≫)
│           yt = %79
│           Core.NewvarNode(:(𝒶𝓍i ))
│           Core.NewvarNode(:(𝒶𝓍j ))
│           Core.NewvarNode(:(𝒶𝓍l ))
│           Core.NewvarNode(:(𝒜𝒸𝓉!@_21 ))
│    %85  = (ndims)(yt)
│    %86  = %85 == 2
└───        goto #20 if not %86
19 ─        goto #21
20 ─        (throw)("expected a 2-array yt")
21 ┄ %90  = (ndims)(d)
│    %91  = %90 == 2
└───        goto #23 if not %91
22 ─        goto #24
23 ─        (throw)("expected a 2-array d")
24 ┄ %95  = (ndims)(all_homo_z)
│    %96  = %95 == 2
└───        goto #26 if not %96
25 ─        goto #27
26 ─        (throw)("expected a 2-array all_homo_z")
27 ┄        𝒜𝒸𝓉!@_21 = %new(ThinPlateSplines.:(var"#23#𝒜𝒸𝓉!#10" ))
│           𝒶𝓍l = (axes)(d, 1)
│    %102 = (axes)(all_homo_z, 2)
│    %103 = (axes)(d, 1)
│    %104 = %102 == %103
└───        goto #29 if not %104
28 ─        goto #30
29 ─        (throw)("range of index l must agree")
30 ┄ %108 = (axes)(yt, 2)
│    %109 = ThinPlateSplines.:-
│    %110 = (axes)(d, 2)
│    %111 = Base.broadcasted(%109, %110, 1)
│    %112 = Base.materialize(%111)
│           𝒶𝓍j = ThinPlateSplines.intersect(%108, %112)
│           𝒶𝓍i = (axes)(yt, 1)
│    %115 = (axes)(all_homo_z, 1)
│    %116 = (axes)(yt, 1)
│    %117 = %115 == %116
└───        goto #32 if not %117
31 ─        goto #33
32 ─        (throw)("range of index i must agree")
33 ┄ %121 = 𝒜𝒸𝓉!@_21
│    %122 = (Tullio.storage_type)(yt, d, all_homo_z)
│    %123 = yt
│    %124 = ThinPlateSplines.tuple(d, all_homo_z)
│    %125 = ThinPlateSplines.tuple(𝒶𝓍i , 𝒶𝓍j )
│    %126 = ThinPlateSplines.tuple(𝒶𝓍l )
│    %127 = ThinPlateSplines.:+
│           (Tullio.threader)(%121, %122, %123, %124, %125, %126, %127, 262144, true)
│           yt
└───        return yt
)
```

@code_typed debuginfo=:none tps_deform(x1, tps)

```julia
julia> @code_typed debuginfo=:none tps_deform(x1, tps)
CodeInfo(
1 ── %1   = Base.getfield(tps, :x1)::Any
│    %2   = Base.getfield(tps, :d)::Any
│    %3   = Base.getfield(tps, :c)::Any
│    %4   = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
│    %5   = (%2 == %4)::Any
└───        goto #3 if not %5
2 ── %7   = ThinPlateSplines.ArgumentError("Affine component not available; run tps_solve with compute_affine=true.")::Any
│           ThinPlateSplines.throw(%7)::Union{}
└───        unreachable
3 ── %10  = Base.arraysize(x2, 2)::Int64
│    %11  = Base.arraysize(x2, 1)::Int64
│    %12  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, :(%11), :(%11)))::Vector{Int64}
│    %13  = Base.arraysize(%12, 1)::Int64
│    %14  = Base.slt_int(%13, 0)::Bool
│    %15  = Core.ifelse(%14, 0, %13)::Int64
│    %16  = Base.slt_int(%15, 1)::Bool
└───        goto #5 if not %16
4 ──        goto #6
5 ──        goto #6
6 ┄─ %20  = φ (#4 => true, #5 => false)::Bool
│    %21  = φ (#5 => 1)::Int64
│    %22  = φ (#5 => 1)::Int64
│    %23  = Base.not_int(%20)::Bool
└───        goto #12 if not %23
7 ┄─ %25  = φ (#6 => %21, #11 => %33)::Int64
│    %26  = φ (#6 => %22, #11 => %34)::Int64
│           Base.arrayset(false, %12, 1, %25)::Vector{Int64}
│    %28  = (%26 === %15)::Bool
└───        goto #9 if not %28
8 ──        goto #10
9 ── %31  = Base.add_int(%26, 1)::Int64
└───        goto #10
10 ┄ %33  = φ (#9 => %31)::Int64
│    %34  = φ (#9 => %31)::Int64
│    %35  = φ (#8 => true, #9 => false)::Bool
│    %36  = Base.not_int(%35)::Bool
└───        goto #12 if not %36
11 ─        goto #7
12 ┄        goto #13
13 ─        goto #14
14 ─        goto #15
15 ─ %42  = invoke Base._cat(2::Int64, %12::Vector{Int64}, x2::Vararg{Union{LinearAlgebra.Adjoint{Int64, Vector{Int64}}, LinearAlgebra.Transpose{Int64, Vector{Int64}}, LinearAlgebra.AbstractTriangular{Int64, A} where A<:(Matrix), LinearAlgebra.Hermitian{Int64, A} where A<:(Matrix), LinearAlgebra.Symmetric{Int64, A} where A<:(Matrix), VecOrMat{Int64}}})::Any
│    %43  = (ndims)(%42)::Any
│    %44  = (%43 == 2)::Any
└───        goto #38 if not %44
16 ─        nothing::Nothing
│    %47  = (ndims)(%1)::Any
│    %48  = (%47 == 2)::Any
└───        goto #37 if not %48
17 ─        nothing::Nothing
│    %51  = ($(QuoteNode(Tullio.Eval{ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5" {ThinPlateSplines.var"#𝒜𝒸𝓉!#4" }, Nothing}(ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5" {ThinPlateSplines.var"#𝒜𝒸𝓉!#4" }(ThinPlateSplines.var"#𝒜𝒸𝓉!#4" ()), nothing))))(%42, %1)::Any
│    %52  = (ndims)(%51)::Any
│    %53  = (%52 == 2)::Any
└───        goto #36 if not %53
18 ─        nothing::Nothing
│    %56  = (ndims)(%3)::Any
│    %57  = (%56 == 2)::Any
└───        goto #35 if not %57
19 ─        nothing::Nothing
│    %60  = Base.sle_int(1, %10)::Bool
└───        goto #21 if not %60
20 ─        goto #22
21 ─        goto #22
22 ┄ %64  = φ (#20 => %10, #21 => 0)::Int64
│    %65  = %new(UnitRange{Int64}, 1, %64)::UnitRange{Int64}
└───        goto #23
23 ─        goto #24
24 ─        nothing::Nothing
│    %69  = ($(QuoteNode(Tullio.Eval{ThinPlateSplines.var"#20#ℳ𝒶𝓀ℯ#8" {ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7" }, Nothing}(ThinPlateSplines.var"#20#ℳ𝒶𝓀ℯ#8" {ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7" }(ThinPlateSplines.var"#19#𝒜𝒸𝓉!#7" ()), nothing))))(%51, %3, %65)::Any
│    %70  = (ndims)(%69)::Any
│    %71  = (%70 == 2)::Any
└───        goto #34 if not %71
25 ─        nothing::Nothing
│    %74  = (ndims)(%2)::Any
│    %75  = (%74 == 2)::Any
└───        goto #33 if not %75
26 ─        nothing::Nothing
│    %78  = (ndims)(%42)::Any
│    %79  = (%78 == 2)::Any
└───        goto #32 if not %79
27 ─        nothing::Nothing
│    %82  = %new(ThinPlateSplines.:(var"#23#𝒜𝒸𝓉!#10" ))::ThinPlateSplines.var"#23#𝒜𝒸𝓉!#10"
│    %83  = (axes)(%2, 1)::Any
│    %84  = (axes)(%42, 2)::Any
│    %85  = (axes)(%2, 1)::Any
│    %86  = (%84 == %85)::Any
└───        goto #31 if not %86
28 ─        nothing::Nothing
│    %89  = (axes)(%69, 2)::Any
│    %90  = (axes)(%2, 2)::Any
│    %91  = Base.broadcasted(ThinPlateSplines.:-, %90, 1)::Any
│    %92  = Base.materialize(%91)::Any
│    %93  = ThinPlateSplines.intersect(%89, %92)::Any
│    %94  = (axes)(%69, 1)::Any
│    %95  = (axes)(%42, 1)::Any
│    %96  = (axes)(%69, 1)::Any
│    %97  = (%95 == %96)::Any
└───        goto #30 if not %97
29 ─        nothing::Nothing
│    %100 = (Tullio.storage_type)(%69, %2, %42)::Any
│    %101 = ThinPlateSplines.tuple(%2, %42)::Tuple{Any, Any}
│    %102 = ThinPlateSplines.tuple(%94, %93)::Tuple{Any, Any}
│    %103 = ThinPlateSplines.tuple(%83)::Tuple{Any}
│           (Tullio.threader)(%82, %100, %69, %101, %102, %103, ThinPlateSplines.:+, 262144, true)::Any
└───        return %69
30 ─        (throw)("range of index i must agree")::Union{}
└───        unreachable
31 ─        (throw)("range of index l must agree")::Union{}
└───        unreachable
32 ─        (throw)("expected a 2-array all_homo_z")::Union{}
└───        unreachable
33 ─        (throw)("expected a 2-array d")::Union{}
└───        unreachable
34 ─        (throw)("expected a 2-array yt")::Union{}
└───        unreachable
35 ─        (throw)("expected a 2-array c")::Union{}
└───        unreachable
36 ─        (throw)("expected a 2-array sumsqr")::Union{}
└───        unreachable
37 ─        (throw)("expected a 2-array x1")::Union{}
└───        unreachable
38 ─        (throw)("expected a 2-array all_homo_z")::Union{}
└───        unreachable
) => Any
```

@code_llvm debuginfo=:none tps_deform(x1, tps)

```julia julia> @code_llvm debuginfo=:none tps_deform(x1, tps) ; Function Attrs: uwtable define nonnull {}* @julia_tps_deform_1636({}* noundef nonnull align 16 dereferenceable(40) %0, [6 x {}*]* nocapture noundef nonnull readonly align 8 dereferenceable(48) %1) #0 { top: %2 = alloca [9 x {}*], align 8 %gcframe53 = alloca [9 x {}*], align 16 %gcframe53.sub = getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 0 %.sub = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 0 %3 = bitcast [9 x {}*]* %gcframe53 to i8* call void @llvm.memset.p0i8.i32(i8* noundef nonnull align 16 dereferenceable(72) %3, i8 0, i32 72, i1 false) %4 = call {}*** inttoptr (i64 140730117225904 to {}*** ()*)() #9 %5 = bitcast [9 x {}*]* %gcframe53 to i64* store i64 28, i64* %5, align 16 %6 = getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 1 %7 = bitcast {}** %6 to {}*** %8 = load {}**, {}*** %4, align 8 store {}** %8, {}*** %7, align 8 %9 = bitcast {}*** %4 to {}*** store {}** %gcframe53.sub, {}*** %9, align 8 %10 = getelementptr inbounds [6 x {}*], [6 x {}*]* %1, i64 0, i64 1 %11 = load atomic {}*, {}** %10 unordered, align 8 %12 = getelementptr inbounds [6 x {}*], [6 x {}*]* %1, i64 0, i64 4 %13 = load atomic {}*, {}** %12 unordered, align 8 %14 = getelementptr inbounds [6 x {}*], [6 x {}*]* %1, i64 0, i64 5 %15 = load atomic {}*, {}** %14 unordered, align 8 %16 = call nonnull {}* inttoptr (i64 140730117045328 to {}* ({}*, i64)*)({}* inttoptr (i64 140727024052800 to {}*), i64 0) %17 = getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 2 store {}* %16, {}** %17, align 16 store {}* %13, {}** %.sub, align 8 %18 = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 1 store {}* %16, {}** %18, align 8 %19 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %20 = bitcast {}* %19 to i64* %21 = getelementptr inbounds i64, i64* %20, i64 -1 %22 = load atomic i64, i64* %21 
unordered, align 8 %23 = and i64 %22, -16 %24 = inttoptr i64 %23 to {}* %25 = icmp eq {}* %24, inttoptr (i64 140727024049776 to {}*) br i1 %25, label %pass, label %fail L7: ; preds = %pass store {}* inttoptr (i64 140730211128400 to {}*), {}** %.sub, align 8 %26 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727022899472 to {}*), {}** nonnull %.sub, i32 1) call void @ijl_throw({}* %26) unreachable L10: ; preds = %pass %27 = bitcast {}* %0 to {}** %28 = getelementptr inbounds {}*, {}** %27, i64 4 %29 = bitcast {}** %28 to i64* %30 = load i64, i64* %29, align 8 %31 = getelementptr inbounds {}*, {}** %27, i64 3 %32 = bitcast {}** %31 to i64* %33 = load i64, i64* %32, align 8 %34 = call nonnull {}* inttoptr (i64 140730117045328 to {}* ({}*, i64)*)({}* inttoptr (i64 140726938543280 to {}*), i64 %33) %35 = bitcast {}* %34 to { i8*, i64, i16, i16, i32 }* %36 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %35, i64 0, i32 1 %37 = load i64, i64* %36, align 8 %.not.not = icmp eq i64 %37, 0 br i1 %.not.not, label %L42, label %L25.preheader L25.preheader: ; preds = %L10 %38 = bitcast {}* %34 to i64** %39 = load i64*, i64** %38, align 8 %min.iters.check = icmp ult i64 %37, 16 br i1 %min.iters.check, label %scalar.ph, label %vector.ph vector.ph: ; preds = %L25.preheader %n.vec = and i64 %37, 9223372036854775792 %ind.end = or i64 %n.vec, 1 br label %vector.body vector.body: ; preds = %vector.body, %vector.ph %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %40 = getelementptr inbounds i64, i64* %39, i64 %index %41 = bitcast i64* %40 to <4 x i64>* store <4 x i64> , <4 x i64>* %41, align 8 %42 = getelementptr inbounds i64, i64* %40, i64 4 %43 = bitcast i64* %42 to <4 x i64>* store <4 x i64> , <4 x i64>* %43, align 8 %44 = getelementptr inbounds i64, i64* %40, i64 8 %45 = bitcast i64* %44 to <4 x i64>* store <4 x i64> , <4 x i64>* %45, align 8 %46 = getelementptr inbounds i64, i64* %40, i64 12 %47 = bitcast i64* %46 to 
<4 x i64>* store <4 x i64> , <4 x i64>* %47, align 8 %index.next = add nuw i64 %index, 16 %48 = icmp eq i64 %index.next, %n.vec br i1 %48, label %middle.block, label %vector.body middle.block: ; preds = %vector.body %cmp.n = icmp eq i64 %37, %n.vec br i1 %cmp.n, label %L42, label %scalar.ph scalar.ph: ; preds = %middle.block, %L25.preheader %bc.resume.val = phi i64 [ %ind.end, %middle.block ], [ 1, %L25.preheader ] br label %L25 L25: ; preds = %L25, %scalar.ph %value_phi3 = phi i64 [ %51, %L25 ], [ %bc.resume.val, %scalar.ph ] %49 = add nsw i64 %value_phi3, -1 %50 = getelementptr inbounds i64, i64* %39, i64 %49 store i64 1, i64* %50, align 8 %.not.not50 = icmp eq i64 %value_phi3, %37 %51 = add nuw nsw i64 %value_phi3, 1 br i1 %.not.not50, label %L42, label %L25 L42: ; preds = %L25, %middle.block, %L10 store {}* %34, {}** %17, align 16 store {}* inttoptr (i64 2608217358496 to {}*), {}** %.sub, align 8 store {}* %34, {}** %18, align 8 %52 = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 2 store {}* %0, {}** %52, align 8 %53 = call nonnull {}* @ijl_invoke({}* inttoptr (i64 140727018892992 to {}*), {}** nonnull %.sub, i32 3, {}* inttoptr (i64 2613724499568 to {}*)) %54 = getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 4 store {}* %53, {}** %54, align 16 store {}* %53, {}** %.sub, align 8 %55 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726981331024 to {}*), {}** nonnull %.sub, i32 1) store {}* %55, {}** %17, align 16 store {}* %55, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %56 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %57 = bitcast {}* %56 to i64* %58 = getelementptr inbounds i64, i64* %57, i64 -1 %59 = load atomic i64, i64* %58 unordered, align 8 %60 = and i64 %59, -16 %61 = inttoptr i64 %60 to {}* %62 = icmp eq {}* %61, inttoptr (i64 140727024049776 to {}*) br i1 %62, label %pass9, label %fail8 L46: ; preds = 
%pass9 store {}* %11, {}** %.sub, align 8 %63 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726981331024 to {}*), {}** nonnull %.sub, i32 1) store {}* %63, {}** %17, align 16 store {}* %63, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %64 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %65 = bitcast {}* %64 to i64* %66 = getelementptr inbounds i64, i64* %65, i64 -1 %67 = load atomic i64, i64* %66 unordered, align 8 %68 = and i64 %67, -16 %69 = inttoptr i64 %68 to {}* %70 = icmp eq {}* %69, inttoptr (i64 140727024049776 to {}*) br i1 %70, label %pass11, label %fail10 L50: ; preds = %pass11 store {}* %53, {}** %.sub, align 8 store {}* %11, {}** %18, align 8 %71 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 2608219241944 to {}*), {}** nonnull %.sub, i32 2) %72 = getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 3 store {}* %71, {}** %72, align 8 store {}* %71, {}** %.sub, align 8 %73 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726981331024 to {}*), {}** nonnull %.sub, i32 1) store {}* %73, {}** %17, align 16 store {}* %73, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %74 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %75 = bitcast {}* %74 to i64* %76 = getelementptr inbounds i64, i64* %75, i64 -1 %77 = load atomic i64, i64* %76 unordered, align 8 %78 = and i64 %77, -16 %79 = inttoptr i64 %78 to {}* %80 = icmp eq {}* %79, inttoptr (i64 140727024049776 to {}*) br i1 %80, label %pass13, label %fail12 L55: ; preds = %pass13 store {}* %15, {}** %.sub, align 8 %81 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726981331024 to {}*), {}** nonnull %.sub, i32 1) store {}* %81, {}** %17, align 16 store {}* %81, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %82 = call nonnull 
{}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %83 = bitcast {}* %82 to i64* %84 = getelementptr inbounds i64, i64* %83, i64 -1 %85 = load atomic i64, i64* %84 unordered, align 8 %86 = and i64 %85, -16 %87 = inttoptr i64 %86 to {}* %88 = icmp eq {}* %87, inttoptr (i64 140727024049776 to {}*) br i1 %88, label %pass15, label %fail14 L59: ; preds = %pass15 %ptls_field54 = getelementptr inbounds {}**, {}*** %4, i64 2 %89 = bitcast {}*** %ptls_field54 to i8** %ptls_load5556 = load i8*, i8** %89, align 8 %90 = call noalias nonnull {}* @ijl_gc_pool_alloc(i8* %ptls_load5556, i32 1440, i32 32) #4 %91 = bitcast {}* %90 to i64* %92 = getelementptr inbounds i64, i64* %91, i64 -1 store atomic i64 140727004540896, i64* %92 unordered, align 8 %93 = bitcast {}* %90 to i8* store i64 1, i64* %91, align 8 %.sroa.2.0..sroa_idx = getelementptr inbounds i8, i8* %93, i64 8 %.sroa.2.0..sroa_cast = bitcast i8* %.sroa.2.0..sroa_idx to i64* store i64 %30, i64* %.sroa.2.0..sroa_cast, align 8 store {}* %90, {}** %17, align 16 store {}* %71, {}** %.sub, align 8 store {}* %15, {}** %18, align 8 store {}* %90, {}** %52, align 8 %94 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 2608219243328 to {}*), {}** nonnull %.sub, i32 3) %95 = getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 6 store {}* %94, {}** %95, align 16 store {}* %94, {}** %.sub, align 8 %96 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726981331024 to {}*), {}** nonnull %.sub, i32 1) store {}* %96, {}** %17, align 16 store {}* %96, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %97 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %98 = bitcast {}* %97 to i64* %99 = getelementptr inbounds i64, i64* %98, i64 -1 %100 = load atomic i64, i64* %99 unordered, align 8 %101 = and i64 %100, -16 %102 = inttoptr i64 %101 to {}* %103 = icmp eq {}* %102, inttoptr 
(i64 140727024049776 to {}*) br i1 %103, label %pass19, label %fail18 L73: ; preds = %pass19 store {}* %13, {}** %.sub, align 8 %104 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726981331024 to {}*), {}** nonnull %.sub, i32 1) store {}* %104, {}** %17, align 16 store {}* %104, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %105 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %106 = bitcast {}* %105 to i64* %107 = getelementptr inbounds i64, i64* %106, i64 -1 %108 = load atomic i64, i64* %107 unordered, align 8 %109 = and i64 %108, -16 %110 = inttoptr i64 %109 to {}* %111 = icmp eq {}* %110, inttoptr (i64 140727024049776 to {}*) br i1 %111, label %pass21, label %fail20 L77: ; preds = %pass21 store {}* %53, {}** %.sub, align 8 %112 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726981331024 to {}*), {}** nonnull %.sub, i32 1) store {}* %112, {}** %17, align 16 store {}* %112, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %113 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %114 = bitcast {}* %113 to i64* %115 = getelementptr inbounds i64, i64* %114, i64 -1 %116 = load atomic i64, i64* %115 unordered, align 8 %117 = and i64 %116, -16 %118 = inttoptr i64 %117 to {}* %119 = icmp eq {}* %118, inttoptr (i64 140727024049776 to {}*) br i1 %119, label %pass23, label %fail22 L81: ; preds = %pass23 store {}* %13, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358432 to {}*), {}** %18, align 8 %120 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003289952 to {}*), {}** nonnull %.sub, i32 2) store {}* %120, {}** %17, align 16 store {}* %53, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %121 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003289952 to {}*), {}** nonnull %.sub, i32 2) %122 = 
getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 5 store {}* %121, {}** %122, align 8 store {}* %13, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358432 to {}*), {}** %18, align 8 %123 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003289952 to {}*), {}** nonnull %.sub, i32 2) store {}* %123, {}** %72, align 8 store {}* %121, {}** %.sub, align 8 store {}* %123, {}** %18, align 8 %124 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %125 = bitcast {}* %124 to i64* %126 = getelementptr inbounds i64, i64* %125, i64 -1 %127 = load atomic i64, i64* %126 unordered, align 8 %128 = and i64 %127, -16 %129 = inttoptr i64 %128 to {}* %130 = icmp eq {}* %129, inttoptr (i64 140727024049776 to {}*) br i1 %130, label %pass25, label %fail24 L88: ; preds = %pass25 store {}* %94, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %131 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003289952 to {}*), {}** nonnull %.sub, i32 2) store {}* %131, {}** %122, align 8 store {}* %13, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358496 to {}*), {}** %18, align 8 %132 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003289952 to {}*), {}** nonnull %.sub, i32 2) store {}* %132, {}** %72, align 8 store {}* inttoptr (i64 140726965695184 to {}*), {}** %.sub, align 8 store {}* %132, {}** %18, align 8 store {}* inttoptr (i64 2608217358432 to {}*), {}** %52, align 8 %133 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003751536 to {}*), {}** nonnull %.sub, i32 3) store {}* %133, {}** %72, align 8 store {}* %133, {}** %.sub, align 8 %134 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003179168 to {}*), {}** nonnull %.sub, i32 1) store {}* %134, {}** %72, align 8 store {}* %131, {}** %.sub, align 8 store {}* %134, {}** %18, align 8 %135 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726943686672 to 
{}*), {}** nonnull %.sub, i32 2) store {}* %135, {}** %72, align 8 store {}* %94, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358432 to {}*), {}** %18, align 8 %136 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003289952 to {}*), {}** nonnull %.sub, i32 2) %137 = getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 7 store {}* %136, {}** %137, align 8 store {}* %53, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358432 to {}*), {}** %18, align 8 %138 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003289952 to {}*), {}** nonnull %.sub, i32 2) %139 = getelementptr inbounds [9 x {}*], [9 x {}*]* %gcframe53, i64 0, i64 8 store {}* %138, {}** %139, align 16 store {}* %94, {}** %.sub, align 8 store {}* inttoptr (i64 2608217358432 to {}*), {}** %18, align 8 %140 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140727003289952 to {}*), {}** nonnull %.sub, i32 2) store {}* %140, {}** %122, align 8 store {}* %138, {}** %.sub, align 8 store {}* %140, {}** %18, align 8 %141 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140726939365648 to {}*), {}** nonnull %.sub, i32 2) %142 = bitcast {}* %141 to i64* %143 = getelementptr inbounds i64, i64* %142, i64 -1 %144 = load atomic i64, i64* %143 unordered, align 8 %145 = and i64 %144, -16 %146 = inttoptr i64 %145 to {}* %147 = icmp eq {}* %146, inttoptr (i64 140727024049776 to {}*) br i1 %147, label %pass27, label %fail26 L99: ; preds = %pass27 store {}* %94, {}** %.sub, align 8 store {}* %13, {}** %18, align 8 store {}* %53, {}** %52, align 8 %148 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140728741003264 to {}*), {}** nonnull %.sub, i32 3) store {}* %148, {}** %122, align 8 store {}* %13, {}** %.sub, align 8 store {}* %53, {}** %18, align 8 %149 = call nonnull {}* @jl_f_tuple({}* null, {}** nonnull %.sub, i32 2) store {}* %149, {}** %54, align 16 store {}* %136, {}** %.sub, align 8 store {}* %135, {}** %18, align 8 %150 = call nonnull {}* 
@jl_f_tuple({}* null, {}** nonnull %.sub, i32 2) store {}* %150, {}** %72, align 8 store {}* %120, {}** %.sub, align 8 %151 = call nonnull {}* @jl_f_tuple({}* null, {}** nonnull %.sub, i32 1) store {}* %151, {}** %17, align 16 store {}* inttoptr (i64 140730211136672 to {}*), {}** %.sub, align 8 store {}* %148, {}** %18, align 8 store {}* %94, {}** %52, align 8 %152 = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 3 store {}* %149, {}** %152, align 8 %153 = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 4 store {}* %150, {}** %153, align 8 %154 = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 5 store {}* %151, {}** %154, align 8 %155 = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 6 store {}* inttoptr (i64 140726944076944 to {}*), {}** %155, align 8 %156 = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 7 store {}* inttoptr (i64 140727004972512 to {}*), {}** %156, align 8 %157 = getelementptr inbounds [9 x {}*], [9 x {}*]* %2, i64 0, i64 8 store {}* inttoptr (i64 140727022897536 to {}*), {}** %157, align 8 %158 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140728740749904 to {}*), {}** nonnull %.sub, i32 9) %159 = load {}*, {}** %6, align 8 %160 = bitcast {}*** %4 to {}** store {}* %159, {}** %160, align 8 ret {}* %94 L106: ; preds = %pass27 call void @ijl_throw({}* inttoptr (i64 140730211127936 to {}*)) unreachable L108: ; preds = %pass25 call void @ijl_throw({}* inttoptr (i64 140730211127984 to {}*)) unreachable L110: ; preds = %pass23 call void @ijl_throw({}* inttoptr (i64 140730211128336 to {}*)) unreachable L112: ; preds = %pass21 call void @ijl_throw({}* inttoptr (i64 140730211128032 to {}*)) unreachable L114: ; preds = %pass19 call void @ijl_throw({}* inttoptr (i64 140730211128080 to {}*)) unreachable L116: ; preds = %pass15 call void @ijl_throw({}* inttoptr (i64 140730211128192 to {}*)) unreachable L118: ; preds = %pass13 call void @ijl_throw({}* inttoptr (i64 140730211128240 to {}*)) 
unreachable L120: ; preds = %pass11 call void @ijl_throw({}* inttoptr (i64 140730211128288 to {}*)) unreachable L122: ; preds = %pass9 call void @ijl_throw({}* inttoptr (i64 140730211128336 to {}*)) unreachable fail: ; preds = %top call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %19) unreachable pass: ; preds = %top %161 = icmp eq {}* %19, inttoptr (i64 140727022897552 to {}*) br i1 %161, label %L10, label %L7 fail8: ; preds = %L42 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %56) unreachable pass9: ; preds = %L42 %162 = icmp eq {}* %56, inttoptr (i64 140727022897552 to {}*) br i1 %162, label %L122, label %L46 fail10: ; preds = %L46 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %64) unreachable pass11: ; preds = %L46 %163 = icmp eq {}* %64, inttoptr (i64 140727022897552 to {}*) br i1 %163, label %L120, label %L50 fail12: ; preds = %L50 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %74) unreachable pass13: ; preds = %L50 %164 = icmp eq {}* %74, inttoptr (i64 140727022897552 to {}*) br i1 %164, label %L118, label %L55 fail14: ; preds = %L55 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %82) unreachable pass15: ; preds = %L55 %165 = icmp eq {}* %82, inttoptr (i64 140727022897552 to {}*) br i1 %165, label %L116, label %L59 fail18: ; preds = %L59 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %97) unreachable pass19: ; preds = %L59 %166 = icmp eq {}* %97, inttoptr (i64 
140727022897552 to {}*) br i1 %166, label %L114, label %L73 fail20: ; preds = %L73 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %105) unreachable pass21: ; preds = %L73 %167 = icmp eq {}* %105, inttoptr (i64 140727022897552 to {}*) br i1 %167, label %L112, label %L77 fail22: ; preds = %L77 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %113) unreachable pass23: ; preds = %L77 %168 = icmp eq {}* %113, inttoptr (i64 140727022897552 to {}*) br i1 %168, label %L110, label %L81 fail24: ; preds = %L81 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %124) unreachable pass25: ; preds = %L81 %169 = icmp eq {}* %124, inttoptr (i64 140727022897552 to {}*) br i1 %169, label %L108, label %L88 fail26: ; preds = %L88 call void @ijl_type_error(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 140727024049776 to {}*), {}* %141) unreachable pass27: ; preds = %L88 %170 = icmp eq {}* %141, inttoptr (i64 140727022897552 to {}*) br i1 %170, label %L106, label %L99 } ```
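
To show the pattern above in isolation: each `@ijl_apply_generic` call in that output is a dynamic dispatch resolved at run time, which is what the compiler falls back to when it only knows a value is `Any`. Here is a minimal sketch with two hypothetical structs, `LooseSpline` and `TightSpline` (stand-ins I made up for illustration, not the actual ThinPlateSplines.jl definitions), and a toy function `total`. It shows how parameterizing the fields lets inference recover concrete types.

```julia
# Hypothetical stand-ins for illustration; these are NOT the package's types.

# Untyped fields: every field is stored as `Any`.
struct LooseSpline
    x1
    c
end

# Parameterized fields: the concrete array type is part of the struct's type.
struct TightSpline{M<:AbstractMatrix}
    x1::M
    c::M
end

# A toy function that reads the fields, loosely analogous to `tps_deform`.
total(s) = sum(s.x1) + sum(s.c)

loose = LooseSpline(ones(3, 2), ones(3, 2))
tight = TightSpline(ones(3, 2), ones(3, 2))

fieldtypes(typeof(loose))  # (Any, Any)
fieldtypes(typeof(tight))  # (Matrix{Float64}, Matrix{Float64})

# Inferred return type of `total` for each argument type.
loose_ret = only(Base.return_types(total, (typeof(loose),)))  # Any
tight_ret = only(Base.return_types(total, (typeof(tight),)))  # Float64
```

Both calls compute the same value; the only difference is what the compiler can prove ahead of time, and therefore whether it has to emit generic dispatch for `sum` and `+`.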

@code_native debuginfo=:none tps_deform(x1, tps)

```julia julia> @code_native debuginfo=:none tps_deform(x1, tps) .text .file "tps_deform" .section .rodata.cst8,"aM",@progbits,8 .p2align 3 # -- Begin function julia_tps_deform_1647 .LCPI0_0: .quad 1 # 0x1 .text .globl julia_tps_deform_1647 .p2align 4, 0x90 .type julia_tps_deform_1647,@function julia_tps_deform_1647: # @julia_tps_deform_1647 .cfi_startproc # %bb.0: # %top pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset %rbp, -16 movq %rsp, %rbp .cfi_def_cfa_register %rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rsi pushq %rdi pushq %rbx andq $-32, %rsp subq $256, %rsp # imm = 0x100 .cfi_offset %rbx, -72 .cfi_offset %rdi, -64 .cfi_offset %rsi, -56 .cfi_offset %r12, -48 .cfi_offset %r13, -40 .cfi_offset %r14, -32 .cfi_offset %r15, -24 movq %rdx, %rdi movq %rcx, %r12 movabsq $140730117045328, %rbx # imm = 0x7FFE48A2BC50 vxorps %xmm0, %xmm0, %xmm0 vmovaps %ymm0, 160(%rsp) vmovaps %ymm0, 128(%rsp) movq $0, 192(%rsp) leaq 180576(%rbx), %rax vzeroupper callq *%rax movq $28, 128(%rsp) movq (%rax), %rcx movq %rcx, 136(%rsp) leaq 128(%rsp), %rcx movq %rax, 224(%rsp) # 8-byte Spill movq %rcx, (%rax) movq 8(%rdi), %r15 movq 32(%rdi), %rsi movq 40(%rdi), %r13 movabsq $140727024052800, %rcx # imm = 0x7FFD90476A40 xorl %edx, %edx callq *%rbx movq %rax, 144(%rsp) movq %rsi, 120(%rsp) # 8-byte Spill movq %rsi, 32(%rsp) movq %rax, 40(%rsp) movabsq $ijl_apply_generic, %rax movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 leaq 32(%rsp), %rdx movl $2, %r8d callq *%rax movq -8(%rax), %rcx shrq $4, %rcx movabsq $8795439003111, %rdx # imm = 0x7FFD90475E7 cmpq %rdx, %rcx jne .LBB0_1 # %bb.20: # %pass movabsq $140727022897552, %rdi # imm = 0x7FFD9035C990 cmpq %rdi, %rax jne .LBB0_21 # %bb.2: # %L10 movq 24(%r12), %rdx movq 32(%r12), %rax movq %rax, 112(%rsp) # 8-byte Spill movabsq $140726938543280, %rcx # imm = 0x7FFD8B2EA4B0 callq *%rbx movq 8(%rax), %r8 testq %r8, %r8 je .LBB0_9 # %bb.3: # %L25.preheader movq (%rax), %rdx movl $1, %ebx cmpq $16, %r8 jb .LBB0_7 # %bb.4: # 
%vector.ph movq %rdi, %r9 movq %r8, %rcx andq $-16, %rcx leaq 1(%rcx), %rbx xorl %esi, %esi movabsq $.LCPI0_0, %rdi vbroadcastsd (%rdi), %ymm0 .p2align 4, 0x90 .LBB0_5: # %vector.body # =>This Inner Loop Header: Depth=1 vmovups %ymm0, (%rdx,%rsi,8) vmovups %ymm0, 32(%rdx,%rsi,8) vmovups %ymm0, 64(%rdx,%rsi,8) vmovups %ymm0, 96(%rdx,%rsi,8) addq $16, %rsi cmpq %rsi, %rcx jne .LBB0_5 # %bb.6: # %middle.block cmpq %rcx, %r8 movq %r9, %rdi je .LBB0_9 .LBB0_7: # %scalar.ph decq %rbx .p2align 4, 0x90 .LBB0_8: # %L25 # =>This Inner Loop Header: Depth=1 movq $1, (%rdx,%rbx,8) incq %rbx cmpq %rbx, %r8 jne .LBB0_8 .LBB0_9: # %L42 movq %rax, 144(%rsp) movabsq $2608217358496, %r14 # imm = 0x25F45DE80A0 movq %r14, 32(%rsp) movq %rax, 40(%rsp) movq %r12, 48(%rsp) movabsq $ijl_invoke, %rax movabsq $140727018892992, %rcx # imm = 0x7FFD8FF8AEC0 leaq 32(%rsp), %rsi movabsq $2613724499568, %r9 # imm = 0x2608E1ECE70 movq %rsi, %rdx movl $3, %r8d vzeroupper callq *%rax movq %rax, 160(%rsp) movq %rax, 104(%rsp) # 8-byte Spill movq %rax, 32(%rsp) movabsq $140726981331024, %rcx # imm = 0x7FFD8DBB8850 movq %rsi, %rdx movl $1, %r8d movabsq $ijl_apply_generic, %r12 callq *%r12 movq %rax, 144(%rsp) movq %rax, 32(%rsp) movq %r14, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx movabsq $8795439003111, %rbx # imm = 0x7FFD90475E7 cmpq %rbx, %rcx jne .LBB0_10 # %bb.22: # %pass9 cmpq %rdi, %rax je .LBB0_19 # %bb.23: # %L46 movq %r15, 32(%rsp) movabsq $140726981331024, %rcx # imm = 0x7FFD8DBB8850 leaq 32(%rsp), %rsi movq %rsi, %rdx movl $1, %r8d callq *%r12 movq %rax, 144(%rsp) movq %rax, 32(%rsp) movq %r14, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx cmpq %rbx, %rcx jne .LBB0_40 # %bb.24: # %pass11 cmpq %rdi, %rax je .LBB0_18 # %bb.25: # %L50 movq 104(%rsp), %rax # 8-byte Reload movq %rax, 32(%rsp) movq %r15, 
40(%rsp) movabsq $2608219241944, %rcx # imm = 0x25F45FB3DD8 leaq 32(%rsp), %rdi movq %rdi, %rdx movl $2, %r8d callq *%r12 movq %rax, %rsi movq %rax, 152(%rsp) movq %rax, 32(%rsp) movabsq $140726981331024, %rcx # imm = 0x7FFD8DBB8850 movq %rdi, %rdx movl $1, %r8d callq *%r12 movq %rax, 144(%rsp) movq %rax, 32(%rsp) movq %r14, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %rdi, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx cmpq %rbx, %rcx jne .LBB0_41 # %bb.26: # %pass13 movabsq $140727022897552, %rcx # imm = 0x7FFD9035C990 cmpq %rcx, %rax je .LBB0_17 # %bb.27: # %L55 movq %r13, 32(%rsp) movabsq $140726981331024, %rcx # imm = 0x7FFD8DBB8850 leaq 32(%rsp), %rdi movq %rdi, %rdx movl $1, %r8d callq *%r12 movq %rax, 144(%rsp) movq %rax, 32(%rsp) movq %r14, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %rdi, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx cmpq %rbx, %rcx jne .LBB0_42 # %bb.28: # %pass15 movabsq $140727022897552, %rdi # imm = 0x7FFD9035C990 cmpq %rdi, %rax je .LBB0_16 # %bb.29: # %L59 movq 224(%rsp), %rax # 8-byte Reload movq 16(%rax), %rcx movabsq $ijl_gc_pool_alloc, %rax movl $1440, %edx # imm = 0x5A0 movl $32, %r8d callq *%rax movabsq $140727004540896, %rcx # imm = 0x7FFD8F1DAFE0 movq %rcx, -8(%rax) movq $1, (%rax) movq 112(%rsp), %rcx # 8-byte Reload movq %rcx, 8(%rax) movq %rax, 144(%rsp) movq %rsi, 32(%rsp) movq %r13, 40(%rsp) movq %rax, 48(%rsp) movabsq $2608219243328, %rcx # imm = 0x25F45FB4340 leaq 32(%rsp), %rsi movq %rsi, %rdx movl $3, %r8d callq *%r12 movq %rax, 176(%rsp) movq %rax, 112(%rsp) # 8-byte Spill movq %rax, 32(%rsp) movabsq $140726981331024, %rcx # imm = 0x7FFD8DBB8850 movq %rsi, %rdx movl $1, %r8d callq *%r12 movq %rax, 144(%rsp) movq %rax, 32(%rsp) movq %r14, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx cmpq %rbx, %rcx jne .LBB0_43 # %bb.30: # %pass19 cmpq 
%rdi, %rax je .LBB0_15 # %bb.31: # %L73 movq 120(%rsp), %rax # 8-byte Reload movq %rax, 32(%rsp) movabsq $140726981331024, %rcx # imm = 0x7FFD8DBB8850 leaq 32(%rsp), %rsi movq %rsi, %rdx movl $1, %r8d callq *%r12 movq %rax, 144(%rsp) movq %rax, 32(%rsp) movq %r14, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx cmpq %rbx, %rcx jne .LBB0_44 # %bb.32: # %pass21 cmpq %rdi, %rax je .LBB0_14 # %bb.33: # %L77 movq 104(%rsp), %rax # 8-byte Reload movq %rax, 32(%rsp) movabsq $140726981331024, %rcx # imm = 0x7FFD8DBB8850 leaq 32(%rsp), %rsi movq %rsi, %rdx movl $1, %r8d callq *%r12 movq %rax, 144(%rsp) movq %rax, 32(%rsp) movq %r14, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx cmpq %rbx, %rcx jne .LBB0_45 # %bb.34: # %pass23 cmpq %rdi, %rax je .LBB0_13 # %bb.35: # %L81 movq 120(%rsp), %rdi # 8-byte Reload movq %rdi, 32(%rsp) movabsq $2608217358432, %r14 # imm = 0x25F45DE8060 movq %r14, 40(%rsp) movabsq $140727003289952, %r15 # imm = 0x7FFD8F0A9960 leaq 32(%rsp), %r13 movq %r15, %rcx movq %r13, %rdx movl $2, %r8d callq *%r12 movq %rax, %rsi movq %rax, 144(%rsp) movq 104(%rsp), %rax # 8-byte Reload movq %rax, 32(%rsp) movabsq $2608217358496, %rax # imm = 0x25F45DE80A0 movq %rax, 40(%rsp) movq %r15, %rcx movq %r13, %rdx movl $2, %r8d callq *%r12 movq %rax, %rbx movq %rax, 168(%rsp) movq %rdi, 32(%rsp) movq %r14, 40(%rsp) movq %r15, %rcx movq %r13, %rdx movl $2, %r8d callq *%r12 movq %rax, 152(%rsp) movq %rbx, 32(%rsp) movq %rax, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %r13, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx movabsq $8795439003111, %rdx # imm = 0x7FFD90475E7 cmpq %rdx, %rcx jne .LBB0_46 # %bb.36: # %pass25 movq %rsi, 240(%rsp) # 8-byte Spill movabsq $140727022897552, %rcx # imm = 0x7FFD9035C990 cmpq %rcx, %rax je .LBB0_12 # %bb.37: # %L88 
movq 112(%rsp), %r13 # 8-byte Reload movq %r13, 32(%rsp) movabsq $2608217358496, %r14 # imm = 0x25F45DE80A0 movq %r14, 40(%rsp) movabsq $140727003289952, %rbx # imm = 0x7FFD8F0A9960 leaq 32(%rsp), %rsi movq %rbx, %rcx movq %rsi, %rdx movl $2, %r8d callq *%r12 movq %rax, %rdi movq %rax, 168(%rsp) movq 120(%rsp), %rax # 8-byte Reload movq %rax, 32(%rsp) movq %r14, 40(%rsp) movq %rbx, %rcx movq %rbx, %r15 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq %rax, 152(%rsp) movabsq $140726965695184, %rcx # imm = 0x7FFD8CCCF2D0 movq %rcx, 32(%rsp) movq %rax, 40(%rsp) movabsq $2608217358432, %r14 # imm = 0x25F45DE8060 movq %r14, 48(%rsp) movabsq $140727003751536, %rcx # imm = 0x7FFD8F11A470 movq %rsi, %rdx movl $3, %r8d callq *%r12 movq %rax, 152(%rsp) movq %rax, 32(%rsp) movabsq $140727003179168, %rcx # imm = 0x7FFD8F08E8A0 movq %rsi, %rdx movl $1, %r8d callq *%r12 movq %rax, 152(%rsp) movq %rdi, 32(%rsp) movq %rax, 40(%rsp) movabsq $140726943686672, %rcx # imm = 0x7FFD8B7D2010 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq %rax, %rbx movq %rax, 152(%rsp) movq %r13, 32(%rsp) movq %r14, 40(%rsp) movq %r15, %rcx movq %rsi, %rdx movl $2, %r8d callq *%r12 movq %rax, %rdi movq %rax, 184(%rsp) movq 104(%rsp), %rax # 8-byte Reload movq %rax, 32(%rsp) movq %r14, 40(%rsp) movq %r15, %rcx movq %rsi, %rdx movl $2, %r8d callq *%r12 movq %rax, %r15 movq %rax, 192(%rsp) movq %r13, 32(%rsp) movq %r14, 40(%rsp) movabsq $140727003289952, %rcx # imm = 0x7FFD8F0A9960 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq %rax, 168(%rsp) movq %r15, 32(%rsp) movq %rax, 40(%rsp) movabsq $140726939365648, %rcx # imm = 0x7FFD8B3B3110 movq %rsi, %rdx movl $2, %r8d callq *%r12 movq -8(%rax), %rcx shrq $4, %rcx movabsq $8795439003111, %rdx # imm = 0x7FFD90475E7 cmpq %rdx, %rcx jne .LBB0_47 # %bb.38: # %pass27 movq %rbx, 232(%rsp) # 8-byte Spill movabsq $140727022897552, %rcx # imm = 0x7FFD9035C990 cmpq %rcx, %rax je .LBB0_11 # %bb.39: # %L99 movq %rdi, %r14 movq 112(%rsp), %r12 # 8-byte Reload movq %r12, 
32(%rsp) movq 120(%rsp), %r13 # 8-byte Reload movq %r13, 40(%rsp) movq 104(%rsp), %rsi # 8-byte Reload movq %rsi, 48(%rsp) movabsq $140728741003264, %rcx # imm = 0x7FFDF69E0000 leaq 32(%rsp), %r15 movq %r15, %rdx movl $3, %r8d movabsq $ijl_apply_generic, %rax callq *%rax movq %rax, %rbx movq %rax, 168(%rsp) movq %r13, 32(%rsp) movq %rsi, 40(%rsp) movabsq $jl_f_tuple, %r13 xorl %ecx, %ecx movq %r15, %rdx movl $2, %r8d callq *%r13 movq %rax, %rdi movq %rax, 160(%rsp) movq %r14, 32(%rsp) movq 232(%rsp), %rax # 8-byte Reload movq %rax, 40(%rsp) xorl %ecx, %ecx movq %r15, %rdx movl $2, %r8d callq *%r13 movq %rax, %rsi movq %rax, 152(%rsp) movq 240(%rsp), %rax # 8-byte Reload movq %rax, 32(%rsp) xorl %ecx, %ecx movq %r15, %rdx movl $1, %r8d callq *%r13 movq %rax, 144(%rsp) movabsq $140730211136672, %rcx # imm = 0x7FFE4E3E74A0 movq %rcx, 32(%rsp) movq %rbx, 40(%rsp) movq %r12, 48(%rsp) movq %rdi, 56(%rsp) movq %rsi, 64(%rsp) movq %rax, 72(%rsp) movabsq $140726944076944, %rax # imm = 0x7FFD8B831490 movq %rax, 80(%rsp) movabsq $140727004972512, %rax # imm = 0x7FFD8F2445E0 movq %rax, 88(%rsp) movabsq $140727022897536, %rax # imm = 0x7FFD9035C980 movq %rax, 96(%rsp) movabsq $140728740749904, %rcx # imm = 0x7FFDF69A2250 movq %r15, %rdx movl $9, %r8d movabsq $ijl_apply_generic, %rax callq *%rax movq 136(%rsp), %rax movq 224(%rsp), %rcx # 8-byte Reload movq %rax, (%rcx) movq %r12, %rax leaq -56(%rbp), %rsp popq %rbx popq %rdi popq %rsi popq %r12 popq %r13 popq %r14 popq %r15 popq %rbp retq .LBB0_1: # %fail movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_21: # %L7 movabsq $140730211128400, %rax # imm = 0x7FFE4E3E5450 movq %rax, 32(%rsp) movabsq $140727022899472, %rcx # imm = 0x7FFD9035D110 leaq 32(%rsp), %rdx movl $1, %r8d movabsq $ijl_apply_generic, %rax callq *%rax movabsq $ijl_throw, %rdx movq %rax, %rcx callq *%rdx .LBB0_10: # %fail8 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, 
%rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_19: # %L122 movabsq $ijl_throw, %rax movabsq $140730211128336, %rcx # imm = 0x7FFE4E3E5410 callq *%rax .LBB0_40: # %fail10 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_18: # %L120 movabsq $ijl_throw, %rax movabsq $140730211128288, %rcx # imm = 0x7FFE4E3E53E0 callq *%rax .LBB0_41: # %fail12 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_17: # %L118 movabsq $ijl_throw, %rax movabsq $140730211128240, %rcx # imm = 0x7FFE4E3E53B0 callq *%rax .LBB0_42: # %fail14 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_16: # %L116 movabsq $ijl_throw, %rax movabsq $140730211128192, %rcx # imm = 0x7FFE4E3E5380 callq *%rax .LBB0_43: # %fail18 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_15: # %L114 movabsq $ijl_throw, %rax movabsq $140730211128080, %rcx # imm = 0x7FFE4E3E5310 callq *%rax .LBB0_44: # %fail20 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_14: # %L112 movabsq $ijl_throw, %rax movabsq $140730211128032, %rcx # imm = 0x7FFE4E3E52E0 callq *%rax .LBB0_45: # %fail22 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_13: # %L110 movabsq $ijl_throw, %rax movabsq $140730211128336, %rcx # imm = 0x7FFE4E3E5410 callq *%rax .LBB0_46: # %fail24 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_12: # %L108 movabsq $ijl_throw, %rax movabsq 
$140730211127984, %rcx # imm = 0x7FFE4E3E52B0 callq *%rax .LBB0_47: # %fail26 movabsq $.L_j_str1, %rcx movabsq $ijl_type_error, %rbx movabsq $140727024049776, %rdx # imm = 0x7FFD90475E70 movq %rax, %r8 callq *%rbx .LBB0_11: # %L106 movabsq $ijl_throw, %rax movabsq $140730211127936, %rcx # imm = 0x7FFE4E3E5280 callq *%rax .Lfunc_end0: .size julia_tps_deform_1647, .Lfunc_end0-julia_tps_deform_1647 .cfi_endproc # -- End function .type .L_j_str1,@object # @_j_str1 .section .rodata.str1.1,"aMS",@progbits,1 .L_j_str1: .asciz "if" .size .L_j_str1, 3 .type .L_j_const2,@object # @_j_const2 .section .rodata.cst8,"aM",@progbits,8 .p2align 3 .L_j_const2: .quad 1 # 0x1 .size .L_j_const2, 8 .section ".note.GNU-stack","",@progbits ```

This pull request

@code_lowered debuginfo=:none tps_deform(x1, tps)

```julia julia> @code_lowered debuginfo=:none tps_deform(x1, tps) CodeInfo( 1 ── Core.NewvarNode(:(yt)) │ Core.NewvarNode(:(sumsqr)) │ Core.NewvarNode(:(all_homo_z)) │ Core.NewvarNode(:(D)) │ %5 = Base.getproperty(tps, :x1) │ %6 = Base.getproperty(tps, :d) │ %7 = Base.getproperty(tps, :c) │ x1 = %5 │ d = %6 │ c = %7 │ %11 = d │ %12 = Base.vect() │ %13 = %11 == %12 └─── goto #3 if not %13 2 ── %15 = ThinPlateSplines.ArgumentError("Affine component not available; run tps_solve with compute_affine=true.") │ ThinPlateSplines.throw(%15) └─── goto #3 3 ┄─ D = ThinPlateSplines.size(x2, 2) │ %19 = $(Expr(:static_parameter, 1)) │ %20 = ThinPlateSplines.size(x2, 1) │ %21 = ThinPlateSplines.ones(%19, %20) │ all_homo_z = ThinPlateSplines.hcat(%21, x2) │ Core.NewvarNode(:(𝒜𝒸𝓉!@_11 )) │ %24 = (ndims)(all_homo_z) │ %25 = %24 == 2 └─── goto #5 if not %25 4 ── goto #6 5 ── (throw)("expected a 2-array all_homo_z") 6 ┄─ %29 = (ndims)(x1) │ %30 = %29 == 2 └─── goto #8 if not %30 7 ── goto #9 8 ── (throw)("expected a 2-array x1") 9 ┄─ 𝒜𝒸𝓉!@_11 = %new(ThinPlateSplines.:(var"#𝒜𝒸𝓉!#4" )) │ %35 = 𝒜𝒸𝓉!@_11 │ 𝒜𝒸𝓉!@_13 = %35 │ %37 = ThinPlateSplines.:(var"#ℳ𝒶𝓀ℯ#5" ) │ %38 = Core.typeof(𝒜𝒸𝓉!@_13 ) │ %39 = Core.apply_type(%37, %38) │ ℳ𝒶𝓀ℯ@_12 = %new(%39, 𝒜𝒸𝓉!@_13 ) │ %41 = (Tullio.Eval)(ℳ𝒶𝓀ℯ@_12 , nothing) │ %42 = all_homo_z │ %43 = (%41)(%42, x1) │ sumsqr = %43 │ Core.NewvarNode(:(≪1:D≫)) │ Core.NewvarNode(:(𝒜𝒸𝓉!@_15 )) │ %47 = (ndims)(sumsqr) │ %48 = %47 == 2 └─── goto #11 if not %48 10 ─ goto #12 11 ─ (throw)("expected a 2-array sumsqr") 12 ┄ %52 = (ndims)(c) │ %53 = %52 == 2 └─── goto #14 if not %53 13 ─ goto #15 14 ─ (throw)("expected a 2-array c") 15 ┄ ≪1:D≫ = 1:D │ %58 = ≪1:D≫ isa ThinPlateSplines.AbstractRange └─── goto #17 if not %58 16 ─ goto #18 17 ─ %61 = 1:D │ %62 = ThinPlateSplines.string(%61) │ %63 = "expected a range for (j in 1:D), got " * %62 └─── ThinPlateSplines.throw(%63) 18 ┄ 𝒜𝒸𝓉!@_15 = %new(ThinPlateSplines.:(var"#22#𝒜𝒸𝓉!#7" )) │ %66 = 𝒜𝒸𝓉!@_15 │ 𝒜𝒸𝓉!@_17 = %66 │ %68 = 
ThinPlateSplines.:(var"#23#ℳ𝒶𝓀ℯ#8" ) │ %69 = Core.typeof(𝒜𝒸𝓉!@_17 ) │ %70 = Core.apply_type(%68, %69) │ ℳ𝒶𝓀ℯ@_16 = %new(%70, 𝒜𝒸𝓉!@_17 ) │ %72 = (Tullio.Eval)(ℳ𝒶𝓀ℯ@_16 , nothing) │ %73 = sumsqr │ %74 = c │ %75 = (%72)(%73, %74, ≪1:D≫) │ yt = %75 │ Core.NewvarNode(:(𝒶𝓍i )) │ Core.NewvarNode(:(𝒶𝓍j )) │ Core.NewvarNode(:(𝒶𝓍l )) │ Core.NewvarNode(:(𝒜𝒸𝓉!@_21 )) │ %81 = (ndims)(yt) │ %82 = %81 == 2 └─── goto #20 if not %82 19 ─ goto #21 20 ─ (throw)("expected a 2-array yt") 21 ┄ %86 = (ndims)(d) │ %87 = %86 == 2 └─── goto #23 if not %87 22 ─ goto #24 23 ─ (throw)("expected a 2-array d") 24 ┄ %91 = (ndims)(all_homo_z) │ %92 = %91 == 2 └─── goto #26 if not %92 25 ─ goto #27 26 ─ (throw)("expected a 2-array all_homo_z") 27 ┄ 𝒜𝒸𝓉!@_21 = %new(ThinPlateSplines.:(var"#26#𝒜𝒸𝓉!#10" )) │ 𝒶𝓍l = (axes)(d, 1) │ %98 = (axes)(all_homo_z, 2) │ %99 = (axes)(d, 1) │ %100 = %98 == %99 └─── goto #29 if not %100 28 ─ goto #30 29 ─ (throw)("range of index l must agree") 30 ┄ %104 = (axes)(yt, 2) │ %105 = ThinPlateSplines.:- │ %106 = (axes)(d, 2) │ %107 = Base.broadcasted(%105, %106, 1) │ %108 = Base.materialize(%107) │ 𝒶𝓍j = ThinPlateSplines.intersect(%104, %108) │ 𝒶𝓍i = (axes)(yt, 1) │ %111 = (axes)(all_homo_z, 1) │ %112 = (axes)(yt, 1) │ %113 = %111 == %112 └─── goto #32 if not %113 31 ─ goto #33 32 ─ (throw)("range of index i must agree") 33 ┄ %117 = 𝒜𝒸𝓉!@_21 │ %118 = (Tullio.storage_type)(yt, d, all_homo_z) │ %119 = yt │ %120 = ThinPlateSplines.tuple(d, all_homo_z) │ %121 = ThinPlateSplines.tuple(𝒶𝓍i , 𝒶𝓍j ) │ %122 = ThinPlateSplines.tuple(𝒶𝓍l ) │ %123 = ThinPlateSplines.:+ │ (Tullio.threader)(%117, %118, %119, %120, %121, %122, %123, 262144, true) │ yt └─── return yt ) ```

@code_typed debuginfo=:none tps_deform(x1, tps)

```julia julia> @code_typed debuginfo=:none tps_deform(x1, tps) CodeInfo( 1 ── %1 = Base.getfield(tps, :x1)::Matrix{Int64} │ %2 = Base.getfield(tps, :d)::Matrix{Float64} │ %3 = Base.getfield(tps, :c)::Matrix{Float64} │ %4 = Base.arraysize(x2, 2)::Int64 │ %5 = Base.arraysize(x2, 1)::Int64 │ %6 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, :(%5), :(%5)))::Vector{Int64} │ %7 = Base.arraysize(%6, 1)::Int64 │ %8 = Base.slt_int(%7, 0)::Bool │ %9 = Core.ifelse(%8, 0, %7)::Int64 │ %10 = Base.slt_int(%9, 1)::Bool └─── goto #3 if not %10 2 ── goto #4 3 ── goto #4 4 ┄─ %14 = φ (#2 => true, #3 => false)::Bool │ %15 = φ (#3 => 1)::Int64 │ %16 = φ (#3 => 1)::Int64 │ %17 = Base.not_int(%14)::Bool └─── goto #10 if not %17 5 ┄─ %19 = φ (#4 => %15, #9 => %27)::Int64 │ %20 = φ (#4 => %16, #9 => %28)::Int64 │ Base.arrayset(false, %6, 1, %19)::Vector{Int64} │ %22 = (%20 === %9)::Bool └─── goto #7 if not %22 6 ── goto #8 7 ── %25 = Base.add_int(%20, 1)::Int64 └─── goto #8 8 ┄─ %27 = φ (#7 => %25)::Int64 │ %28 = φ (#7 => %25)::Int64 │ %29 = φ (#6 => true, #7 => false)::Bool │ %30 = Base.not_int(%29)::Bool └─── goto #10 if not %30 9 ── goto #5 10 ┄ goto #11 11 ─ goto #12 12 ─ goto #13 13 ─ %36 = Core.tuple(%6, x2)::Tuple{Vector{Int64}, Matrix{Int64}} │ %37 = invoke Base._typed_hcat(Int64::Type{Int64}, %36::Tuple{Vector{Int64}, Matrix{Int64}})::Matrix{Int64} │ nothing::Nothing │ %39 = invoke $(QuoteNode(Tullio.Eval{ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5" {ThinPlateSplines.var"#𝒜𝒸𝓉!#4" }, Nothing}(ThinPlateSplines.var"#ℳ𝒶𝓀ℯ#5" {ThinPlateSplines.var"#𝒜𝒸𝓉!#4" }(ThinPlateSplines.var"#𝒜𝒸𝓉!#4" ()), nothing)))(%37::Matrix{Int64}, %1::Vararg{Matrix{Int64}})::Matrix{Int64} │ nothing::Nothing │ %41 = Base.sle_int(1, %4)::Bool └─── goto #15 if not %41 14 ─ goto #16 15 ─ goto #16 16 ┄ %45 = φ (#14 => %4, #15 => 0)::Int64 │ %46 = %new(UnitRange{Int64}, 1, %45)::UnitRange{Int64} └─── goto #17 17 ─ goto #18 18 ─ nothing::Nothing │ %50 = invoke 
$(QuoteNode(Tullio.Eval{ThinPlateSplines.var"#23#ℳ𝒶𝓀ℯ#8" {ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7" }, Nothing}(ThinPlateSplines.var"#23#ℳ𝒶𝓀ℯ#8 "{ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7" }(ThinPlateSplines.var"#22#𝒜𝒸𝓉!#7" ()), nothing)))(%39::Matrix{Int64}, %3::Vararg{Any}, %46)::Matrix{Float64} │ nothing::Nothing │ nothing::Nothing │ %53 = %new(ThinPlateSplines.:(var"#26#𝒜𝒸𝓉!#10" ))::ThinPlateSplines.var"#26#𝒜𝒸𝓉!#10" │ %54 = Base.arraysize(%2, 1)::Int64 │ %55 = Base.slt_int(%54, 0)::Bool │ %56 = Core.ifelse(%55, 0, %54)::Int64 │ %57 = Base.arraysize(%37, 2)::Int64 │ %58 = Base.slt_int(%57, 0)::Bool │ %59 = Core.ifelse(%58, 0, %57)::Int64 │ %60 = Base.arraysize(%2, 1)::Int64 │ %61 = Base.slt_int(%60, 0)::Bool │ %62 = Core.ifelse(%61, 0, %60)::Int64 │ %63 = (%59 === %62)::Bool └─── goto #64 if not %63 19 ─ nothing::Nothing │ %66 = Base.arraysize(%50, 2)::Int64 │ %67 = Base.slt_int(%66, 0)::Bool │ %68 = Core.ifelse(%67, 0, %66)::Int64 │ %69 = Base.arraysize(%2, 2)::Int64 │ %70 = Base.slt_int(%69, 0)::Bool │ %71 = Core.ifelse(%70, 0, %69)::Int64 │ %72 = Base.sub_int(%71, 1)::Int64 │ %73 = Base.sle_int(0, %72)::Bool └─── goto #21 if not %73 20 ─ goto #22 21 ─ goto #22 22 ┄ %77 = φ (#20 => %72, #21 => -1)::Int64 └─── goto #23 23 ─ goto #24 24 ─ goto #25 25 ─ goto #26 26 ─ goto #27 27 ─ goto #28 28 ─ goto #29 29 ─ goto #30 30 ─ %86 = Base.slt_int(%77, %68)::Bool │ %87 = Core.ifelse(%86, %77, %68)::Int64 │ %88 = Base.sle_int(1, %87)::Bool └─── goto #32 if not %88 31 ─ goto #33 32 ─ goto #33 33 ┄ %92 = φ (#31 => %87, #32 => 0)::Int64 └─── goto #34 34 ─ goto #35 35 ─ goto #36 36 ─ %96 = Base.arraysize(%50, 1)::Int64 │ %97 = Base.slt_int(%96, 0)::Bool │ %98 = Core.ifelse(%97, 0, %96)::Int64 │ %99 = Base.arraysize(%37, 1)::Int64 │ %100 = Base.slt_int(%99, 0)::Bool │ %101 = Core.ifelse(%100, 0, %99)::Int64 │ %102 = Base.arraysize(%50, 1)::Int64 │ %103 = Base.slt_int(%102, 0)::Bool │ %104 = Core.ifelse(%103, 0, %102)::Int64 │ %105 = (%101 === %104)::Bool └─── goto #63 if not %105 37 ─ 
nothing::Nothing │ %108 = Base.sle_int(1, %98)::Bool └─── goto #39 if not %108 38 ─ goto #40 39 ─ goto #40 40 ┄ %112 = φ (#38 => %98, #39 => 0)::Int64 │ %113 = %new(UnitRange{Int64}, 1, %112)::UnitRange{Int64} └─── goto #41 41 ─ goto #42 42 ─ goto #43 43 ─ %117 = Base.sle_int(1, %92)::Bool └─── goto #45 if not %117 44 ─ goto #46 45 ─ %120 = Base.sub_int(1, 1)::Int64 └─── goto #46 46 ┄ %122 = φ (#44 => %92, #45 => %120)::Int64 │ %123 = %new(UnitRange{Int64}, 1, %122)::UnitRange{Int64} └─── goto #47 47 ─ goto #48 48 ─ goto #49 49 ─ %127 = Core.tuple(%113, %123)::Tuple{UnitRange{Int64}, UnitRange{Int64}} └─── goto #50 50 ─ %129 = Base.sle_int(1, %56)::Bool └─── goto #52 if not %129 51 ─ goto #53 52 ─ goto #53 53 ┄ %133 = φ (#51 => %56, #52 => 0)::Int64 │ %134 = %new(UnitRange{Int64}, 1, %133)::UnitRange{Int64} └─── goto #54 54 ─ goto #55 55 ─ goto #56 56 ─ %138 = Core.tuple(%134)::Tuple{UnitRange{Int64}} └─── goto #57 57 ─ %140 = Base.sub_int(%112, 1)::Int64 │ %141 = Base.add_int(1, %140)::Int64 │ %142 = Base.sub_int(%122, 1)::Int64 │ %143 = Base.add_int(1, %142)::Int64 │ %144 = Base.mul_int(%141, %143)::Int64 │ %145 = Base.sub_int(%133, 1)::Int64 │ %146 = Base.add_int(1, %145)::Int64 │ %147 = Base.Threads.cglobal(:jl_n_threads_per_pool, Ptr{Int32})::Ptr{Ptr{Int32}} │ %148 = Base.pointerref(%147, 1, 1)::Ptr{Int32} │ %149 = Base.pointerref(%148, 2, 1)::Int32 │ %150 = Core.sext_int(Core.Int64, %149)::Int64 │ %151 = Base.mul_int(%144, %146)::Int64 │ %152 = Base.checked_sdiv_int(%151, 262144)::Int64 │ %153 = Base.slt_int(0, %151)::Bool │ %154 = (%153 === true)::Bool │ %155 = Base.mul_int(%152, 262144)::Int64 │ %156 = (%155 === %151)::Bool │ %157 = Base.not_int(%156)::Bool │ %158 = Base.and_int(%154, %157)::Bool │ %159 = Core.zext_int(Core.Int64, %158)::Int64 │ %160 = Core.and_int(%159, 1)::Int64 │ %161 = Base.add_int(%152, %160)::Int64 │ %162 = Base.slt_int(%161, %150)::Bool │ %163 = Core.ifelse(%162, %161, %150)::Int64 │ %164 = Base.slt_int(%144, %163)::Bool │ %165 = 
Core.ifelse(%164, %144, %163)::Int64 └─── goto #60 if not true 58 ─ %167 = Base.slt_int(1, %165)::Bool └─── goto #60 if not %167 59 ─ %169 = Core.tuple(%50, %2, %37)::Tuple{Matrix{Float64}, Matrix{Float64}, Matrix{Int64}} │ invoke Tullio.thread_halves(%53::ThinPlateSplines.var"#26#𝒜𝒸𝓉!#10" , Matrix{Float64}::Type{Matrix{Float64}}, %169::Tuple{Matrix{Float64}, Matrix{Float64}, Matrix{Int64}}, %127::Tuple{UnitRange{Int64}, UnitRange{Int64}}, %138::Tuple{UnitRange{Int64}}, %165::Int64, true::Bool)::Any └─── goto #61 60 ┄ %172 = Core.tuple(%50, %2, %37)::Tuple{Matrix{Float64}, Matrix{Float64}, Matrix{Int64}} │ %173 = Tullio.tile_halves::typeof(Tullio.tile_halves) └─── invoke %173(%53::ThinPlateSplines.var"#26#𝒜𝒸𝓉!#10" , Matrix{Float64}::Type{Matrix{Float64}}, %172::Tuple{Matrix{Float64}, Matrix{Float64}, Matrix{Int64}}, %127::Tuple{UnitRange{Int64}, UnitRange{Int64}}, %138::Tuple{UnitRange{Int64}}, true::Bool, true::Bool)::Nothing 61 ┄ goto #62 62 ─ return %50 63 ─ (throw)("range of index i must agree")::Union{} └─── unreachable 64 ─ (throw)("range of index l must agree")::Union{} └─── unreachable ) => Matrix{Float64} ```

@code_llvm debuginfo=:none tps_deform(x1, tps)

```julia julia> @code_llvm debuginfo=:none tps_deform(x1, tps) ; Function Attrs: uwtable define nonnull {}* @julia_tps_deform_1738({}* noundef nonnull align 16 dereferenceable(40) %0, { double, {}*, {}*, {}*, {}*, {}* }* nocapture noundef nonnull readonly align 8 dereferenceable(48) %1) #0 { top: %2 = alloca [3 x {}*], align 8 %gcframe52 = alloca [13 x {}*], align 16 %gcframe52.sub = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 0 %.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %2, i64 0, i64 0 %3 = bitcast [13 x {}*]* %gcframe52 to i8* call void @llvm.memset.p0i8.i32(i8* noundef nonnull align 16 dereferenceable(104) %3, i8 0, i32 104, i1 false) %4 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 7 %5 = bitcast {}** %4 to [3 x {}*]* %6 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 4 %7 = bitcast {}** %6 to [3 x {}*]* %8 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 2 %9 = bitcast {}** %8 to [2 x {}*]* %10 = alloca [2 x [2 x i64]], align 8 %11 = alloca [1 x [2 x i64]], align 8 %12 = call {}*** inttoptr (i64 140730117225904 to {}*** ()*)() #8 %13 = bitcast [13 x {}*]* %gcframe52 to i64* store i64 44, i64* %13, align 16 %14 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 1 %15 = bitcast {}** %14 to {}*** %16 = load {}**, {}*** %12, align 8 store {}** %16, {}*** %15, align 8 %17 = bitcast {}*** %12 to {}*** store {}** %gcframe52.sub, {}*** %17, align 8 %18 = getelementptr inbounds { double, {}*, {}*, {}*, {}*, {}* }, { double, {}*, {}*, {}*, {}*, {}* }* %1, i64 0, i32 1 %19 = load atomic {}*, {}** %18 unordered, align 8 %20 = getelementptr inbounds { double, {}*, {}*, {}*, {}*, {}* }, { double, {}*, {}*, {}*, {}*, {}* }* %1, i64 0, i32 4 %21 = load atomic {}*, {}** %20 unordered, align 8 %22 = getelementptr inbounds { double, {}*, {}*, {}*, {}*, {}* }, { double, {}*, {}*, {}*, {}*, {}* }* %1, i64 0, i32 5 %23 = load atomic {}*, {}** %22 
unordered, align 8 %24 = bitcast {}* %0 to {}** %25 = getelementptr inbounds {}*, {}** %24, i64 4 %26 = bitcast {}** %25 to i64* %27 = load i64, i64* %26, align 8 %28 = getelementptr inbounds {}*, {}** %24, i64 3 %29 = bitcast {}** %28 to i64* %30 = load i64, i64* %29, align 8 %31 = call nonnull {}* inttoptr (i64 140730117045328 to {}* ({}*, i64)*)({}* inttoptr (i64 140726938543280 to {}*), i64 %30) %32 = bitcast {}* %31 to { i8*, i64, i16, i16, i32 }* %33 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %32, i64 0, i32 1 %34 = load i64, i64* %33, align 8 %.not.not = icmp eq i64 %34, 0 br i1 %.not.not, label %L36, label %L19.preheader L19.preheader: ; preds = %top %35 = bitcast {}* %31 to i64** %36 = load i64*, i64** %35, align 8 %min.iters.check = icmp ult i64 %34, 16 br i1 %min.iters.check, label %scalar.ph, label %vector.ph vector.ph: ; preds = %L19.preheader %n.vec = and i64 %34, 9223372036854775792 %ind.end = or i64 %n.vec, 1 br label %vector.body vector.body: ; preds = %vector.body, %vector.ph %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %37 = getelementptr inbounds i64, i64* %36, i64 %index %38 = bitcast i64* %37 to <4 x i64>* store <4 x i64> , <4 x i64>* %38, align 8 %39 = getelementptr inbounds i64, i64* %37, i64 4 %40 = bitcast i64* %39 to <4 x i64>* store <4 x i64> , <4 x i64>* %40, align 8 %41 = getelementptr inbounds i64, i64* %37, i64 8 %42 = bitcast i64* %41 to <4 x i64>* store <4 x i64> , <4 x i64>* %42, align 8 %43 = getelementptr inbounds i64, i64* %37, i64 12 %44 = bitcast i64* %43 to <4 x i64>* store <4 x i64> , <4 x i64>* %44, align 8 %index.next = add nuw i64 %index, 16 %45 = icmp eq i64 %index.next, %n.vec br i1 %45, label %middle.block, label %vector.body middle.block: ; preds = %vector.body %cmp.n = icmp eq i64 %34, %n.vec br i1 %cmp.n, label %L36, label %scalar.ph scalar.ph: ; preds = %middle.block, %L19.preheader %bc.resume.val = phi i64 [ %ind.end, %middle.block ], [ 1, 
%L19.preheader ] br label %L19 L19: ; preds = %L19, %scalar.ph %value_phi3 = phi i64 [ %48, %L19 ], [ %bc.resume.val, %scalar.ph ] %46 = add nsw i64 %value_phi3, -1 %47 = getelementptr inbounds i64, i64* %36, i64 %46 store i64 1, i64* %47, align 8 %.not.not42 = icmp eq i64 %value_phi3, %34 %48 = add nuw nsw i64 %value_phi3, 1 br i1 %.not.not42, label %L36, label %L19 L36: ; preds = %L19, %middle.block, %top store {}* %31, {}** %8, align 16 %.fca.1.gep39 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 3 store {}* %0, {}** %.fca.1.gep39, align 8 %49 = call nonnull {}* @j__typed_hcat_1740([2 x {}*]* nocapture readonly %9) #0 %50 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 12 store {}* %49, {}** %50, align 16 store {}* %49, {}** %.sub, align 8 %51 = getelementptr inbounds [3 x {}*], [3 x {}*]* %2, i64 0, i64 1 store {}* %19, {}** %51, align 8 %52 = call nonnull {}* @j1_Eval_1741({}* inttoptr (i64 2598230379368 to {}*), {}** nonnull %.sub, i32 2) %53 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 11 store {}* %52, {}** %53, align 8 %ptls_field53 = getelementptr inbounds {}**, {}*** %12, i64 2 %54 = bitcast {}*** %ptls_field53 to i8** %ptls_load5455 = load i8*, i8** %54, align 8 %55 = call noalias nonnull {}* @ijl_gc_pool_alloc(i8* %ptls_load5455, i32 1440, i32 32) #1 %56 = bitcast {}* %55 to i64* %57 = getelementptr inbounds i64, i64* %56, i64 -1 store atomic i64 140727004540896, i64* %57 unordered, align 8 %58 = bitcast {}* %55 to i8* store i64 1, i64* %56, align 8 %.sroa.235.0..sroa_idx = getelementptr inbounds i8, i8* %58, i64 8 %.sroa.235.0..sroa_cast = bitcast i8* %.sroa.235.0..sroa_idx to i64* store i64 %27, i64* %.sroa.235.0..sroa_cast, align 8 %59 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 10 store {}* %55, {}** %59, align 16 store {}* %52, {}** %.sub, align 8 store {}* %23, {}** %51, align 8 %60 = getelementptr inbounds [3 x {}*], [3 x {}*]* %2, i64 0, i64 
2 store {}* %55, {}** %60, align 8 %61 = call nonnull {}* @j1_Eval_1742({}* inttoptr (i64 2598230382704 to {}*), {}** nonnull %.sub, i32 3) %62 = bitcast {}* %21 to {}** %63 = getelementptr inbounds {}*, {}** %62, i64 3 %64 = bitcast {}** %63 to i64* %65 = load i64, i64* %64, align 8 %66 = bitcast {}* %49 to {}** %67 = getelementptr inbounds {}*, {}** %66, i64 4 %68 = bitcast {}** %67 to i64* %69 = load i64, i64* %68, align 8 %.not = icmp eq i64 %69, %65 br i1 %.not, label %L65, label %L179 L65: ; preds = %L36 %70 = bitcast {}* %61 to {}** %71 = getelementptr inbounds {}*, {}** %70, i64 4 %72 = bitcast {}** %71 to i64* %73 = load i64, i64* %72, align 8 %74 = getelementptr inbounds {}*, {}** %62, i64 4 %75 = bitcast {}** %74 to i64* %76 = load i64, i64* %75, align 8 %77 = add nsw i64 %76, -1 %.not45 = icmp ugt i64 %76, %73 %78 = select i1 %.not45, i64 %73, i64 %77 %79 = getelementptr inbounds {}*, {}** %70, i64 3 %80 = bitcast {}** %79 to i64* %81 = load i64, i64* %80, align 8 %82 = getelementptr inbounds {}*, {}** %66, i64 3 %83 = bitcast {}** %82 to i64* %84 = load i64, i64* %83, align 8 %.not46 = icmp eq i64 %84, %81 br i1 %.not46, label %L107, label %L177 L107: ; preds = %L65 %.inv = icmp sgt i64 %78, 0 %value_phi13 = select i1 %.inv, i64 %78, i64 0 %.sroa.031.0..sroa_idx = getelementptr inbounds [2 x [2 x i64]], [2 x [2 x i64]]* %10, i64 0, i64 0, i64 0 store i64 1, i64* %.sroa.031.0..sroa_idx, align 8 %.sroa.232.0..sroa_idx33 = getelementptr inbounds [2 x [2 x i64]], [2 x [2 x i64]]* %10, i64 0, i64 0, i64 1 store i64 %81, i64* %.sroa.232.0..sroa_idx33, align 8 %.sroa.028.0..sroa_idx = getelementptr inbounds [2 x [2 x i64]], [2 x [2 x i64]]* %10, i64 0, i64 1, i64 0 store i64 1, i64* %.sroa.028.0..sroa_idx, align 8 %.sroa.229.0..sroa_idx30 = getelementptr inbounds [2 x [2 x i64]], [2 x [2 x i64]]* %10, i64 0, i64 1, i64 1 store i64 %value_phi13, i64* %.sroa.229.0..sroa_idx30, align 8 %.sroa.0.0..sroa_idx = getelementptr inbounds [1 x [2 x i64]], [1 x [2 x 
i64]]* %11, i64 0, i64 0, i64 0 store i64 1, i64* %.sroa.0.0..sroa_idx, align 8 %.sroa.2.0..sroa_idx27 = getelementptr inbounds [1 x [2 x i64]], [1 x [2 x i64]]* %11, i64 0, i64 0, i64 1 store i64 %65, i64* %.sroa.2.0..sroa_idx27, align 8 %85 = mul i64 %value_phi13, %81 %86 = load i64, i64* inttoptr (i64 140730749957576 to i64*), align 8 %87 = inttoptr i64 %86 to i32* %88 = getelementptr inbounds i32, i32* %87, i64 1 %89 = load i32, i32* %88, align 1 %90 = sext i32 %89 to i64 %91 = mul i64 %85, %65 %92 = sdiv i64 %91, 262144 %93 = icmp sgt i64 %91, 0 %94 = shl nsw i64 %92, 18 %95 = icmp ne i64 %94, %91 %96 = and i1 %93, %95 %97 = zext i1 %96 to i64 %98 = add nsw i64 %92, %97 %.not49 = icmp slt i64 %98, %90 %99 = select i1 %.not49, i64 %98, i64 %90 %.not50 = icmp slt i64 %85, %99 %100 = select i1 %.not50, i64 %85, i64 %99 %101 = icmp slt i64 %100, 2 br i1 %101, label %L172, label %L169 L169: ; preds = %L107 store {}* %61, {}** %4, align 8 %.fca.1.gep24 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 8 store {}* %21, {}** %.fca.1.gep24, align 16 %.fca.2.gep26 = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 9 store {}* %49, {}** %.fca.2.gep26, align 8 store {}* %61, {}** %59, align 16 call void @j_thread_halves_1743([3 x {}*]* nocapture readonly %5, [2 x [2 x i64]]* nocapture readonly %10, [1 x [2 x i64]]* nocapture readonly %11, i64 signext %100, i8 zeroext 1) #0 br label %L176 L172: ; preds = %L107 store {}* %61, {}** %6, align 16 %.fca.1.gep = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 5 store {}* %21, {}** %.fca.1.gep, align 8 %.fca.2.gep = getelementptr inbounds [13 x {}*], [13 x {}*]* %gcframe52, i64 0, i64 6 store {}* %49, {}** %.fca.2.gep, align 16 store {}* %61, {}** %59, align 16 call void @j_tile_halves_1744([3 x {}*]* nocapture readonly %7, [2 x [2 x i64]]* nocapture readonly %10, [1 x [2 x i64]]* nocapture readonly %11, i8 zeroext 1, i8 zeroext 1) #0 br label %L176 L176: ; preds = 
%L172, %L169 %102 = load {}*, {}** %14, align 8 %103 = bitcast {}*** %12 to {}** store {}* %102, {}** %103, align 8 ret {}* %61 L177: ; preds = %L65 call void @ijl_throw({}* inttoptr (i64 140730211165408 to {}*)) unreachable L179: ; preds = %L36 call void @ijl_throw({}* inttoptr (i64 140730211165456 to {}*)) unreachable } ```

@code_native debuginfo=:none tps_deform(x1, tps)

```julia julia> @code_native debuginfo=:none tps_deform(x1, tps) .text .file "tps_deform" .section .rodata.cst8,"aM",@progbits,8 .p2align 3 # -- Begin function julia_tps_deform_1745 .LCPI0_0: .quad 1 # 0x1 .text .globl julia_tps_deform_1745 .p2align 4, 0x90 .type julia_tps_deform_1745,@function julia_tps_deform_1745: # @julia_tps_deform_1745 .cfi_startproc # %bb.0: # %top pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset %rbp, -16 movq %rsp, %rbp .cfi_def_cfa_register %rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rsi pushq %rdi pushq %rbx andq $-32, %rsp subq $256, %rsp # imm = 0x100 .cfi_offset %rbx, -72 .cfi_offset %rdi, -64 .cfi_offset %rsi, -56 .cfi_offset %r12, -48 .cfi_offset %r13, -40 .cfi_offset %r14, -32 .cfi_offset %r15, -24 movq %rdx, %rbx movq %rcx, %r13 vxorps %xmm0, %xmm0, %xmm0 vmovaps %ymm0, 160(%rsp) vmovaps %ymm0, 128(%rsp) movabsq $140730117045328, %rsi # imm = 0x7FFE48A2BC50 vmovaps %ymm0, 96(%rsp) movq $0, 192(%rsp) leaq 180576(%rsi), %rax vzeroupper callq *%rax movq %rax, %r15 movq $44, 96(%rsp) movq (%rax), %rax movq %rax, 104(%rsp) leaq 96(%rsp), %rax movq %rax, (%r15) movq 8(%rbx), %r14 movq 32(%rbx), %r12 movq 40(%rbx), %rdi movq 24(%r13), %rdx movq 32(%r13), %rax movq %rax, 48(%rsp) # 8-byte Spill movabsq $140726938543280, %rcx # imm = 0x7FFD8B2EA4B0 callq *%rsi movq 8(%rax), %r8 testq %r8, %r8 je .LBB0_7 # %bb.1: # %L19.preheader movq (%rax), %rdx movl $1, %ebx cmpq $16, %r8 jb .LBB0_5 # %bb.2: # %vector.ph movq %rdi, %r9 movq %r8, %rcx andq $-16, %rcx leaq 1(%rcx), %rbx xorl %edi, %edi movabsq $.LCPI0_0, %rsi vbroadcastsd (%rsi), %ymm0 .p2align 4, 0x90 .LBB0_3: # %vector.body # =>This Inner Loop Header: Depth=1 vmovups %ymm0, (%rdx,%rdi,8) vmovups %ymm0, 32(%rdx,%rdi,8) vmovups %ymm0, 64(%rdx,%rdi,8) vmovups %ymm0, 96(%rdx,%rdi,8) addq $16, %rdi cmpq %rdi, %rcx jne .LBB0_3 # %bb.4: # %middle.block cmpq %rcx, %r8 movq %r9, %rdi je .LBB0_7 .LBB0_5: # %scalar.ph decq %rbx .p2align 4, 0x90 .LBB0_6: # %L19 # =>This Inner Loop Header: 
Depth=1 movq $1, (%rdx,%rbx,8) incq %rbx cmpq %rbx, %r8 jne .LBB0_6 .LBB0_7: # %L36 movq %rax, 112(%rsp) movq %r13, 120(%rsp) movabsq $j__typed_hcat_1747, %rax leaq 112(%rsp), %rcx vzeroupper callq *%rax movq %rax, %r13 movq %rax, 192(%rsp) movq %rax, 72(%rsp) movq %r14, 80(%rsp) movabsq $j1_Eval_1748, %rax movabsq $2598230379368, %rcx # imm = 0x25CF2994B68 leaq 72(%rsp), %rbx movq %rbx, %rdx movl $2, %r8d callq *%rax movq %rax, %rsi movq %rax, 184(%rsp) movq 16(%r15), %rcx movabsq $ijl_gc_pool_alloc, %rax movl $1440, %edx # imm = 0x5A0 movl $32, %r8d callq *%rax movabsq $140727004540896, %rcx # imm = 0x7FFD8F1DAFE0 movq %rcx, -8(%rax) movq $1, (%rax) movq 48(%rsp), %rcx # 8-byte Reload movq %rcx, 8(%rax) movq %rax, 176(%rsp) movq %rsi, 72(%rsp) movq %rdi, 80(%rsp) movq %rax, 88(%rsp) movabsq $j1_Eval_1749, %rax movabsq $2598230382704, %rcx # imm = 0x25CF2995870 movq %rbx, %rdx movl $3, %r8d callq *%rax movq %rax, %r14 movq 24(%r12), %rax cmpq %rax, 32(%r13) jne .LBB0_14 # %bb.8: # %L65 movq 24(%r14), %rdx movq 32(%r14), %rsi movq 32(%r12), %rdi leaq -1(%rdi), %rbx cmpq %rsi, %rdi cmovaq %rsi, %rbx cmpq %rdx, 24(%r13) jne .LBB0_13 # %bb.9: # %L107 movq %rbx, %rcx sarq $63, %rcx andnq %rbx, %rcx, %rcx movq $1, 216(%rsp) movq %rdx, 224(%rsp) movq $1, 232(%rsp) movq %rcx, 240(%rsp) movq $1, 56(%rsp) movq %rax, 64(%rsp) imulq %rdx, %rcx movabsq $140730117045328, %rdx # imm = 0x7FFE48A2BC50 movq 632912248(%rdx), %rdx movslq 4(%rdx), %rdx imulq %rcx, %rax leaq 262143(%rax), %rbx testq %rax, %rax cmovnsq %rax, %rbx setg %r8b movq %rbx, %rdi sarq $18, %rdi andq $-262144, %rbx # imm = 0xFFFC0000 cmpq %rax, %rbx setne %al andb %r8b, %al movzbl %al, %r9d addq %rdi, %r9 cmpq %rdx, %r9 cmovgeq %rdx, %r9 cmpq %r9, %rcx cmovlq %rcx, %r9 cmpq $2, %r9 jge .LBB0_10 # %bb.11: # %L172 leaq 152(%rsp), %rcx movq %r14, 152(%rsp) movq %r12, 160(%rsp) movq %r13, 168(%rsp) movq %r14, 176(%rsp) movb $1, 32(%rsp) movabsq $j_tile_halves_1751, %rax leaq 216(%rsp), %rdx leaq 56(%rsp), %r8 movb 
$1, %r9b callq *%rax jmp .LBB0_12 .LBB0_10: # %L169 leaq 128(%rsp), %rcx movq %r14, 128(%rsp) movq %r12, 136(%rsp) movq %r13, 144(%rsp) movq %r14, 176(%rsp) movb $1, 32(%rsp) movabsq $j_thread_halves_1750, %rax leaq 216(%rsp), %rdx leaq 56(%rsp), %r8 callq *%rax .LBB0_12: # %L176 movq 104(%rsp), %rax movq %rax, (%r15) movq %r14, %rax leaq -56(%rbp), %rsp popq %rbx popq %rdi popq %rsi popq %r12 popq %r13 popq %r14 popq %r15 popq %rbp retq .LBB0_14: # %L179 movabsq $ijl_throw, %rax movabsq $140730211165456, %rcx # imm = 0x7FFE4E3EE510 callq *%rax .LBB0_13: # %L177 movabsq $ijl_throw, %rax movabsq $140730211165408, %rcx # imm = 0x7FFE4E3EE4E0 callq *%rax .Lfunc_end0: .size julia_tps_deform_1745, .Lfunc_end0-julia_tps_deform_1745 .cfi_endproc # -- End function .type .L_j_const1,@object # @_j_const1 .section .rodata.cst8,"aM",@progbits,8 .p2align 3 .L_j_const1: .quad 1 # 0x1 .size .L_j_const1, 8 .section ".note.GNU-stack","",@progbits ```

mkitti commented 11 months ago

For compiled code, I think the LLVM IR is the most readable. Notice how, with this pull request, we see more references to double and i64 while the code is shorter. You do not see any references to double (Float64) in the current master code.
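The same comparison can be tried on a toy example without reading the full dumps. The sketch below is only an illustration: `AnyField`, `ParamField`, `scale_first`, and `llvm_ir` are hypothetical names, not part of ThinPlateSplines.jl. It captures the LLVM IR as a string and checks whether `double` appears.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical toy types, not the ThinPlateSplines.jl structs.
struct AnyField
    v            # unannotated, so the field type is Any
end

struct ParamField{V}
    v::V         # field type is fixed by the type parameter
end

scale_first(s) = s.v[1] * 2.0

# Capture the LLVM IR as a string instead of printing it.
llvm_ir(f, arg) = sprint(io -> code_llvm(io, f, (typeof(arg),); debuginfo=:none))

ir_any   = llvm_ir(scale_first, AnyField([1.0, 2.0]))
ir_param = llvm_ir(scale_first, ParamField([1.0, 2.0]))

println("mentions double (Any field):        ", occursin("double", ir_any))
println("mentions double (parametric field): ", occursin("double", ir_param))
println("IR length, Any vs parametric: ", length(ir_any), " vs ", length(ir_param))
```

The exact IR text depends on the Julia version and platform, so the printed lengths are only a rough indication.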

oxinabox commented 11 months ago

@mkitti is correct that the blog post I wrote was primarily about type constraints on methods. Type parameters on struct fields are mostly good and are required to keep field access type stable.

There is a corner case where you are trying to minimize compile time and don't want this (see DataFrames.jl), but that doesn't apply here. When it does apply, care needs to be taken with how the code is written (e.g., function barriers) to make sure you hit the desired runtime vs. compile-time performance trade-off.
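To make the two points concrete, here is a minimal sketch, assuming toy types `LooseSpline` and `TightSpline` that are hypothetical and not taken from the package. It shows what inference sees for an `Any` field versus a parametric field, and what a function barrier looks like when the field is deliberately left untyped.

```julia
# Hypothetical toy types for illustration only.
struct LooseSpline
    c                 # field type is Any
end

struct TightSpline{M}
    c::M              # field type is the parameter M
end

# Direct field use: inference only knows what the field type tells it.
total(s) = sum(s.c)

# Function barrier: one dynamic dispatch at the call to `kernel`,
# after which `kernel` is compiled for the concrete array type.
kernel(c) = sum(c)
total_barrier(s::LooseSpline) = kernel(s.c)

loose = LooseSpline([1.0, 2.0, 3.0])
tight = TightSpline([1.0, 2.0, 3.0])

println(fieldtypes(typeof(loose)))
println(fieldtypes(typeof(tight)))
println(Base.return_types(total, (typeof(loose),)))
println(Base.return_types(total, (typeof(tight),)))
println(Base.return_types(kernel, (Vector{Float64},)))
```

With the barrier, the outer call is still dynamically dispatched, but the work inside `kernel` runs on a concretely typed argument.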

mkitti commented 11 months ago

The simplest design for the ThinPlateSpline struct would be to make everything a Float64. Then no parameters are needed. This might break some code that starts with integers, though.
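As a rough sketch of that alternative, assuming a hypothetical two-field `Float64Spline` rather than the real struct: the default constructor converts its arguments to the declared field types, so integer input is accepted but no longer stored as integers.

```julia
# Hypothetical, cut-down struct; the real ThinPlateSpline has more fields.
struct Float64Spline
    λ::Float64
    x1::Matrix{Float64}
end

# Integer inputs are converted to Float64 by the default constructor.
s = Float64Spline(1, [0 0; 1 1])

println(typeof(s.λ))
println(eltype(s.x1))
```

Code that relied on `x1` keeping its integer element type would see a change in behavior under this design.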

anj1 commented 11 months ago

Thanks for the benchmarks. Your results are interesting as I'm not getting any discernible difference.

This PR:

julia> @benchmark tps_deform($x1, $tps)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.895 μs … 166.285 μs  ┊ GC (min … max): 0.00% … 97.21%
 Time  (median):     2.072 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.286 μs ±   1.707 μs  ┊ GC (mean ± σ):  0.71% ±  0.97%

   ▁▆█▆▄▄▃▁▃▄▃▂▃▃▂▂▂▂                                         ▁
  ▇█████████████████████▇▆▆▅▅▆▅▆▄▅▆▄▄▅▅▄▅▅▆▆▅▆▇▇▇█▇█▇▆▇▆▆▆▆▅▅ █
  1.9 μs       Histogram: log(frequency) by time      4.42 μs <

 Memory estimate: 1.27 KiB, allocs estimate: 37.

Original code:

julia> @benchmark tps_deform($x1, $tps)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.844 μs … 195.860 μs  ┊ GC (min … max): 0.00% … 97.34%
 Time  (median):     1.993 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.118 μs ±   1.952 μs  ┊ GC (mean ± σ):  0.90% ±  0.97%

       ▅█▃                                                     
  ▁▁▂▃▇███▇▅▃▃▃▃▂▂▂▁▁▂▂▃▃▂▂▁▂▂▂▂▂▂▁▁▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  1.84 μs         Histogram: frequency by time        2.98 μs <

 Memory estimate: 1.27 KiB, allocs estimate: 37.

Also, what I meant was overspecialization in terms of Matrix. For example, I would rather do something like:


struct ThinPlateSpline{S,P,M}
    λ::S  # Stiffness.
    x1::P # control points
    Y::M  # Homogeneous control point coordinates
    Φ::M  # TPS kernel
    d::M  # Affine component
    c::M  # Non-affine component
end

This is more general. If you can change to the above struct definition, I would be happy to merge the PR.
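A quick way to see what this definition implies is to build one by hand. The sketch below copies the field layout under a hypothetical name, `DemoSpline`, with made-up values (in the package they would come from `tps_solve`). It assumes nothing beyond the struct definition quoted above.

```julia
# Same field layout as the struct above, under a hypothetical name so it
# does not clash with the real ThinPlateSplines.ThinPlateSpline.
struct DemoSpline{S,P,M}
    λ::S
    x1::P
    Y::M
    Φ::M
    d::M
    c::M
end

# Made-up values; in the package these would come from tps_solve.
x1 = [0 0; 1 0; 0 1]
Y = [1.0 0.0 0.0; 1.0 1.0 0.0; 1.0 0.0 1.0]
Φ = zeros(3, 3)
d = zeros(3, 3)
c = zeros(3, 3)

tps = DemoSpline(0.1, x1, Y, Φ, d, c)
println(fieldtypes(typeof(tps)))

# The four matrices share the parameter M, so mixing matrix types is
# rejected by the default constructor.
mixed_ok = try
    DemoSpline(0.1, x1, Y, Φ, d, [1 2; 3 4])
    true
catch err
    err isa MethodError || rethrow()
    false
end
println("mixed matrix types accepted: ", mixed_ok)
```

Because `x1` has its own parameter `P`, integer control points can coexist with floating-point matrices, while `Y`, `Φ`, `d`, and `c` must all share one type.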

mkitti commented 11 months ago

It's not clear to me that you were actually able to benchmark the code in this branch. While performance can vary from computer to computer, the allocations should not vary.

Both of your benchmarks show the following, which matches the allocations I see on the master branch.

Memory estimate: 1.27 KiB, allocs estimate: 37

On this branch, you should see the following. Please make sure to restart Julia to fully reload the package.

Memory estimate: 480 bytes, allocs estimate: 5.
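Allocation counts can also be checked without BenchmarkTools by using `@allocated` from Base. The following is a sketch on hypothetical toy types (`LooseBox`, `TightBox`), not on `tps_deform` itself, so the numbers will differ from the estimates quoted above.

```julia
# Hypothetical toy types, not the package structs.
struct LooseBox
    v            # field type is Any
end

struct TightBox{V}
    v::V         # field type is a parameter
end

total(b) = sum(b.v)

# Wrap the measurement in a function so global-scope overhead is not counted.
bytes(b) = @allocated total(b)

loose = LooseBox(rand(100))
tight = TightBox(rand(100))

# Warm up first so compilation is not measured.
bytes(loose); bytes(tight)

println("untyped field: ", bytes(loose), " bytes")
println("typed field:   ", bytes(tight), " bytes")
```

If both versions of a package report identical allocations, that is a hint the same code was loaded twice, which is why restarting Julia matters here.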

I changed the parameter to M as you requested.

mkitti commented 11 months ago

Are there any other changes you would like?

mkitti commented 11 months ago

Thank you

anj1 / ThinPlateSplines.jl

Improve type stability #9
