Closed mkitti closed 11 months ago
I'm not sure what these extra parameters are doing? What kind of usage are you envisioning where these parameters would help avoid bugs? A concrete example might be good.
Currently the field types of ThinPlateSplines
are Any
:
julia> fieldtypes(ThinPlateSpline)
(Any, Any, Any, Any, Any, Any)
This leads to type instability harming performance: https://m3g.github.io/JuliaNotes.jl/stable/instability/
Type instability makes it very difficult for Julia to compile this code efficiently. Looking at tps_deform
we see these Any
fields propagate throughout the function lowering.
julia> @code_warntype tps_deform(x1, tps)
MethodInstance for ThinPlateSplines.tps_deform(::Matrix{Float64}, ::ThinPlateSpline)
from tps_deform(x2::AbstractArray{T, D}, tps::ThinPlateSpline) where {T, D} @ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:107
Static Parameters
T = Float64
D = 2
Arguments
#self#::Core.Const(ThinPlateSplines.tps_deform)
x2::Matrix{Float64}
tps::ThinPlateSpline
Locals
yt::Any
sumsqr::Any
all_homo_z::Any
c::Any
d::Any
x1::Any
ππΈπ!@_10 ::ThinPlateSplines.var"#ππΈπ!#4"
β³πΆπβ―@_11 ::ThinPlateSplines.var"#β³πΆπβ―#5"{ThinPlateSplines.var"#ππΈπ!#4"}
ππΈπ!@_12 ::ThinPlateSplines.var"#ππΈπ!#4"
βͺ1:Dβ«::UnitRange{Int64}
ππΈπ!@_14 ::ThinPlateSplines.var"#19#ππΈπ!#7"
β³πΆπβ―@_15 ::ThinPlateSplines.var"#20#β³πΆπβ―#8"{ThinPlateSplines.var"#19#ππΈπ!#7"}
ππΈπ!@_16 ::ThinPlateSplines.var"#19#ππΈπ!#7"
πΆπi ::Any
πΆπj ::Any
πΆπl ::Any
ππΈπ!@_20 ::ThinPlateSplines.var"#23#ππΈπ!#10"
Body::Any
1 ββ Core.NewvarNode(:(yt))
β Core.NewvarNode(:(sumsqr))
β Core.NewvarNode(:(all_homo_z))
β %4 = Base.getproperty(tps, :x1)::Any
β %5 = Base.getproperty(tps, :d)::Any
β %6 = Base.getproperty(tps, :c)::Any
β (x1 = %4)
β (d = %5)
β (c = %6)
β %10 = d::Any
β %11 = Base.vect()::Vector{Any}
β %12 = (%10 == %11)::Any
ββββ goto #3 if not %12
2 ββ %14 = ThinPlateSplines.ArgumentError("Affine component not available; run tps_solve with compute_affine=true.")::Any
β ThinPlateSplines.throw(%14)
ββββ Core.Const(:(goto %17))
3 ββ %17 = $(Expr(:static_parameter, 1))::Core.Const(Float64)
β %18 = ThinPlateSplines.size(x2, 1)::Int64
β %19 = ThinPlateSplines.ones(%17, %18)::Vector{Float64}
β %20 = (:dims,)::Core.Const((:dims,))
β %21 = Core.apply_type(Core.NamedTuple, %20)::Core.Const(NamedTuple{(:dims,)})
β %22 = Core.tuple(2)::Core.Const((2,))
β %23 = (%21)(%22)::Core.Const((dims = 2,))
β (all_homo_z = Core.kwcall(%23, ThinPlateSplines.cat, %19, x2))
β Core.NewvarNode(:(ππΈπ!@_10 ))
β %26 = (ndims)(all_homo_z)::Any
β %27 = (%26 == 2)::Any
ββββ goto #5 if not %27
4 ββ goto #6
5 ββ (throw)("expected a 2-array all_homo_z")
6 ββ %31 = (ndims)(x1)::Any
β %32 = (%31 == 2)::Any
ββββ goto #8 if not %32
7 ββ goto #9
8 ββ (throw)("expected a 2-array x1")
9 ββ (ππΈπ!@_10 = %new(ThinPlateSplines.:(var"#ππΈπ!#4" )))
β %37 = ππΈπ!@_10 ::Core.Const(ThinPlateSplines.var"#ππΈπ!#4"())
β (ππΈπ!@_12 = %37)
β %39 = ThinPlateSplines.:(var"#β³πΆπβ―#5" )::Core.Const(ThinPlateSplines.var"#β³πΆπβ―#5")
β %40 = Core.typeof(ππΈπ!@_12 )::Core.Const(ThinPlateSplines.var"#ππΈπ!#4")
β %41 = Core.apply_type(%39, %40)::Core.Const(ThinPlateSplines.var"#β³πΆπβ―#5"{ThinPlateSplines.var"#ππΈπ!#4"})
β (β³πΆπβ―@_11 = %new(%41, ππΈπ!@_12 ))
β %43 = (Tullio.Eval)(β³πΆπβ―@_11 , nothing)::Core.Const(Tullio.Eval{ThinPlateSplines.var"#β³πΆπβ―#5"{ThinPlateSplines.var"#ππΈπ!#4"}, Nothing} (ThinPlateSplines.var"#β³πΆπβ―#5"{ThinPlateSplines.var"#ππΈπ!#4"}(ThinPlateSplines.var"#ππΈπ!#4"()), nothing ))
β %44 = all_homo_z::Any
β %45 = (%43)(%44, x1)::Any
β (sumsqr = %45)
β Core.NewvarNode(:(βͺ1:Dβ«))
β Core.NewvarNode(:(ππΈπ!@_14 ))
β %49 = (ndims)(sumsqr)::Any
β %50 = (%49 == 2)::Any
ββββ goto #11 if not %50
10 β goto #12
11 β (throw)("expected a 2-array sumsqr")
12 β %54 = (ndims)(c)::Any
β %55 = (%54 == 2)::Any
ββββ goto #14 if not %55
13 β goto #15
14 β (throw)("expected a 2-array c")
15 β (βͺ1:Dβ« = 1:$(Expr(:static_parameter, 2)))
β %60 = (βͺ1:Dβ«::Core.Const(1:2) isa ThinPlateSplines.AbstractRange)::Core.Const(true)
ββββ goto #17 if not %60
16 β goto #18
17 β Core.Const(:(1:$(Expr(:static_parameter, 2))))
β Core.Const(:(ThinPlateSplines.string(%63)))
β Core.Const(:("expected a range for (j in 1:D), got " * %64))
ββββ Core.Const(:(ThinPlateSplines.throw(%65)))
18 β (ππΈπ!@_14 = %new(ThinPlateSplines.:(var"#19#ππΈπ!#7" )))
β %68 = ππΈπ!@_14 ::Core.Const(ThinPlateSplines.var"#19#ππΈπ!#7"())
β (ππΈπ!@_16 = %68)
β %70 = ThinPlateSplines.:(var"#20#β³πΆπβ―#8" )::Core.Const(ThinPlateSplines.var"#20#β³πΆπβ―#8")
β %71 = Core.typeof(ππΈπ!@_16 )::Core.Const(ThinPlateSplines.var"#19#ππΈπ!#7")
β %72 = Core.apply_type(%70, %71)::Core.Const(ThinPlateSplines.var"#20#β³πΆπβ―#8"{ThinPlateSplines.var"#19#ππΈπ!#7"})
β (β³πΆπβ―@_15 = %new(%72, ππΈπ!@_16 ))
β %74 = (Tullio.Eval)(β³πΆπβ―@_15 , nothing)::Core.Const(Tullio.Eval{ThinPlateSplines.var"#20#β³πΆπβ―#8"{ThinPlateSplines.var"#19#ππΈπ!#7"}, No thing}(ThinPlateSplines.var"#20#β³πΆπβ―#8"{ThinPlateSplines.var"#19#ππΈπ!#7"}(ThinPlateSplines.var"#19#ππΈπ! #7"()), nothing))
β %75 = sumsqr::Any
β %76 = c::Any
β %77 = (%74)(%75, %76, βͺ1:Dβ«::Core.Const(1:2))::Any
β (yt = %77)
β Core.NewvarNode(:(πΆπi ))
β Core.NewvarNode(:(πΆπj ))
β Core.NewvarNode(:(πΆπl ))
β Core.NewvarNode(:(ππΈπ!@_20 ))
β %83 = (ndims)(yt)::Any
β %84 = (%83 == 2)::Any
ββββ goto #20 if not %84
19 β goto #21
20 β (throw)("expected a 2-array yt")
21 β %88 = (ndims)(d)::Any
β %89 = (%88 == 2)::Any
ββββ goto #23 if not %89
22 β goto #24
23 β (throw)("expected a 2-array d")
24 β %93 = (ndims)(all_homo_z)::Any
β %94 = (%93 == 2)::Any
ββββ goto #26 if not %94
25 β goto #27
26 β (throw)("expected a 2-array all_homo_z")
27 β (ππΈπ!@_20 = %new(ThinPlateSplines.:(var"#23#ππΈπ!#10" )))
β (πΆπl = (axes)(d, 1))
β %100 = (axes)(all_homo_z, 2)::Any
β %101 = (axes)(d, 1)::Any
β %102 = (%100 == %101)::Any
ββββ goto #29 if not %102
28 β goto #30
29 β (throw)("range of index l must agree")
30 β %106 = (axes)(yt, 2)::Any
β %107 = ThinPlateSplines.:-::Core.Const(-)
β %108 = (axes)(d, 2)::Any
β %109 = Base.broadcasted(%107, %108, 1)::Any
β %110 = Base.materialize(%109)::Any
β (πΆπj = ThinPlateSplines.intersect(%106, %110))
β (πΆπi = (axes)(yt, 1))
β %113 = (axes)(all_homo_z, 1)::Any
β %114 = (axes)(yt, 1)::Any
β %115 = (%113 == %114)::Any
ββββ goto #32 if not %115
31 β goto #33
32 β (throw)("range of index i must agree")
33 β %119 = ππΈπ!@_20 ::Core.Const(ThinPlateSplines.var"#23#ππΈπ!#10"())
β %120 = (Tullio.storage_type)(yt, d, all_homo_z)::Any
β %121 = yt::Any
β %122 = ThinPlateSplines.tuple(d, all_homo_z)::Tuple{Any, Any}
β %123 = ThinPlateSplines.tuple(πΆπi , πΆπj )::Tuple{Any, Any}
β %124 = ThinPlateSplines.tuple(πΆπl )::Tuple{Any}
β %125 = ThinPlateSplines.:+::Core.Const(+)
β (Tullio.threader)(%119, %120, %121, %122, %123, %124, %125, 262144, true)
β yt
ββββ return yt
After this pull request, all the types are now known to be Float64
. Julia can now produce efficient machine code now that it knows the machine types.
julia> @code_warntype tps_deform(x1, tps)
MethodInstance for ThinPlateSplines.tps_deform(::Matrix{Float64}, ::ThinPlateSpline{Float64, Matrix{Float64}, Float64})
from tps_deform(x2::AbstractMatrix{T}, tps::ThinPlateSpline) where T @ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:107
Static Parameters
T = Float64
Arguments
#self#::Core.Const(ThinPlateSplines.tps_deform)
x2::Matrix{Float64}
tps::ThinPlateSpline{Float64, Matrix{Float64}, Float64}
Locals
yt::Matrix{Float64}
sumsqr::Matrix{Float64}
all_homo_z::Matrix{Float64}
D::Int64
c::Matrix{Float64}
d::Matrix{Float64}
x1::Matrix{Float64}
ππΈπ!@_11 ::ThinPlateSplines.var"#ππΈπ!#4"
β³πΆπβ―@_12 ::ThinPlateSplines.var"#β³πΆπβ―#5"{ThinPlateSplines.var"#ππΈπ!#4"}
ππΈπ!@_13 ::ThinPlateSplines.var"#ππΈπ!#4"
βͺ1:Dβ«::UnitRange{Int64}
ππΈπ!@_15 ::ThinPlateSplines.var"#22#ππΈπ!#7"
β³πΆπβ―@_16 ::ThinPlateSplines.var"#23#β³πΆπβ―#8"{ThinPlateSplines.var"#22#ππΈπ!#7"}
ππΈπ!@_17 ::ThinPlateSplines.var"#22#ππΈπ!#7"
πΆπi ::Base.OneTo{Int64}
πΆπj ::UnitRange{Int64}
πΆπl ::Base.OneTo{Int64}
ππΈπ!@_21 ::ThinPlateSplines.var"#26#ππΈπ!#10"
Body::Matrix{Float64}
1 ββ Core.NewvarNode(:(yt))
β Core.NewvarNode(:(sumsqr))
β Core.NewvarNode(:(all_homo_z))
β Core.NewvarNode(:(D))
β %5 = Base.getproperty(tps, :x1)::Matrix{Float64}
β %6 = Base.getproperty(tps, :d)::Matrix{Float64}
β %7 = Base.getproperty(tps, :c)::Matrix{Float64}
β (x1 = %5)
β (d = %6)
β (c = %7)
β %11 = d::Matrix{Float64}
β %12 = Base.vect()::Vector{Any}
β %13 = (%11 == %12)::Core.Const(false)
ββββ goto #3 if not %13
2 ββ Core.Const(:(ThinPlateSplines.ArgumentError("Affine component not available; run tps_solve with compute_affine=true.")))
β Core.Const(:(ThinPlateSplines.throw(%15)))
ββββ Core.Const(:(goto %18))
3 ββ (D = ThinPlateSplines.size(x2, 2))
β %19 = $(Expr(:static_parameter, 1))::Core.Const(Float64)
β %20 = ThinPlateSplines.size(x2, 1)::Int64
β %21 = ThinPlateSplines.ones(%19, %20)::Vector{Float64}
β (all_homo_z = ThinPlateSplines.hcat(%21, x2))
β Core.NewvarNode(:(ππΈπ!@_11 ))
β %24 = (ndims)(all_homo_z)::Core.Const(2)
β %25 = (%24 == 2)::Core.Const(true)
ββββ goto #5 if not %25
4 ββ goto #6
5 ββ Core.Const(:((throw)("expected a 2-array all_homo_z")))
6 ββ %29 = (ndims)(x1)::Core.Const(2)
β %30 = (%29 == 2)::Core.Const(true)
ββββ goto #8 if not %30
7 ββ goto #9
8 ββ Core.Const(:((throw)("expected a 2-array x1")))
9 ββ (ππΈπ!@_11 = %new(ThinPlateSplines.:(var"#ππΈπ!#4" )))
β %35 = ππΈπ!@_11 ::Core.Const(ThinPlateSplines.var"#ππΈπ!#4"())
β (ππΈπ!@_13 = %35)
β %37 = ThinPlateSplines.:(var"#β³πΆπβ―#5" )::Core.Const(ThinPlateSplines.var"#β³πΆπβ―#5")
β %38 = Core.typeof(ππΈπ!@_13 )::Core.Const(ThinPlateSplines.var"#ππΈπ!#4")
β %39 = Core.apply_type(%37, %38)::Core.Const(ThinPlateSplines.var"#β³πΆπβ―#5"{ThinPlateSplines.var"#ππΈπ!#4"})
β (β³πΆπβ―@_12 = %new(%39, ππΈπ!@_13 ))
β %41 = (Tullio.Eval)(β³πΆπβ―@_12 , nothing)::Core.Const(Tullio.Eval{ThinPlateSplines.var"#β³πΆπβ―#5"{ThinPlateSplines.var"#ππΈπ!#4"}, Nothing} (ThinPlateSplines.var"#β³πΆπβ―#5"{ThinPlateSplines.var"#ππΈπ!#4"}(ThinPlateSplines.var"#ππΈπ!#4"()), nothing ))
β %42 = all_homo_z::Matrix{Float64}
β %43 = (%41)(%42, x1)::Matrix{Float64}
β (sumsqr = %43)
β Core.NewvarNode(:(βͺ1:Dβ«))
β Core.NewvarNode(:(ππΈπ!@_15 ))
β %47 = (ndims)(sumsqr)::Core.Const(2)
β %48 = (%47 == 2)::Core.Const(true)
ββββ goto #11 if not %48
10 β goto #12
11 β Core.Const(:((throw)("expected a 2-array sumsqr")))
12 β %52 = (ndims)(c)::Core.Const(2)
β %53 = (%52 == 2)::Core.Const(true)
ββββ goto #14 if not %53
13 β goto #15
14 β Core.Const(:((throw)("expected a 2-array c")))
15 β (βͺ1:Dβ« = 1:D)
β %58 = (βͺ1:Dβ«::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]) isa ThinPlateSplines.AbstractRange)::Core.Const(true)
ββββ goto #17 if not %58
16 β goto #18
17 β Core.Const(:(1:D))
β Core.Const(:(ThinPlateSplines.string(%61)))
β Core.Const(:("expected a range for (j in 1:D), got " * %62))
ββββ Core.Const(:(ThinPlateSplines.throw(%63)))
18 β (ππΈπ!@_15 = %new(ThinPlateSplines.:(var"#22#ππΈπ!#7" )))
β %66 = ππΈπ!@_15 ::Core.Const(ThinPlateSplines.var"#22#ππΈπ!#7"())
β (ππΈπ!@_17 = %66)
β %68 = ThinPlateSplines.:(var"#23#β³πΆπβ―#8" )::Core.Const(ThinPlateSplines.var"#23#β³πΆπβ―#8")
β %69 = Core.typeof(ππΈπ!@_17 )::Core.Const(ThinPlateSplines.var"#22#ππΈπ!#7")
β %70 = Core.apply_type(%68, %69)::Core.Const(ThinPlateSplines.var"#23#β³πΆπβ―#8"{ThinPlateSplines.var"#22#ππΈπ!#7"})
β (β³πΆπβ―@_16 = %new(%70, ππΈπ!@_17 ))
β %72 = (Tullio.Eval)(β³πΆπβ―@_16 , nothing)::Core.Const(Tullio.Eval{ThinPlateSplines.var"#23#β³πΆπβ―#8"{ThinPlateSplines.var"#22#ππΈπ!#7"}, No thing}(ThinPlateSplines.var"#23#β³πΆπβ―#8"{ThinPlateSplines.var"#22#ππΈπ!#7"}(ThinPlateSplines.var"#22#ππΈπ! #7"()), nothing))
β %73 = sumsqr::Matrix{Float64}
β %74 = c::Matrix{Float64}
β %75 = (%72)(%73, %74, βͺ1:Dβ«::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]))::Matrix{Float64}
β (yt = %75)
β Core.NewvarNode(:(πΆπi ))
β Core.NewvarNode(:(πΆπj ))
β Core.NewvarNode(:(πΆπl ))
β Core.NewvarNode(:(ππΈπ!@_21 ))
β %81 = (ndims)(yt)::Core.Const(2)
β %82 = (%81 == 2)::Core.Const(true)
ββββ goto #20 if not %82
19 β goto #21
20 β Core.Const(:((throw)("expected a 2-array yt")))
21 β %86 = (ndims)(d)::Core.Const(2)
β %87 = (%86 == 2)::Core.Const(true)
ββββ goto #23 if not %87
22 β goto #24
23 β Core.Const(:((throw)("expected a 2-array d")))
24 β %91 = (ndims)(all_homo_z)::Core.Const(2)
β %92 = (%91 == 2)::Core.Const(true)
ββββ goto #26 if not %92
25 β goto #27
26 β Core.Const(:((throw)("expected a 2-array all_homo_z")))
27 β (ππΈπ!@_21 = %new(ThinPlateSplines.:(var"#26#ππΈπ!#10" )))
β (πΆπl = (axes)(d, 1))
β %98 = (axes)(all_homo_z, 2)::Base.OneTo{Int64}
β %99 = (axes)(d, 1)::Base.OneTo{Int64}
β %100 = (%98 == %99)::Bool
ββββ goto #29 if not %100
28 β goto #30
29 β (throw)("range of index l must agree")
30 β %104 = (axes)(yt, 2)::Base.OneTo{Int64}
β %105 = ThinPlateSplines.:-::Core.Const(-)
β %106 = (axes)(d, 2)::Base.OneTo{Int64}
β %107 = Base.broadcasted(%105, %106, 1)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(0), Int64])
β %108 = Base.materialize(%107)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(0), Int64])
β (πΆπj = ThinPlateSplines.intersect(%104, %108))
β (πΆπi = (axes)(yt, 1))
β %111 = (axes)(all_homo_z, 1)::Base.OneTo{Int64}
β %112 = (axes)(yt, 1)::Base.OneTo{Int64}
β %113 = (%111 == %112)::Bool
ββββ goto #32 if not %113
31 β goto #33
32 β (throw)("range of index i must agree")
33 β %117 = ππΈπ!@_21 ::Core.Const(ThinPlateSplines.var"#26#ππΈπ!#10"())
β %118 = (Tullio.storage_type)(yt, d, all_homo_z)::Core.Const(Matrix{Float64})
β %119 = yt::Matrix{Float64}
β %120 = ThinPlateSplines.tuple(d, all_homo_z)::Tuple{Matrix{Float64}, Matrix{Float64}}
β %121 = ThinPlateSplines.tuple(πΆπi , πΆπj ::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]))::Core.PartialStruct(Tuple{Base.OneTo{Int64}, UnitRange{Int64}}, Any[Base.OneTo{Int64}, Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])])
β %122 = ThinPlateSplines.tuple(πΆπl )::Tuple{Base.OneTo{Int64}}
β %123 = ThinPlateSplines.:+::Core.Const(+)
β (Tullio.threader)(%117, %118, %119, %120, %121, %122, %123, 262144, true)
β yt
ββββ return yt
You're mentioning "efficient machine code" and also "lowered code" (which are different concepts) but that is just the typed code, not the machine code or the lowered code. Can you show an example benchmark showing this changing the performance of the code?
Also please read https://www.oxinabox.net/2020/04/19/Julia-Antipatterns.html#over-constraining-argument-types
I think some type constraints could be beneficial here, but I'm not fully convinced yet that having three type parameters is necessarily the right way to do it. I'm open to suggestions however.
Type stability in Julia has enormous performance benefits.
julia> x1 = [0.0 1.0
1.0 0.0
1.0 1.0]
3Γ2 Matrix{Float64}:
0.0 1.0
1.0 0.0
1.0 1.0
julia> x2 = [0.0 1.0
1.1 0.0
1.2 1.5]
3Γ2 Matrix{Float64}:
0.0 1.0
1.1 0.0
1.2 1.5
julia> tps = tps_solve(x1, x2, 1.0)
ThinPlateSpline(1.0, [0.0 1.0; 1.0 0.0; 1.0 1.0], [1.0 0.0 1.0; 1.0 1.1 0.0; 1.0 1.2 1.5], [0.0 0.6931471805599455 0.0; 0.6931471805599455 0.0 0.0; 0.0 0.0 0.0], [1.0 -0.09999999999999976 -0.5; 3.294173637189667e-16 1.1999999999999997 0.49999999999999994; -1.5700924586837752e-16 0.09999999999999981 1.5], [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0])
julia> @benchmark tps_deform($x1, $tps)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min β¦ max): 2.944 ΞΌs β¦ 472.878 ΞΌs β GC (min β¦ max): 0.00% β¦ 98.75%
Time (median): 3.167 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 3.506 ΞΌs Β± 6.657 ΞΌs β GC (mean Β± Ο): 2.65% Β± 1.39%
βββ
ββββ
βββββββββββββββββ ββββββββββ ββ β β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββ β
2.94 ΞΌs Histogram: log(frequency) by time 5.38 ΞΌs <
Memory estimate: 1.27 KiB, allocs estimate: 37.
julia> @benchmark tps_deform($x1, $tps)
BenchmarkTools.Trial: 10000 samples with 198 evaluations.
Range (min β¦ max): 439.394 ns β¦ 35.221 ΞΌs β GC (min β¦ max): 0.00% β¦ 94.01%
Time (median): 454.040 ns β GC (median): 0.00%
Time (mean Β± Ο): 525.449 ns Β± 678.367 ns β GC (mean Β± Ο): 5.28% Β± 4.25%
βββββββββββ βββ β β
ββββββββββββββββββββββββββ
β
βββ
ββββββββββββ
ββββββ
βββββββββββββ β
439 ns Histogram: log(frequency) by time 1.26 ΞΌs <
Memory estimate: 480 bytes, allocs estimate: 5.
You're mentioning "efficient machine code" and also "lowered code" (which are different concepts) but that is just the typed code, not the machine code or the lowered code. Can you show an example benchmark showing this changing the performance of the code?
See the prior post for benchmarks.
The type code actually has already been lowered. I will elaborate on this in the next comment.
Also please read https://www.oxinabox.net/2020/04/19/Julia-Antipatterns.html#over-constraining-argument-types
Frames, @oxinabox, is talking about method arguments there. We are mostly talking about struct fields here. Without these type parameters Julia cannot know what the types of the fields are when it accesses them. For tps_deform
the consequence of this is that we cannot predict the return type. @code_warntype
indicates it is Body::Any
.
I think some type constraints could be beneficial here, but I'm not fully convinced yet that having three type parameters is necessarily the right way to do it. I'm open to suggestions however.
We could simplify the type parameter to a single one. However, this now requires that Ξ»
be of the same element type as x1
.
struct ThinPlateSpline{T}
Ξ»::T # Stiffness.
x1::Matrix{T} # control points
Y::Matrix{T} # Homogeneous control point coordinates
Ξ¦::Matrix{T} # TPS kernel
d::Matrix{T} # Affine component
c::Matrix{T} # Non-affine component
end
However, this will fail the tests as I have currently written them.
julia> x1 = [0 1
1 0
1 1]
3Γ2 Matrix{Int64}:
0 1
1 0
1 1
julia> x2 = [0 1
1 0
1.2 1.5]
3Γ2 Matrix{Float64}:
0.0 1.0
1.0 0.0
1.2 1.5
julia> using ThinPlateSplines
[ Info: Precompiling ThinPlateSplines [1d861738-f48e-4029-b1d3-81ce6bc7f5ab]
julia> tps = tps_solve(x1, x2, 1.0)
ERROR: MethodError: no method matching ThinPlateSpline(::Float64, ::Matrix{Int64}, ::Matrix{Float64}, ::Matrix{Float64}, ::Matrix{Float64}, ::Matrix{Float64})
Closest candidates are:
ThinPlateSpline(::T, ::Matrix{T}, ::Matrix{T}, ::Matrix{T}, ::Matrix{T}, ::Matrix{T}) where T
@ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:24
Stacktrace:
[1] tps_solve(x::Matrix{Int64}, y::Matrix{Float64}, Ξ»::Float64; compute_affine::Bool)
@ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:83
[2] tps_solve(x::Matrix{Int64}, y::Matrix{Float64}, Ξ»::Float64)
@ ThinPlateSplines c:\Users\kittisopikulm\.julia\dev\ThinPlateSplines\src\ThinPlateSplines.jl:61
[3] top-level scope
@ REPL[5]:1
julia> typeof(x1)
Matrix{Int64} (alias for Array{Int64, 2})
The reason for this is that x1
is a Matrix{Int64}
where as the other components use Float64
. To maintain flexibility and compatability, I let the input parameters freely vary independently from the calculated fields. x1
, Ξ»
, and Y
can be of different (element) types.
For compiled code, I think the LLVM IR is the most readable. Notice how with this pull request, we see more references to double
and i64
while the code is shorter. You do not see any refrences to double
(Float64
) in the current master code.
@mkitti is correct that that blog post I wrote was primarily talking about typed constraints on methods. Type parameters on struct fields are mostly good and are required to avoid field access being type unstable.
There is a corner case where you are trying to minimize compile time where you don't want this (see DataFrames.jl). But that doesn't apply to this case. And when it does apply care needs to be taken with how the code is written (eg function barriers) to make sure you hit the runtime vs compile-time performance trade off desired.
The simplest design for the ThinPlateSpline
struct would be make everything a Float64
. Then no parameters are needed. This might break some code that starts with integers though.
Thanks for the benchmarks. Your results are interesting as I'm not getting any discernible difference.
This PR:
julia> @benchmark tps_deform($x1, $tps)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min β¦ max): 1.895 ΞΌs β¦ 166.285 ΞΌs β GC (min β¦ max): 0.00% β¦ 97.21%
Time (median): 2.072 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 2.286 ΞΌs Β± 1.707 ΞΌs β GC (mean Β± Ο): 0.71% Β± 0.97%
ββββββββββββββββββ β
ββββββββββββββββββββββββββ
β
ββ
βββ
ββββ
β
ββ
β
βββ
βββββββββββββββ
β
β
1.9 ΞΌs Histogram: log(frequency) by time 4.42 ΞΌs <
Memory estimate: 1.27 KiB, allocs estimate: 37.
Original code:
julia> @benchmark tps_deform($x1, $tps)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min β¦ max): 1.844 ΞΌs β¦ 195.860 ΞΌs β GC (min β¦ max): 0.00% β¦ 97.34%
Time (median): 1.993 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 2.118 ΞΌs Β± 1.952 ΞΌs β GC (mean Β± Ο): 0.90% Β± 0.97%
β
ββ
ββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββ β
1.84 ΞΌs Histogram: frequency by time 2.98 ΞΌs <
Memory estimate: 1.27 KiB, allocs estimate: 37.
Also, what I meant was overspecialization in terms of Matrix
. For example, I would rather do something like:
struct ThinPlateSpline{S,P,M}
Ξ»::S # Stiffness.
x1::P # control points
Y::M # Homogeneous control point coordinates
Ξ¦::M # TPS kernel
d::M # Affine component
c::M # Non-affine component
end
As this is more general. If you can change to the above struct definition I would be happy to merge the PR.
It's not clear to me that you actually were able benchmark the code in this branch. While performance can vary from computer to computer, the allocations should not vary.
Both of your benchmarks show the following, which matches the allocations I see on the master branch.
Memory estimate: 1.27 KiB, allocs estimate: 37
On this branch, you should see the following. Please make sure to restart Julia to fully reload the package.
Memory estimate: 480 bytes, allocs estimate: 5.
I changed the parameter to M
as you requested.
Are there any other changes you would like?
Thank you
ThinPlateSpline
hcat
rather thancat(dims = 2, ...)