apache / arrow-julia

Official Julia implementation of Apache Arrow
https://arrow.apache.org/julia/

Precompilation / latency? #189

Open msavael opened 3 years ago

msavael commented 3 years ago

With the 1.6 release, I was reading a bit about precompilation. I noticed that the Arrow package has quite high latency the first time a file / message is read. I wonder if it would be practical to generate precompile statements for the most common column types, e.g. Int/Float primitives, Strings, Dates/DateTimes, plus batched/dictionary-encoded versions of these. The number of methods may be quite large, but it could be worth exploring.
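A minimal sketch of what such precompile statements could look like, using only `Base.precompile`. The function `decode_column` and the tuple `COMMON_ELTYPES` are hypothetical stand-ins invented for illustration; they are not part of Arrow.jl's API, and a real list of signatures would more likely be generated by tooling than written by hand.

```julia
# Hypothetical stand-in for a column reader; NOT Arrow.jl's actual API.
# It reinterprets a raw byte buffer as a vector of element type T.
function decode_column(::Type{T}, bytes::Vector{UInt8}) where {T}
    return collect(reinterpret(T, bytes))
end

# A few of the primitive element types mentioned above (assumed list).
const COMMON_ELTYPES = (Int32, Int64, Float32, Float64)

# `precompile(f, argtypes)` asks Julia to compile `f` for the given argument
# types without calling it, and returns a Bool indicating success.
results = [precompile(decode_column, (Type{T}, Vector{UInt8})) for T in COMMON_ELTYPES]
```

In a package, statements like these would sit at the top level of the module so that they run when the package itself is precompiled. Each additional element type adds one more signature, which is where the concern about the number of methods comes from.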

quinnj commented 3 years ago

I think it's worth looking into; I've had success w/ other packages. The main concern is code that is changing really fast, since changed method signatures can "invalidate" existing precompile statements. But I think the automated tooling around generating precompile statements is also much better these days. It'd probably also be worth looking into how Arrow.jl fares with regard to general method invalidations, to make sure it's not making things harder.

msavael commented 3 years ago

I had a quick look, with the example from here: https://timholy.github.io/SnoopCompile.jl/stable/snoopr/.

julia> using SnoopCompileCore

julia> invalidations = @snoopr begin
           import Arrow
       end

julia> using SnoopCompile

julia> trees = invalidation_trees(invalidations)
10-element Vector{SnoopCompile.MethodInvalidations}:
 inserting UInt32(x::Union{BitIntegers.AbstractBitSigned, BitIntegers.AbstractBitUnsigned}) in BitIntegers at /home/sava/.julia/packages/BitIntegers/fcpdN/src/BitIntegers.jl:152 invalidated:
   mt_backedges: 1: signature Tuple{Type{UInt32}, Integer} triggered MethodInstance for VersionNumber(::Integer, ::Int64, ::Int64, ::Tuple{}, ::Tuple{}) (1 children)
   13 mt_cache
 inserting UInt8(x::Union{BitIntegers.AbstractBitSigned, BitIntegers.AbstractBitUnsigned}) in BitIntegers at /home/sava/.julia/packages/BitIntegers/fcpdN/src/BitIntegers.jl:152 invalidated:
   mt_backedges: 1: signature Tuple{Type{UInt8}, Integer} triggered MethodInstance for convert(::Type{UInt8}, ::Integer) (1 children)
 inserting Int32(x::Union{BitIntegers.AbstractBitSigned, BitIntegers.AbstractBitUnsigned}) in BitIntegers at /home/sava/.julia/packages/BitIntegers/fcpdN/src/BitIntegers.jl:152 invalidated:
   mt_backedges: 1: signature Tuple{Type{Int32}, Integer} triggered MethodInstance for Int32(::Enum{T2} where T2<:Integer) (1 children)
 inserting Int8(x::Union{BitIntegers.AbstractBitSigned, BitIntegers.AbstractBitUnsigned}) in BitIntegers at /home/sava/.julia/packages/BitIntegers/fcpdN/src/BitIntegers.jl:152 invalidated:
   mt_backedges: 1: signature Tuple{Type{Int8}, Integer} triggered MethodInstance for convert(::Type{Int8}, ::Integer) (1 children)
 inserting Base.IteratorSize(::Type{R}) where R<:Union{Tables.AbstractColumns, Tables.AbstractRow} in Tables at /home/sava/.julia/packages/Tables/uYJXY/src/Tables.jl:169 invalidated:
   backedges: 1: superseding Base.IteratorSize(::Type) in Base at generator.jl:91 with MethodInstance for Base.IteratorSize(::Type{var"#s446"} where {T, var"#s446"<:LinearAlgebra.Factorization{T}}) (3 children)
   104 mt_cache
 inserting UInt64(x::Union{BitIntegers.AbstractBitSigned, BitIntegers.AbstractBitUnsigned}) in BitIntegers at /home/sava/.julia/packages/BitIntegers/fcpdN/src/BitIntegers.jl:152 invalidated:
   mt_backedges: 1: signature Tuple{Type{UInt64}, Integer} triggered MethodInstance for convert(::Type{UInt64}, ::Integer) (5 children)
 inserting isassigned(pa::Union{SubArray{T, N, var"#s5", I, L} where {var"#s5"<:(PooledArrays.PooledArray{T, R, N, RA} where {N, RA}), I, L}, PooledArrays.PooledArray{T, R, N, RA} where RA} where {T, N, R}, I::Int64...) in PooledArrays at /home/sava/.julia/packages/PooledArrays/CV8kA/src/PooledArrays.jl:468 invalidated:
   backedges: 1: superseding isassigned(a::AbstractArray, i::Integer...) in Base at abstractarray.jl:511 with MethodInstance for isassigned(::AbstractVecOrMat{T} where T, ::Int64, ::Int64) (8 children)
   24 mt_cache
 inserting Int64(x::Union{BitIntegers.AbstractBitSigned, BitIntegers.AbstractBitUnsigned}) in BitIntegers at /home/sava/.julia/packages/BitIntegers/fcpdN/src/BitIntegers.jl:152 invalidated:
   mt_backedges:  1: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for _array_for(::Type{Any}, ::Any, ::Base.HasLength) (0 children)
                  2: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for to_shape(::Integer) (0 children)
                  3: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::Test.var"#3#5", ::AbstractString, ::Integer) (0 children)
                  4: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::Test.var"#4#6", ::AbstractString, ::Integer) (0 children)
                  5: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for _array_for(::Type{T}, ::Any, ::Base.HasLength) where T (0 children)
                  6: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for _array_for(::Type{String}, ::Any, ::Base.HasLength) (0 children)
                  7: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for _array_for(::Type{Expr}, ::Any, ::Base.HasLength) (0 children)
                  8: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for _similar_for(::Vector{T} where T, ::Type, ::Base.Generator{_A, Test.var"#24#25"{Int64}} where _A, ::Base.HasLength) (0 children)
                  9: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for _similar_for(::Vector{T} where T, ::DataType, ::Base.Generator{_A, Test.var"#24#25"{Int64}} where _A, ::Base.HasLength) (0 children)
                 10: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::Pkg.Types.var"#21#22"{String}, ::AbstractString, ::Integer) (0 children)
                 11: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for _array_for(::DataType, ::Any, ::Base.HasLength) (0 children)
                 12: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for pointer(::String, ::Integer) (3 children)
                 13: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for rpad(::String, ::Integer, ::String) (5 children)
                 14: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for convert(::Type{Int64}, ::Integer) (5 children)
 inserting axes(A::SentinelArrays.SentinelArray, i) in SentinelArrays at /home/sava/.julia/packages/SentinelArrays/Ubf17/src/SentinelArrays.jl:181 invalidated:
   backedges: 1: superseding axes(A::AbstractArray{T, N}, d) where {T, N} in Base at abstractarray.jl:68 with MethodInstance for axes(::AbstractVecOrMat{T} where T, ::Int64) (14 children)
   1 mt_cache
 inserting getproperty(::Type{Arrow.Flatbuf.MetadataVersionModule.MetadataVersion}, sym::Symbol) in Arrow.Flatbuf.MetadataVersionModule at /home/sava/.julia/packages/Arrow/ggVa0/src/FlatBuffers/FlatBuffers.jl:136 invalidated:
   backedges: 1: superseding getproperty(x::Type, f::Symbol) in Base at Base.jl:28 with MethodInstance for getproperty(::Type, ::Symbol) (4 children)
              2: superseding getproperty(x::Type, f::Symbol) in Base at Base.jl:28 with MethodInstance for getproperty(::DataType, ::Symbol) (1509 children)
   1 mt_cache

It looks like BitIntegers generates a few invalidations, but they don't have many children. The big one appears to be in Arrow/FlatBuffers, which adds a getproperty method for a type? Note the 1509 children. I'm not an expert on this, so I may be interpreting it wrong.
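To make the pattern in that last entry concrete, here is a small sketch under stated assumptions. `FormatVersion` and the module `FormatVersions` are hypothetical names invented for illustration, only loosely modelled on the `getproperty(::Type{...}, sym::Symbol)` method shown in the output above; this is not the FlatBuffers code itself. The second half shows one possible alternative that gives similar `Name.Member` syntax without adding a method to a Base function, not necessarily the fix the package would choose.

```julia
# Hypothetical enum-like type; names are illustrative only.
struct FormatVersion
    value::Int16
end

# The kind of method reported above: it is more specific than Base's
# `getproperty(x::Type, f::Symbol)`, so code already compiled against the
# generic method for an abstract `Type`/`DataType` argument can be invalidated.
function Base.getproperty(::Type{FormatVersion}, sym::Symbol)
    sym === :V1 && return FormatVersion(0)
    sym === :V2 && return FormatVersion(1)
    return getfield(FormatVersion, sym)  # fall back for real fields such as :name
end

# Possible alternative (an assumption, not the package's approach): keep the
# members as constants in a module, which extends no Base function.
module FormatVersions
    struct Value
        value::Int16
    end
    const V1 = Value(0)
    const V2 = Value(1)
end
```

Both forms support dotted access (`FormatVersion.V1` and `FormatVersions.V1`), but only the first inserts a new `getproperty` method.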

ericphanson commented 3 years ago

Could be something to work on at the JuliaCon workshop "Package development: improving engineering quality & latency"!