apache / arrow-julia

Official Julia implementation of Apache Arrow
https://arrow.apache.org/julia/

Test failure when running with > 1 thread #420

Open JoaoAparicio opened 1 year ago

JoaoAparicio commented 1 year ago

I found a flaky test: https://github.com/apache/arrow-julia/blob/c469151d4ff261b50c59bf98101f068fa577fca4/test/runtests.jl#L297-L305

Run without threads, it always passes. Run with -t 10, it has a significant chance of erroring. To reproduce:

A) Start Julia with -t 10

B) Create the environment with:

]activate --temp
add Arrow
add Tables
add PooledArrays

C) Run the test many times:

using Tables
using Arrow
using PooledArrays

for _ in 1:10000
    t = Tables.partitioner(
        (
            (a=Arrow.toarrowvector(PooledArray([1,2,3  ])),),
            (a=Arrow.toarrowvector(PooledArray([1,2,3,4])),),
            (a=Arrow.toarrowvector(PooledArray([1,2,3,4,5])),),
        )
    )
    tt = Arrow.Table(Arrow.tobuffer(t))
end

You are highly likely to hit this error:

(jl_k3aixG) pkg> ERROR: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:349 [inlined]
 [2] Arrow.Table(blobs::Vector{Arrow.ArrowBlob}; convert::Bool)
   @ Arrow ~/.julia/packages/Arrow/P0wVk/src/table.jl:386
 [3] Table
   @ ~/.julia/packages/Arrow/P0wVk/src/table.jl:295 [inlined]
 [4] Arrow.Table(input::IOBuffer, pos::Int64, len::Nothing; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Arrow ~/.julia/packages/Arrow/P0wVk/src/table.jl:290
 [5] Table
   @ ~/.julia/packages/Arrow/P0wVk/src/table.jl:290 [inlined]
 [6] Arrow.Table(input::IOBuffer)
   @ Arrow ~/.julia/packages/Arrow/P0wVk/src/table.jl:290
 [7] top-level scope
   @ ./Untitled-2:13

    nested task error: MethodError: no method matching resize!(::Arrow.DictEncoded{Int64, Int8, Arrow.Primitive{Int64, Vector{Int64}}}, ::Int64)

    Closest candidates are:
      resize!(::Vector, ::Integer)
       @ Base array.jl:1246
      resize!(::BitVector, ::Integer)
       @ Base bitarray.jl:814
      resize!(::SparseArrays.ReadOnly, ::Any)
       @ SparseArrays ~/julia-1.9.0-rc2/share/julia/stdlib/v1.9/SparseArrays/src/readonly.jl:33
      ...

    Stacktrace:
     [1] _append!(a::Arrow.DictEncoded{Int64, Int8, Arrow.Primitive{Int64, Vector{Int64}}}, #unused#::Base.HasLength, iter::Tuple{Int64})
       @ Base ./array.jl:1134
     [2] append!(a::Arrow.DictEncoded{Int64, Int8, Arrow.Primitive{Int64, Vector{Int64}}}, iter::Tuple{Int64})
       @ Base ./array.jl:1126
     [3] push!(a::Arrow.DictEncoded{Int64, Int8, Arrow.Primitive{Int64, Vector{Int64}}}, iter::Int64)
       @ Base ./array.jl:1127
     [4] push!(A::SentinelArrays.ChainedVector{Int64, Arrow.DictEncoded{Int64, Int8, Arrow.Primitive{Int64, Vector{Int64}}}}, val::Int64)
       @ SentinelArrays ~/.julia/packages/SentinelArrays/BcfVF/src/chainedvector.jl:506
     [5] append!(A::SentinelArrays.ChainedVector{Int64, Arrow.DictEncoded{Int64, Int8, Arrow.Primitive{Int64, Vector{Int64}}}}, B::Arrow.DictEncoded{Int64, Int8, SentinelArrays.ChainedVector{Int64, Arrow.Primitive{Int64, Vector{Int64}}}})
       @ SentinelArrays ~/.julia/packages/SentinelArrays/BcfVF/src/chainedvector.jl:664
     [6] (::Arrow.var"#107#113"{Vector{Any}, Arrow.Table})(i::Int64)
       @ Arrow ~/.julia/packages/Arrow/P0wVk/src/table.jl:313
     [7] foreach(f::Arrow.var"#107#113"{Vector{Any}, Arrow.Table}, itr::UnitRange{Int64})
       @ Base ./abstractarray.jl:3073
     [8] macro expansion
       @ ~/.julia/packages/Arrow/P0wVk/src/table.jl:312 [inlined]
     [9] (::Arrow.var"#104#110"{Channel{Any}, Arrow.Table})()
       @ Arrow ./threadingconstructs.jl:341
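
To put a rough number on "significant chance", the loop from step C can be wrapped so that it counts failures instead of stopping at the first one. This is only a sketch: `build_partitions` and `count_failures` are hypothetical helper names introduced here, not part of Arrow.jl, and it assumes the failure always surfaces as the TaskFailedException shown above.

```julia
using Tables
using Arrow
using PooledArrays

# Hypothetical helper: builds the same three-partition input as the loop in step C.
function build_partitions()
    return Tables.partitioner(
        (
            (a=Arrow.toarrowvector(PooledArray([1,2,3])),),
            (a=Arrow.toarrowvector(PooledArray([1,2,3,4])),),
            (a=Arrow.toarrowvector(PooledArray([1,2,3,4,5])),),
        )
    )
end

# Hypothetical helper: runs the round trip n times and counts how many
# attempts end in a TaskFailedException. Any other error is rethrown.
function count_failures(n)
    failures = 0
    for _ in 1:n
        try
            Arrow.Table(Arrow.tobuffer(build_partitions()))
        catch err
            err isa TaskFailedException || rethrow()
            failures += 1
        end
    end
    return failures
end

n = 10_000
println("threads = ", Threads.nthreads(), ", failures = ", count_failures(n), " / ", n)
```

Running this once without -t and once with -t 10 should make the difference between the two configurations easy to compare.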

I've reproduced this on both Julia 1.8.5 and 1.9.0-rc2, on two machines. Machine 1:

julia> versioninfo()
Julia Version 1.9.0-rc2
Commit 72aec423c2a (2023-04-01 10:41 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 10 on 8 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 

Machine 2:


julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 48 × Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake-avx512)
  Threads: 20 on 48 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8

ericphanson commented 1 year ago

That sounds like a bug that the test is catching rather than just a flaky test!