JuliaParallel / Dagger.jl

A framework for out-of-core and parallel execution

Dagger with arrays containing missing values #96

Closed Alexander-Barth closed 5 years ago

Alexander-Barth commented 5 years ago

I am trying to use Dagger on an array of type Array{Union{Missing, Int64},2}, but I get an error (see below). I adapted this example https://github.com/JuliaParallel/Dagger.jl/blob/master/test/array.jl#L155 to find out how Dagger works. Is there a problem with my code, or is this a bug? I use Julia 1.0.1 and Dagger 0.7.1.

The code:

using Dagger
using Missings
x = allowmissing(rand(1:10, 10, 5))
@show reduce(+, x, dims=1)  # works
X = Distribute(Blocks(3,3), x)
collect(reduce(+, X, dims=1)) # fails

The full error message is:

ERROR: ArgumentError: type does not have a definite number of fields
fieldcount(::Any) at ./reflection.jl:599
fixedlength(::Type, ::IdDict{Any,Any}) at /home/abarth/.julia/packages/MemPool/Z2LCh/src/io.jl:155
fixedlength(::Type) at /home/abarth/.julia/packages/MemPool/Z2LCh/src/io.jl:145
approx_size(::Type, ::Int64, ::Array{Union{Missing, Int64},2}) at /home/abarth/.julia/packages/MemPool/Z2LCh/src/MemPool.jl:78
approx_size at /home/abarth/.julia/packages/MemPool/Z2LCh/src/MemPool.jl:74 [inlined]
(::getfield(MemPool, Symbol("#kw##poolset")))(::NamedTuple{(:destroyonevict,),Tuple{Bool}}, ::typeof(MemPool.poolset), ::Array{Union{Missing, Int64},2}, ::Int64) at ./none:0 (repeats 2 times)
#tochunk#40(::Bool, ::Bool, ::Function, ::Array{Union{Missing, Int64},2}) at /home/abarth/.julia/packages/Dagger/yLPgg/src/chunks.jl:84
(::getfield(Dagger, Symbol("#kw##tochunk")))(::NamedTuple{(:persist, :cache),Tuple{Bool,Bool}}, ::typeof(Dagger.tochunk), ::Array{Union{Missing, Int64},2}) at ./none:0
do_task(::Context, ::OSProc, ::Int64, ::Function, ::Tuple{Array{Union{Missing, Int64},2}}, ::Bool, ::Bool, ::Bool) at /home/abarth/.julia/packages/Dagger/yLPgg/src/scheduler.jl:254
#143 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:339 [inlined]
run_work_thunk(::getfield(Distributed, Symbol("##143#144")){typeof(Dagger.Sch.do_task),Tuple{Context,OSProc,Int64,typeof(identity),Tuple{Array{Union{Missing, Int64},2}},Bool,Bool,Bool},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}}, ::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:56
#remotecall_fetch#148(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Distributed.LocalProcess, ::Context, ::Vararg{Any,N} where N) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:364
remotecall_fetch(::Function, ::Distributed.LocalProcess, ::Context, ::Vararg{Any,N} where N) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:364
#remotecall_fetch#152(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Int64, ::Context, ::Vararg{Any,N} where N) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:406
remotecall_fetch at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:406 [inlined]
macro expansion at /home/abarth/.julia/packages/Dagger/yLPgg/src/scheduler.jl:269 [inlined]
(::getfield(Dagger.Sch, Symbol("##13#14")){Context,OSProc,Int64,typeof(identity),Tuple{Array{Union{Missing, Int64},2}},Channel{Any},Bool,Bool,Bool})() at ./task.jl:259

pazzo83 commented 5 years ago

I have the same problem with table columns of type Union{String, Missing}:

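using JuliaDB  # assumed: the original snippet omitted its imports; table and distribute match JuliaDB's API
blah = table(["hi", missing, "oh", "what"], rand(4), names=[:x, :y])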
distribute(blah, 1)
ERROR: ArgumentError: type does not have a definite number of fields

jpsamaroo commented 5 years ago

From the stacktrace, this appears to be an issue in MemPool (a dependency of Dagger). Does anyone have an MWE which uses the built-in Julia missing type (i.e. without using Missings.jl)?
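
As a possible starting point for such an MWE, here is a minimal Base-only sketch. It builds the same array type with a plain constructor instead of `allowmissing`, then checks how `fieldcount` (the innermost frame of the trace) behaves on the element type. The helper `fieldcount_error` is hypothetical and exists only for this illustration. Note that `fieldcount(::Any)` in the trace is the method signature, so the trace alone does not show which type MemPool actually passed; treating the element type as the culprit is an assumption, not something confirmed in this thread.

```julia
# Base-only sketch: no Dagger, MemPool, or Missings.jl needed.

# Same element type as in the report, built with a plain constructor
# instead of Missings.allowmissing.
x = Array{Union{Missing, Int64}}(rand(1:10, 10, 5))

# Hypothetical helper: return the exception type thrown by fieldcount(T),
# or `nothing` if the call succeeds.
function fieldcount_error(T)
    try
        fieldcount(T)
        return nothing
    catch err
        return typeof(err)
    end
end

err_union = fieldcount_error(eltype(x))  # element type Union{Missing, Int64}
err_any   = fieldcount_error(Any)        # abstract type
err_int   = fieldcount_error(Int64)      # concrete type

println((typeof(x), err_union, err_any, err_int))
```

To turn this into a Dagger reproduction, the `allowmissing` line in the original report could be swapped for the `x = Array{Union{Missing, Int64}}(...)` line above, keeping the `Distribute` and `collect` calls unchanged. Whether that still triggers the error on Dagger 0.7.1 has not been verified here.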

Alexander-Barth commented 5 years ago

Actually, with Dagger v0.8.0 the issue is fixed. I can run this snippet without error and I get the correct value. Thanks for checking back!

julia> using Dagger

julia> using Missings

julia> x = allowmissing(rand(1:10, 10, 5))
10×5 Array{Union{Missing, Int64},2}:
  6   9  6   6   5
  5   7  9  10   5
  5   6  6   1   3
  7   9  3   3   9
  8   7  4   6  10
  9   5  5   2   5
  5  10  7   1   4
 10   6  8   6   5
  7   3  3   1   7
  7   4  5   7   4

julia> @show reduce(+, x, dims=1)  # works
reduce(+, x, dims=1) = Union{Missing, Int64}[69 66 56 43 57]
1×5 Array{Union{Missing, Int64},2}:
 69  66  56  43  57

julia> X = Distribute(Blocks(3,3), x)
Distribute{Union{Missing, Int64},2}(10, 5)

julia> collect(reduce(+, X, dims=1))
1×5 Array{Int64,2}:
 69  66  56  43  57