JuliaParallel / Dagger.jl

A framework for out-of-core and parallel execution

`type Nothing has no field thunk_dict` with Distributed #431

Closed — StevenWhitaker closed this 1 year ago

StevenWhitaker commented 1 year ago

I get the error mentioned in the title with the following example.

Contents of mwe.jl:

using Distributed, DelimitedFiles
nworkers = 1
addprocs(nworkers - nprocs() + 1)

@everywhere using CSV, DTables, DataFrames

file = tempname() * ".csv"
writedlm(file, [1, 2])

# DTable(x -> CSV.File(x), [file]; tabletype = DataFrame)

d = remotecall_fetch(2, file) do f
    DTable(x -> CSV.File(x), [f]; tabletype = DataFrame)
end

rm(file)

Result:

julia> include("mwe.jl")
ERROR: LoadError: On worker 2:
type Nothing has no field thunk_dict
Stacktrace:
  [1] getproperty
    @ ./Base.jl:37
  [2] #eager_submit_internal!#94
    @ ~/.julia/packages/Dagger/xGAvM/src/submission.jl:88
  [3] eager_submit_internal!
    @ ~/.julia/packages/Dagger/xGAvM/src/submission.jl:9
  [4] eager_submit_internal!
    @ ~/.julia/packages/Dagger/xGAvM/src/submission.jl:7
  [5] #invokelatest#2
    @ ./essentials.jl:819
  [6] invokelatest
    @ ./essentials.jl:816
  [7] #110
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285
  [8] run_work_thunk
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:70
  [9] macro expansion
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285 [inlined]
 [10] #109
    @ ./task.jl:514
Stacktrace:
  [1] #remotecall_fetch#159
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:465
  [2] remotecall_fetch
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:454
  [3] #remotecall_fetch#162
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492 [inlined]
  [4] remotecall_fetch
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492
  [5] eager_submit!
    @ ~/.julia/packages/Dagger/xGAvM/src/submission.jl:124
  [6] eager_launch!
    @ ~/.julia/packages/Dagger/xGAvM/src/submission.jl:192
  [7] enqueue!
    @ ~/.julia/packages/Dagger/xGAvM/src/queue.jl:12 [inlined]
  [8] #spawn#86
    @ ~/.julia/packages/Dagger/xGAvM/src/thunk.jl:304
  [9] spawn
    @ ~/.julia/packages/Dagger/xGAvM/src/thunk.jl:268 [inlined]
 [10] #DTable#3
    @ ~/.julia/packages/DTables/bA4g3/src/table/dtable.jl:143
 [11] DTable
    @ ~/.julia/packages/DTables/bA4g3/src/table/dtable.jl:142 [inlined]
 [12] #5
    @ ~/tmp/mwe.jl:13
 [13] #invokelatest#2
    @ ./essentials.jl:819
 [14] invokelatest
    @ ./essentials.jl:816
 [15] #110
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285
 [16] run_work_thunk
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:70
 [17] macro expansion
    @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285 [inlined]
 [18] #109
    @ ./task.jl:514
Stacktrace:
 [1] remotecall_fetch(f::Function, w::Distributed.Worker, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:465
 [2] remotecall_fetch(f::Function, w::Distributed.Worker, args::String)
   @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:454
 [3] #remotecall_fetch#162
   @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492 [inlined]
 [4] remotecall_fetch(f::Function, id::Int64, args::String)
   @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492
 [5] top-level scope
   @ ~/tmp/mwe.jl:12
 [6] include(fname::String)
   @ Base.MainInclude ./client.jl:478
 [7] top-level scope
   @ REPL[1]:1
in expression starting at /home/steven/tmp/mwe.jl:12

(tmp) pkg> st
Status `~/tmp/Project.toml`
  [336ed68f] CSV v0.10.11
  [20c56dc6] DTables v0.4.1
  [a93c6f00] DataFrames v1.6.1
  [8bb1440f] DelimitedFiles v1.9.1
  [8ba89e20] Distributed

The code works as expected if I uncomment the commented-out line, i.e., if I first construct a DTable on worker 1 (the file doesn't even have to be the same one loaded on worker 2).

Please let me know if I am missing anything or if there is something I'm doing wrong.
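For reference, a sketch of the workaround described above, assuming the same packages and setup as mwe.jl: constructing any DTable on the main process first appears to initialize Dagger's eager scheduler, after which the remote construction succeeds.

```julia
using Distributed, DelimitedFiles

addprocs(1)  # add one worker process (id 2)
@everywhere using CSV, DTables, DataFrames

file = tempname() * ".csv"
writedlm(file, [1, 2])

# Warm-up on worker 1: any DTable construction works, even with an
# unrelated file; it just has to happen before the remote call.
DTable(x -> CSV.File(x), [file]; tabletype = DataFrame)

# With the scheduler initialized, this no longer errors.
d = remotecall_fetch(2, file) do f
    DTable(x -> CSV.File(x), [f]; tabletype = DataFrame)
end

rm(file)
```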

jpsamaroo commented 1 year ago

Good find! The recently overhauled task submission logic had a bug in scheduler initialization; #432 should fix it, and it also adds an assertion to keep this from happening again.
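An illustrative sketch only (not the actual change in #432, whose contents aren't shown here): guarding a possibly-uninitialized scheduler state with an explicit assertion turns the confusing `type Nothing has no field thunk_dict` error into a descriptive one. The `submit!` function and its `state` argument are hypothetical.

```julia
# Hypothetical guard: `state` holds per-worker scheduler state that may
# still be `nothing` if initialization has not run on this worker.
function submit!(state)
    @assert state !== nothing "Dagger scheduler not initialized on this worker"
    # Safe: `state` is known to be non-nothing past the assertion.
    return state.thunk_dict
end
```

Without the assertion, `state.thunk_dict` on a `nothing` state produces exactly the opaque `getproperty`-on-`Nothing` error seen in the stack trace above.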

StevenWhitaker commented 1 year ago

Cool, thanks for the quick fix!