JuliaParallel / Dagger.jl

A framework for out-of-core and parallel execution

Occasional `UndefRefError: access to undefined reference` when `fetch`ing `DTable`s #453

Closed StevenWhitaker closed 9 months ago

StevenWhitaker commented 9 months ago

I don't have a reproducer for this one, but here's an error that pops up occasionally:

      From worker 2:    │   ex =
      From worker 2:    │    TaskFailedException
      From worker 2:    │
      From worker 2:    │        nested task error: ThunkFailedException:
      From worker 2:    │          Root Exception Type: CapturedException
      From worker 2:    │          Root Exception:
      From worker 2:    │        UndefRefError: access to undefined reference
      From worker 2:    │        Stacktrace:
      From worker 2:    │         [1] getindex
      From worker 2:    │           @ ./essentials.jl:13 [inlined]
      From worker 2:    │         [2] get!
      From worker 2:    │           @ ./dict.jl:465
      From worker 2:    │         [3] OSProc (repeats 2 times)
      From worker 2:    │           @ ~/.julia/packages/Dagger/M13n0/src/processor.jl:109 [inlined]
      From worker 2:    │         [4] do_task
      From worker 2:    │           @ ~/.julia/packages/Dagger/M13n0/src/sch/Sch.jl:1368
      From worker 2:    │         [5] macro expansion
      From worker 2:    │           @ ~/.julia/packages/Dagger/M13n0/src/sch/Sch.jl:1243 [inlined]
      From worker 2:    │         [6] #132
      From worker 2:    │           @ ./task.jl:134
      From worker 2:    │          Root Thunk:  Thunk(id=3, _file_load(file.csv, #11, DataFrames.DataFrame))
      From worker 2:    │          Inner Thunk: Thunk(id=5, isnonempty(Thunk[3](_file_load, Any["file.csv", MyPkg.var"#11#12"(), DataFrames.DataFrame])))
      From worker 2:    │          This Thunk:  Thunk(id=5, isnonempty(Thunk[3](_file_load, Any["file.csv", MyPkg.var"#11#12"(), DataFrames.DataFrame])))
      From worker 2:    │        Stacktrace:
      From worker 2:    │          [1] fetch(t::Dagger.ThunkFuture; proc::Dagger.OSProc, raw::Bool)
      From worker 2:    │            @ Dagger ~/.julia/packages/Dagger/M13n0/src/eager_thunk.jl:16
      From worker 2:    │          [2] fetch
      From worker 2:    │            @ ~/.julia/packages/Dagger/M13n0/src/eager_thunk.jl:11 [inlined]
      From worker 2:    │          [3] #fetch#75
      From worker 2:    │            @ ~/.julia/packages/Dagger/M13n0/src/eager_thunk.jl:58 [inlined]
      From worker 2:    │          [4] fetch
      From worker 2:    │            @ ~/.julia/packages/Dagger/M13n0/src/eager_thunk.jl:54 [inlined]
      From worker 2:    │          [5] #10
      From worker 2:    │            @ ~/.julia/packages/DTables/BjdY2/src/table/dtable.jl:233 [inlined]
      From worker 2:    │          [6] filter(f::DTables.var"#10#13"{Vector{Dagger.EagerThunk}}, a::Vector{Tuple{Int, Union{Dagger.EagerThunk, Dagger.Chunk}}})
      From worker 2:    │            @ Base ./array.jl:2610
      From worker 2:    │          [7] trim!(d::DTables.DTable)
      From worker 2:    │            @ DTables ~/.julia/packages/DTables/BjdY2/src/table/dtable.jl:233
      From worker 2:    │          [8] trim(d::DTables.DTable)
      From worker 2:    │            @ DTables ~/.julia/packages/DTables/BjdY2/src/table/dtable.jl:242
      From worker 2:    │          [9] retrieve_partitions
      From worker 2:    │            @ ~/.julia/packages/DTables/BjdY2/src/table/dtable.jl:179 [inlined]
      From worker 2:    │         [10] fetch(d::DTables.DTable)
      From worker 2:    │            @ DTables ~/.julia/packages/DTables/BjdY2/src/table/dtable.jl:167

I believe the `MyPkg.var"#11#12"` in the stacktrace is `x -> CSV.File(x)`.

This happens without out-of-core processing. I'm not sure what in the code I'm running now triggers this, when code I ran previously did not.

jpsamaroo commented 9 months ago

Ahh yeah, that `OSPROC_PROCESSOR_CACHE` dictionary needs a lock around it; will add.
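
For illustration only, here is a minimal sketch of what "a lock around it" could look like for a shared `Dict` that is filled lazily via `get!`. The trace above fails inside `get!` on a `Dict`, which is consistent with several tasks mutating the dictionary at the same time. The names `CACHE`, `CACHE_LOCK`, and `cached_entry` are hypothetical stand-ins, and the cached value is a placeholder; this is not Dagger's actual code or the fix that was eventually merged.

```julia
# Hypothetical stand-ins for a process-wide cache; not Dagger's definitions.
const CACHE = Dict{Int,Vector{Int}}()
const CACHE_LOCK = ReentrantLock()

# Look up (or lazily create) the entry for `pid` while holding the lock,
# so no task can read the Dict while another task is inserting into it.
function cached_entry(pid::Int)
    lock(CACHE_LOCK) do
        get!(CACHE, pid) do
            [pid]  # placeholder for whatever per-process value is cached
        end
    end
end

# Exercise the cache from many tasks at once (keys 0 through 15).
@sync for i in 1:1000
    Threads.@spawn cached_entry(i % 16)
end
```

Holding the lock for the whole `get!` call keeps the check and the insertion atomic with respect to other tasks. A `ReentrantLock` is used here so that a task which already holds the lock can re-enter `cached_entry` without deadlocking.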