droodman closed this 2 months ago
I think the issue is that Dagger is loaded before the workers are added. If you load it afterwards, it works:
Julia Version 1.11.0-beta1 (2024-04-10)
Official https://julialang.org/ release
julia> using Distributed
julia> addprocs(4)
4-element Vector{Int64}:
2
3
4
5
julia> using Dagger
julia> X = Dagger.@shard myid()
Dagger.Shard(Dict{Dagger.Processor, Dagger.Chunk}(OSProc(1) => Dagger.Chunk{Int64, MemPool.DRef, OSProc, ProcessScope}(Int64, UnitDomain(), MemPool.DRef(1, 16, 0x0000000000000008), OSProc(1), ProcessScope: worker == 1, false), OSProc(2) => Dagger.Chunk{Int64, MemPool.DRef, OSProc, ProcessScope}(Int64, UnitDomain(), MemPool.DRef(2, 0, 0x0000000000000008), OSProc(2), ProcessScope: worker == 2, false), OSProc(3) => Dagger.Chunk{Int64, MemPool.DRef, OSProc, ProcessScope}(Int64, UnitDomain(), MemPool.DRef(3, 0, 0x0000000000000008), OSProc(3), ProcessScope: worker == 3, false), OSProc(4) => Dagger.Chunk{Int64, MemPool.DRef, OSProc, ProcessScope}(Int64, UnitDomain(), MemPool.DRef(4, 0, 0x0000000000000008), OSProc(4), ProcessScope: worker == 4, false), OSProc(5) => Dagger.Chunk{Int64, MemPool.DRef, OSProc, ProcessScope}(Int64, UnitDomain(), MemPool.DRef(5, 0, 0x0000000000000008), OSProc(5), ProcessScope: worker == 5, false)))
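The working order from the session above can also be written as a short script. This is a sketch that assumes Dagger.jl is already installed in the active environment and that four local workers are wanted. It uses @everywhere to load Dagger explicitly on every process. That also covers the case where Dagger was loaded on the master before addprocs, because each worker then runs its own using Dagger.

```julia
# Assumes Dagger.jl is installed in the active environment.
using Distributed

# 1. Add the local worker processes first.
addprocs(4)

# 2. Then load Dagger on the master and on every worker.
#    @everywhere runs `using Dagger` on all processes, so this also works
#    if Dagger was already loaded on the master before addprocs was called.
@everywhere using Dagger

# 3. The workers now know the Dagger module, so Dagger objects sent to
#    them can be deserialized.
X = Dagger.@shard myid()
```

A plain using Dagger after addprocs, as in the REPL session, also works. The Julia manual states that using a package on one process loads it on all current processes, although it only brings the name into scope on the process that ran the statement.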
AFAIK this is a limitation of Distributed.jl rather than of Dagger itself. Where did you see that example in the docs?
Ah, yes, that does fix it.
But I think it points up a gap in the documentation. The example is from the documentation in the sense that the line of interest, X = Dagger.@shard myid(), is on the quick start page. I wanted to try it in a Julia session, so I did what seemed the obvious thing to me. myid() is in Distributed, so I loaded that with using. While I was at it, I loaded Dagger. Then I ran addprocs(). Then I ran the command of interest. It crashed. I thought, oh, I guess Dagger is not worth the trouble. More of a quick end than a quick start!
If it is easy to get a crash when using Dagger, then I think how to avoid it should be prominent on the quick start page. Put another way, there isn't a complete example on the quick start page that includes the using commands and whatever other setup is needed. Or is it possible for Dagger to detect the condition that causes the crash and provide a helpful message?
Yeah, that's fair. I added some docs about it in #510.
Or is it possible for Dagger to detect the condition that causes the crash and provide a helpful message?
I don't think this is possible in Dagger itself; it would need to be added in Distributed. What's happening is that the master process is executing some code (like Dagger.@shard) that serializes Dagger objects and sends them to the workers. If the workers don't have Dagger loaded, they see a name like Dagger in the object's type and cannot deserialize the object, because they don't know anything about the Dagger module.
Just copied an example from the documentation in a new Julia session...