Closed: affans closed this issue 5 years ago
I am getting the following errors on STDOUT from the workers when using Slurm:
==> job0000.out <==
MethodError(convert, (Tuple, :all_to_all), 0x0000000000005549)CapturedException(MethodError(convert, (Tuple, :all_to_all), 0x0000000000005549), Any[(setindex!(::Array{Tuple,1}, ::Symbol, ::Int64) at array.jl:583, 1), ((::Base.Distributed.##99#100{TCPSocket,TCPSocket,Bool})() at event.jl:73, 1)])
Process(1) - Unknown remote, closing connection.
Master process (id 1) could not connect within 60.0 seconds.
exiting.

==> job0001.out <==
TypeError(:deserialize_module, "typeassert", Module, ===)CapturedException(TypeError(:deserialize_module, "typeassert", Module, ===), Any[((::Base.Distributed.##99#100{TCPSocket,TCPSocket,Bool})() at event.jl:73, 1)])
Process(1) - Unknown remote, closing connection.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
I cannot make sense of the error or where it's originating.
Related Discourse topic: https://discourse.julialang.org/t/there-is-a-bug-in-this-function-and-i-cant-figure-out-what-it-is/19150/4
@vchuravy your help would be greatly appreciated.
This is not a ClusterManagers issue, and so can be closed. The related issue is posted on https://github.com/JuliaLang/julia/issues/30558