JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

broken error handling in message_handler_loop #91

Closed: samtkaplan closed this issue 4 months ago

samtkaplan commented 4 months ago

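This problem was introduced by commit https://github.com/JuliaLang/Distributed.jl/commit/fdf56f429cd44da6d5d08cb5208bbf41d5b3d0a5. After that change, message_handler_loop no longer handles the case where wpid is invalid. In the trace below, the connection fails in process_hdr (invalid connection credentials) while wpid is still 0; the error handler then calls worker_from_id with that invalid id, which throws a second error. That second error escapes the task as an "Unhandled Task ERROR", and the original error is only reported as its cause: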

      From worker 7:    Unhandled Task ERROR: no process with id 0 exists
      From worker 7:    Stacktrace:
      From worker 7:     [1] error(s::String)
      From worker 7:       @ Base ./error.jl:35
      From worker 7:     [2] worker_from_id(pg::Distributed.ProcessGroup, i::Int64)
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1098
      From worker 7:     [3] worker_from_id(pg::Distributed.ProcessGroup, i::Int64)
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1090 [inlined]
      From worker 7:     [4] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:213
      From worker 7:     [5] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:133
      From worker 7:     [6] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:121
      From worker 7:
      From worker 7:    caused by: Process(1) - Invalid connection credentials sent by remote.
      From worker 7:    Stacktrace:
      From worker 7:     [1] error(s::String)
      From worker 7:       @ Base ./error.jl:35
      From worker 7:     [2] process_hdr(s::Sockets.TCPSocket, validate_cookie::Bool)
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:265
      From worker 7:     [3] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:158
      From worker 7:     [4] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:133
      From worker 7:     [5] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
      From worker 7:       @ Distributed /opt/julia/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:121
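The failure mode in the trace can be reproduced in isolation. The following is a hypothetical, self-contained Julia sketch, not Distributed.jl code: workers_by_id, lookup_worker and handle_connection are made-up stand-ins for the process group, worker_from_id and message_handler_loop, and it assumes, as the trace suggests, that the remote id is still 0 when the handshake fails. It shows that an unguarded lookup inside a catch block throws a new error that escapes the handler, while checking wpid < 1 first keeps the original error reportable.

```julia
# Hypothetical stand-ins for illustration only; NOT Distributed.jl internals.
const workers_by_id = Dict{Int,String}(2 => "worker-2")

# Mirrors the message seen in the trace when an id is unknown.
lookup_worker(id::Int) =
    haskey(workers_by_id, id) ? workers_by_id[id] : error("no process with id $id exists")

# Simulates a connection whose handshake fails before the remote id is known.
function handle_connection(; guard::Bool)
    wpid = 0  # remote id not yet assigned when the handshake fails
    try
        error("Invalid connection credentials sent by remote.")
    catch e
        if guard && wpid < 1
            # Unknown remote: report the original error instead of looking it up.
            return "unknown remote: " * sprint(showerror, e)
        end
        # Unguarded path: this lookup throws, so the new error escapes the
        # catch block and the original error survives only as its cause.
        w = lookup_worker(wpid)
        return "cleaned up $w"
    end
end
```

With guard=false the call throws "no process with id 0 exists", matching the first error in the trace; with guard=true it returns the original credentials error instead.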