Node termination during node provisioning is not handled well, resulting in `connect: connection refused (ECONNREFUSED)` in `connect_to_worker` when the new worker tries to connect to the terminating worker.
julia> p = addprocs(2)
julia> begin # try this a couple times
    @spawnat p[1] sleep(5)
    @show rmprocs(p[1]; waitfor=0)
    @show workers()
    @show p = addprocs(1)
end
rmprocs(p[1]; waitfor=0) = :ok
workers() = [3,4,5]
ERROR: connect: connection refused (ECONNREFUSED)
in yieldto(::Task, ::ANY) at ./event.jl:153
in wait() at ./event.jl:186
in wait(::Condition) at ./event.jl:27
in stream_wait(::TCPSocket, ::Condition, ::Vararg{Condition,N}) at ./stream.jl:42
in wait_connected(::TCPSocket) at ./stream.jl:258
in connect at ./stream.jl:957 [inlined]
in connect_to_worker(::String, ::Int16) at ./managers.jl:490
in connect_w2w(::Int64, ::WorkerConfig) at ./managers.jl:453
in connect(::Base.DefaultClusterManager, ::Int64, ::WorkerConfig) at ./managers.jl:387
in connect_to_peer(::Base.DefaultClusterManager, ::Int64, ::WorkerConfig) at ./multi.jl:1516
in (::Base.##598#600{WorkerConfig,Int64})() at ./task.jl:404
Error [connect: connection refused (ECONNREFUSED)] on 6 while connecting to peer 4. Exiting.
Worker 6 terminated.
ERROR (unhandled task failure): Version read failed. Connection closed by peer.
For an example of this failure on CI, see: https://travis-ci.org/JuliaLang/julia/jobs/186141590