JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

Distributed worker manager doesn't use socket connection to infer worker ip #85

Open Moelf opened 1 year ago

Moelf commented 1 year ago

https://github.com/JuliaLang/julia/blob/0d00660a38f4d4049e12a97399e4ef613bf0d7dc/stdlib/Distributed/src/managers.jl#L568

For some reason we don't use the fact that we can call Sockets.getpeername() here; instead, we read the stdout of the worker process.

This is problematic mainly because:

  1. the worker nodes always report the address of the first IPv4 interface, regardless of whether that is the interface actually used to contact the main node: https://github.com/JuliaLang/julia/blob/0d00660a38f4d4049e12a97399e4ef613bf0d7dc/stdlib/Sockets/src/addrinfo.jl#L272-L276

  2. the worker node may be running inside a container (or, for whatever other reason, have a virtual interface listed before everything else)

My question: can we add a specialization of read_worker_host_port for when config.io :: Sockets.TCPSocket?
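To illustrate the idea, here is a minimal sketch of reading the peer address from an already-connected socket. It assumes config.io really is a connected TCPSocket; peer_host is a hypothetical helper name, not part of Distributed, and the loopback server only stands in for a worker connection.

```julia
using Sockets

# Hypothetical helper (not part of Distributed): take the remote address from
# the connected socket itself instead of parsing what the worker printed.
function peer_host(sock::TCPSocket)
    addr, _port = getpeername(sock)  # address and port of the remote end
    return string(addr)
end

# Loopback demo standing in for a worker connection.
port, server = listenany(ip"127.0.0.1", 9000)  # 9000 is only a port hint
client_task = @async connect(ip"127.0.0.1", port)
conn = accept(server)
client = fetch(client_task)

host = peer_host(conn)
println(host)

close(conn); close(client); close(server)
```

Note that the port returned by getpeername() is the remote port of that particular connection, so a listening port announced by the worker would presumably still have to be read from the stream.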

Moelf commented 1 year ago

bash-4.2$ route | grep '^default' | grep -o '[^ ]*$'
ens1f0.3604

This shows that we should be using:

192.170.240.0

but the first IP address libuv came up with is 192.168.240.0.

I couldn't find a way to look up the default interface in libuv.

Moelf commented 1 year ago

Either we detect the default interface, or we need something like: https://github.com/JuliaWeb/IPNets.jl/blob/92a9364b4f12b4762ecfa3d6d233ab27aee6c5c4/src/IPNets.jl#L217

at this location: https://github.com/JuliaLang/julia/blob/0d00660a38f4d4049e12a97399e4ef613bf0d7dc/stdlib/Sockets/src/addrinfo.jl#L273

to filter out private IP ranges.
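A rough sketch of what such a filter could look like, using only the Sockets stdlib. The names PRIVATE_V4, in_block, is_private and pick_address are hypothetical and are not taken from IPNets.jl or Sockets; the blocks listed are the RFC 1918 private IPv4 ranges, and the fallback to the first candidate is an assumption about the desired behaviour.

```julia
using Sockets

# RFC 1918 private IPv4 blocks as (network, prefix length) pairs.
const PRIVATE_V4 = [(ip"10.0.0.0", 8), (ip"172.16.0.0", 12), (ip"192.168.0.0", 16)]

# Hypothetical helper: true when `addr` lies inside the block `net`/`prefix`.
function in_block(addr::IPv4, net::IPv4, prefix::Integer)
    mask = ~UInt32(0) << (32 - prefix)  # e.g. prefix 16 -> 0xffff0000
    return (addr.host & mask) == (net.host & mask)
end

is_private(addr::IPv4) = any(in_block(addr, net, p) for (net, p) in PRIVATE_V4)

# Prefer the first non-private address; fall back to the first candidate.
# Assumes `candidates` is non-empty.
function pick_address(candidates::Vector{IPv4})
    i = findfirst(!is_private, candidates)
    return candidates[something(i, 1)]
end

# The two addresses from this thread, in the order libuv reported them.
chosen = pick_address([ip"192.168.240.0", ip"192.170.240.0"])
println(chosen)
```

In practice the candidate list would presumably come from something like Sockets.getipaddrs(IPv4). On clusters where every interface is in a private range, this heuristic simply falls back to the current behaviour of taking the first address.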