JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

Setup worker-worker connections lazily #42

Open amitmurthy opened 7 years ago

amitmurthy commented 7 years ago

The default all_to_all topology connects every process to every other process. While this is fine for small clusters, the total number of TCP connections grows quadratically: N(N-1)/2, or roughly (N^2)/2, for N processes.
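To make the growth concrete, here is a small back-of-the-envelope sketch in Python (not Julia, and not part of Distributed.jl). It assumes one TCP connection per connected pair of processes and counts the master as one of the N processes; the function names are made up for illustration.

```python
def all_to_all_connections(n_procs: int) -> int:
    """Connections when every process is linked to every other process."""
    # Each unordered pair of processes shares one connection.
    return n_procs * (n_procs - 1) // 2


def master_only_connections(n_procs: int) -> int:
    """Connections when only the master is linked to each worker."""
    # One connection per worker; the master itself needs none.
    return max(n_procs - 1, 0)


if __name__ == "__main__":
    # Compare the two counts for a few cluster sizes.
    for n in (4, 64, 1024):
        print(n, all_to_all_connections(n), master_only_connections(n))
```

Under these assumptions, the gap between the two counts widens quickly as the cluster grows, which is the motivation for not opening worker-worker connections eagerly.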

Since a large class of parallel problems needs only master-worker connections, we should change the default topology to all_to_all_lazy, where worker-worker connections are set up only on the first request from one worker to another. We should also introduce another topology, master_routed, which connects only the master to the workers and routes any worker-worker call through the master.

To summarize, implement 2 new topologies:

1) all_to_all_lazy, where worker-worker connections are set up lazily and which becomes the default for addprocs, and

2) master_routed in which only the master connects to workers and worker-worker messages are routed via the master.
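The difference between the two proposed topologies can be sketched with a toy model. This is plain Python, not the Distributed.jl implementation: the `Cluster` class, its `send` method, and the convention that process 1 is the master are all assumptions made for illustration, and "connections" here are just entries in a set.

```python
from itertools import combinations

MASTER = 1  # assumed convention for this sketch: process 1 is the master


class Cluster:
    """Toy model that tracks which process pairs hold an open connection."""

    TOPOLOGIES = ("all_to_all", "all_to_all_lazy", "master_routed")

    def __init__(self, n_workers: int, topology: str):
        if topology not in self.TOPOLOGIES:
            raise ValueError(f"unknown topology: {topology}")
        self.topology = topology
        self.workers = list(range(2, n_workers + 2))
        # The master is always connected to every worker.
        self.links = {frozenset((MASTER, w)) for w in self.workers}
        if topology == "all_to_all":
            # Eager setup: connect every worker pair up front.
            self.links |= {frozenset(p) for p in combinations(self.workers, 2)}

    def send(self, src: int, dst: int) -> list:
        """Return the path of process ids a message from src to dst takes."""
        if src == dst:
            raise ValueError("src and dst must differ")
        pair = frozenset((src, dst))
        if pair in self.links:
            return [src, dst]
        if self.topology == "all_to_all_lazy":
            # First request between these workers: open the connection now.
            self.links.add(pair)
            return [src, dst]
        # master_routed: no worker-worker link is ever opened.
        return [src, MASTER, dst]


if __name__ == "__main__":
    lazy = Cluster(4, "all_to_all_lazy")
    print(lazy.send(2, 3), len(lazy.links))

    routed = Cluster(4, "master_routed")
    print(routed.send(2, 3), len(routed.links))
```

In this model, the lazy topology pays for a worker-worker connection only when a pair first communicates, while the routed topology keeps the connection count fixed at the cost of an extra hop through the master.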

ViralBShah commented 7 years ago

This would solve major connection time issues on large clusters that we have repeatedly seen.

andreasnoack commented 7 years ago

Just wanted to mention that https://github.com/JuliaLang/julia/pull/22588 also seemed to make adding remote workers noticeably faster.

amitmurthy commented 7 years ago

I wonder how and why JuliaLang/julia#22588 affected worker startup time. @vtjnash ?

amitmurthy commented 7 years ago

@andreasnoack / @ViralBShah care to comment on the interface for lazy connection setup in JuliaLang/julia#22814?

andreasnoack commented 7 years ago

Sorry for the noise here. I just did some more systematic timings, and my previous impression must have been based on differences in the connection.

StefanKarpinski commented 6 years ago

Bump – are we still planning on doing this?

bisraelsen commented 6 years ago

bump