dask / dask-xgboost

BSD 3-Clause "New" or "Revised" License

Run the central Rabit process on a worker #41

Open mrocklin opened 5 years ago

mrocklin commented 5 years ago

Currently we run Rabit's central process (the tracker) on the scheduler and the Rabit worker processes alongside the dask workers. This has caused issues in two cases:

  1. Sometimes the scheduler has a more stripped-down environment and doesn't have all of the libraries that the workers do.
  2. Sometimes the scheduler's networking position is somewhat different from the workers' (see #23 and #40).

We might consider running the tracker on a worker instead. This would also keep the scheduler more isolated. It is awkward if there is data on the worker where we want to run the tracker, but if we're comfortable moving data (as is the case in @RAMitchell's rewrite) then maybe this doesn't matter.

@RAMitchell, I thought I'd bring this up now rather than later in case it affects things.

RAMitchell commented 5 years ago

So for my xgboost integration (https://github.com/dmlc/xgboost/pull/4473), I will try running the tracker on worker zero, assuming the performance load of the tracker is negligible.
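
For illustration, the "tracker on worker zero" idea could look roughly like the sketch below. This is not the code from the linked PR. It assumes "worker zero" means the first worker address in sorted order, and `start_tracker` is a hypothetical callable that starts the Rabit tracker in the process where it runs and returns whatever connection settings the Rabit workers need. The only dask calls used are `Client.scheduler_info()` and `Client.run(..., workers=[...])`.

```python
def pick_tracker_worker(client):
    """Return the address of "worker zero".

    Assumption: "worker zero" is simply the first worker address in sorted
    order, so the choice is stable between calls.
    """
    # scheduler_info()["workers"] is a dict keyed by worker address.
    workers = client.scheduler_info()["workers"]
    if not workers:
        raise RuntimeError("no dask workers are connected")
    return sorted(workers)[0]


def start_tracker_on_worker(client, start_tracker):
    """Run the (hypothetical) start_tracker callable on worker zero.

    start_tracker receives the number of dask workers and is expected to
    return the tracker's connection settings. It executes inside the chosen
    worker process, not on the scheduler.
    """
    n_workers = len(client.scheduler_info()["workers"])
    address = pick_tracker_worker(client)
    # Client.run executes the function only on the listed workers and
    # returns a dict of {worker_address: result}.
    results = client.run(start_tracker, n_workers, workers=[address])
    return address, results[address]
```

With a real `distributed.Client`, the returned settings would then be passed to the training tasks on every worker so they can connect to the tracker. Because the tracker lives in a worker process, it sees the same libraries and network as the other workers, which is the point of both cases listed at the top of this issue.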