AlbertDeFusco opened this issue 4 years ago
I'll look into this soon. I'm planning a refactor to move this logic into distributed itself.
Hi, is there any update on this issue? I'm using the Dask implementation in XGBoost itself rather than this library, so my feeling is that this may be a bug in Dask rather than in XGBoost.
I'm using LocalCluster with Dask 2.28.0.
```python
import xgboost as xgb

# client, X, and y are defined elsewhere
dtrain = xgb.dask.DaskDMatrix(client, X, y)
output = xgb.dask.train(
    client,
    {
        'verbosity': 2,
        'tree_method': 'hist',
        'objective': 'binary:logistic'
    },
    dtrain,
    num_boost_round=4,
    evals=[(dtrain, 'train')]
)
```
When connecting to a dask-gateway cluster, `client.scheduler_address` is a proxy address. I was able to solve this by replacing it in `core::_train` with `client.scheduler_info()['address']`. However, I get the following warning:

```
/root/anaconda3/lib/python3.7/site-packages/distributed/client.py:3530: RuntimeWarning: coroutine 'Client._update_scheduler_info' was never awaited
  self.sync(self._update_scheduler_info)
```
I have verified that this update works correctly on a 9M-row training set and scales linearly from 4 to 8 workers (2 cores/worker). Is this the correct approach to get the actual scheduler address?