Open jackie930 opened 5 years ago

Dear community,

I am quite new to Dask, and I am trying to figure out how exactly models like XGBoost do distributed training via Dask. I got confused by the code below from dask-xgboost/dask_xgboost/core.py: why do we only use `[v for v in results if v][0]` instead of the whole results list? That is, what does the comment "only one will be non-None" mean? Thanks!

Jackie
> what does the comment "only one will be non-None" mean?

If you look in `train_part`, you'll see the return value is set like this:

```python
if xgb.rabit.get_rank() == 0:  # Only return from one worker
    result = bst
else:
    result = None
```

so only the worker with rank 0 in XGBoost's world will return a non-None value.
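For context, the surrounding per-worker task has roughly this shape (a simplified sketch of `train_part`, not the exact dask-xgboost source; the Rabit tracker setup is elided):

```python
import xgboost as xgb

def train_part(rabit_args, params, X, y, **kwargs):
    # Sketch of a per-worker training task (assumed shape, not the
    # exact dask-xgboost source). Each worker joins the Rabit "world",
    # trains on its own data partition, and Rabit's allreduce keeps the
    # gradient statistics -- and therefore the trees -- synchronized.
    xgb.rabit.init(rabit_args)
    try:
        dtrain = xgb.DMatrix(X, label=y)           # this worker's shard only
        bst = xgb.train(params, dtrain, **kwargs)  # collective training
        if xgb.rabit.get_rank() == 0:              # only one worker returns
            result = bst
        else:
            result = None
    finally:
        xgb.rabit.finalize()
    return result
```

On the client side, the per-worker return values are then collapsed with `[v for v in results if v][0]`, which keeps the single non-None booster from the rank-0 worker.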
@TomAugspurger Thanks for your reply. I guess my question is then: here we split the data into several chunks, train an xgb model on each worker locally, and only return the result from one worker. Does that mean we are only modeling on a sample of the data rather than doing distributed training? In other words, what's the point of training the non-rank-0 models on those workers?
I'm not very familiar with XGBoost's distributed mode, but my understanding is that it's actually distributed training.
FYI, dask-xgboost's functionality is being moved to XGBoost itself. https://github.com/dmlc/xgboost/pull/4473.
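For anyone landing here later, the upstream replacement eventually looks roughly like this (a minimal sketch against the `xgboost.dask` module as it later shipped; the names here are from modern XGBoost, not from this repo):

```python
import dask.array as da
import xgboost as xgb
from dask.distributed import Client

client = Client()  # connect to (or start) a Dask cluster

# Synthetic partitioned data; each chunk lives on some worker.
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.randint(0, 2, size=100_000, chunks=10_000)

dtrain = xgb.dask.DaskDMatrix(client, X, y)
output = xgb.dask.train(
    client,
    {"objective": "binary:logistic", "tree_method": "hist"},
    dtrain,
    num_boost_round=10,
)
booster = output["booster"]  # a single, ordinary xgboost.Booster
```

Note that it likewise hands back one Booster, for the same reason discussed above.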
I think that all of the boosters are effectively the same, so we only need to return one of them.
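One way to sanity-check that (a hypothetical snippet, not code from this library): if `train_part` were changed to return every worker's booster, their serialized forms should match byte-for-byte, because allreduce keeps each worker's tree ensemble in sync.

```python
def boosters_identical(boosters):
    # Hypothetical helper: compare the raw serialized form of each
    # worker's Booster; identical bytes imply identical models.
    raws = [bytes(bst.save_raw()) for bst in boosters]
    return all(raw == raws[0] for raw in raws)
```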