dask / dask-xgboost


question about source code logic #42

Open jackie930 opened 5 years ago

jackie930 commented 5 years ago

Dear community,

I am quite new to Dask and am trying to figure out how models like XGBoost do distributed training via Dask. I got confused by the code below from dask-xgboost/dask_xgboost/core.py: why do we use only [v for v in results if v][0] instead of the whole results list? In other words, what does the comment "only one will be non-None" mean?

    # Get the results, only one will be non-None
    results = yield client._gather(futures)
    result = [v for v in results if v][0]
    num_class = params.get("num_class")
    if num_class:
        result.set_attr(num_class=str(num_class))
    raise gen.Return(result)

Thanks!

Jackie

TomAugspurger commented 5 years ago

what does the comment only one will be non-None mean?

If you look in train_part you'll see the return value is

        if xgb.rabit.get_rank() == 0:  # Only return from one worker
            result = bst
        else:
            result = None

so only the worker with rank 0 in XGBoost's world will return a non-None value.
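
Concretely, if there were three workers, the gathered list would look something like this toy illustration (the string here just stands in for the real Booster object):

    # Toy illustration: only the rank-0 worker returns its Booster, so the
    # gathered list has exactly one non-None entry.
    results = ["booster-from-rank-0", None, None]

    # Dropping the None entries leaves a single element to take.
    result = [v for v in results if v][0]
    print(result)  # booster-from-rank-0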

jackie930 commented 5 years ago

@TomAugspurger Thanks for your reply. I guess my question then is: we split the data into several chunks, train an xgb model on each worker locally, and only return the result from one worker. Does that mean we are only modeling a sample of the data rather than doing distributed training? In other words, what is the point of training the models on the non-rank-0 workers?

TomAugspurger commented 5 years ago

I'm not very familiar with XGBoost's distributed mode, but my understanding is that it's actually distributed training.
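
For intuition, here is a rough sketch (not the actual dask-xgboost source) of the pattern train_part follows on each worker, assuming an older xgboost release that still ships the xgb.rabit module; the function name, arguments, and rabit_args format are simplifications:

    # Simplified sketch of the per-worker training pattern; not the real train_part.
    import xgboost as xgb

    def train_part_sketch(rabit_args, params, local_data, local_labels,
                          num_boost_round=10):
        # Join the Rabit tracker so this worker becomes part of one training group.
        # rabit_args is assumed to be a list of b"KEY=VALUE" tracker settings.
        xgb.rabit.init(rabit_args)
        try:
            # Each worker only holds its own partition of the training data...
            dtrain = xgb.DMatrix(local_data, label=local_labels)

            # ...but xgb.train exchanges split/gradient statistics across the
            # group through Rabit allreduce, so every tree is built from all
            # partitions, not just the local one.
            bst = xgb.train(params, dtrain, num_boost_round=num_boost_round)

            # All workers end up with equivalent Boosters; returning one is enough.
            return bst if xgb.rabit.get_rank() == 0 else None
        finally:
            xgb.rabit.finalize()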

FYI, dask-xgboost's functionality is being moved to XGBoost itself. https://github.com/dmlc/xgboost/pull/4473.

mrocklin commented 5 years ago

I think that all of the boosters are effectively the same, so we only need to return one of them.
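
If you wanted to convince yourself of that, one rough check (hypothetical, since the real code only gathers the rank-0 Booster) would be to compare the text dumps of each worker's Booster:

    # Hypothetical check: all_boosters would need to hold every worker's Booster,
    # which the real code does not return. Because Rabit synchronizes the split
    # statistics during training, the dumped trees should match across workers.
    dumps = [bst.get_dump() for bst in all_boosters]
    assert all(d == dumps[0] for d in dumps)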
