dask / dask-xgboost

BSD 3-Clause "New" or "Revised" License
162 stars 43 forks source link

Migrate to XGBoost mainline repository #39

Closed mrocklin closed 3 years ago

mrocklin commented 5 years ago

@RAMitchell (an xgboost maintainer) was mentioning that it might be possible to migrate the whole of the dask-xgboost codebase into xgboost itself.

Any thoughts or concerns about this?

I think that the proposed API change would be to add a dask_client= keyword (or something similar) to the official train and predict methods.

TomAugspurger commented 5 years ago

No objections here, though I think the CI is failing with the latest xgboost. I have a half-written tutorial copying https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html. I’ll try to finish that up in a couple weeks.

javabrett commented 5 years ago

Someone should verify that this code with license (BSD 3-clause) can be absorbed into and under their license (ASF 2.0) without changes.

I expect this is fine as BSD-3-clause is permissive, but best make sure they have checked before they accept the code, that no relicensing is required.

jakirkham commented 5 years ago

As xgboost now has native Dask support in mainline ( https://github.com/dmlc/xgboost/pull/4473 ), what would like to do with this repo?

TomAugspurger commented 5 years ago

At least update the readme, possible add a warning on import.

On May 29, 2019, at 11:22, jakirkham notifications@github.com wrote:

As xgboost now has native Dask support in mainline ( dmlc/xgboost#4473 ), what would like to do with this repo?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

TomAugspurger commented 5 years ago

Will want to wait until there’s an XGBoost release though.

On May 29, 2019, at 11:22, jakirkham notifications@github.com wrote:

As xgboost now has native Dask support in mainline ( dmlc/xgboost#4473 ), what would like to do with this repo?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

jakirkham commented 5 years ago

That sounds reasonable. How do we handle new issues that are reported here?

mrocklin commented 5 years ago

Just to be clear, the solution in XGBoost is quite a bit different than what is implemented in this repository. Folks may want to familiarize themselves with their solution before deprecating this repository entirely.

On Fri, May 31, 2019 at 9:47 AM jakirkham notifications@github.com wrote:

That sounds reasonable. How do we handle new issues that are reported here?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/dask/dask-xgboost/issues/39?email_source=notifications&email_token=AACKZTB2OAIN5DZCBFXTDLTPYFJCPA5CNFSM4HJ7HKPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWVYQ5Q#issuecomment-497780854, or mute the thread https://github.com/notifications/unsubscribe-auth/AACKZTAEGDR6ZNEDJRX7UFTPYFJCPANCNFSM4HJ7HKPA .

ksangeek commented 5 years ago

@mrocklin Could you please elaborate a bit on how the implementation in this repository is different from the one included in dmlc/xgboost? And if you have some more time on what circumstances this one could be preferred?

mrocklin commented 5 years ago

I don't know the other implementation particularly well. I recommend reading their docs to get a more authoritative perspective.

https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.dask

On Tue, Jun 18, 2019 at 5:39 PM ksangeek notifications@github.com wrote:

@mrocklin https://github.com/mrocklin Could you please elaborate a bit on how the implementation in this repository is different from the one included in dmlc/xgboost? And if you have some more time on what circumstances this one could be preferred?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dask/dask-xgboost/issues/39?email_source=notifications&email_token=AACKZTGHKLYOJYJCFTNHSL3P3D6RJA5CNFSM4HJ7HKPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODX7BT6Y#issuecomment-503192059, or mute the thread https://github.com/notifications/unsubscribe-auth/AACKZTBTMTY4J6YR4ETU6C3P3D6RJANCNFSM4HJ7HKPA .

RAMitchell commented 5 years ago

@ksangeek See demos here: https://github.com/dmlc/xgboost/tree/master/demo/dask

The xgboost dask integration is currently slightly more low-level than what is here. We are currently considering if we can extend the API to provide more high level functionality currently available in dask-xgboost without duplicating existing xgboost APIs. If this can be achieved we should be able to definitively move users from dask-xgboost to native xgboost integration and deprecate this without loss of user experience.

ksangeek commented 5 years ago

Thank you @mrocklin and @RAMitchell for the useful information.

kylejn27 commented 4 years ago

Has this been thought about further? There's functionality in this library that I'd like to implement that's already been done in dmlc/xgboost's native dask integration

TomAugspurger commented 4 years ago

From what I understand, the implementation in xgboost is closer to what we have here now.

Personally, I'd like to see this handled within xgboost itself, to reduce the maintenance load here :)

Are there any features within dask-xgboost not handled within xgboost itself?

kylejn27 commented 4 years ago

the implementation in xgboost is closer to what we have here now.

I've been bouncing around the two codebases in the past few weeks, their implementation is extremely similar to what's in this repo, but supports more features in the underlying xgboost library

Are there any features within dask-xgboost not handled within xgboost itself

Nope. The peers that I work with are requesting features for this library that have already been implemented in the native xgboost dask implementation

TomAugspurger commented 4 years ago

This topic came up in the monthly Dask meeting today. On the call @kylejn27 repeated that dmlc/xgboost handles everything dask-xgboost handles and more. If that is the case then maintaining dask-xgboost is wasted effort.

XGBoost hasn't made a release since the refactored dask handling went in. So we'll need to wait until their next release before making any concrete decisions. But my preference is to gather feedback from users (here and from followers of the dask_dev Twitter account) on whether the implementation in dmlc/xgboost suffices after the next release. If so, then we'll archive this repository and direct people there.

hcho3 commented 4 years ago

FYI, I am currently working on the upcoming release for XGBoost (1.0.0). ETA is tomorrow (2/19).

jakirkham commented 4 years ago

@mmccarty @kylejn27 @gforsyth, would it be possible to get a summary of the current outstanding issues for migrating to xgboost for Dask usage?

cc @JohnZed (for vis)

jakirkham commented 4 years ago

On a different note, there is currently an RC for xgboost 1.2.0. Would be great if people can try and share feedback here ( https://github.com/dmlc/xgboost/issues/5970 ).

trivialfis commented 4 years ago

Please loop me in to related issues around migration.

mmccarty commented 4 years ago

This are the issues that I'm aware of

jameslamb commented 3 years ago

Now that release 1.3.0 is out (congrats!!!) and it includes fixes for all the issues in https://github.com/dask/dask-xgboost/issues/39#issuecomment-681974273, can this be closed?

jakirkham commented 3 years ago

Now that release 1.3.0 is out (congrats!!!) and it includes fixes for all the issues in #39 (comment), can this be closed?

@mmccarty? 🙂

mmccarty commented 3 years ago

Yes! Let's close this issue. Thank you everyone for the hard work!