dask / dask-searchcv

dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml
BSD 3-Clause "New" or "Revised" License
240 stars 43 forks source link

Fold dask-searchcv into Dask-ML #73

Closed TomAugspurger closed 6 years ago

TomAugspurger commented 6 years ago

Currently dask-ml.model_selection just imports dask-searchcv. For ease of maintenance, I'd like to move dask-searchcv into dask-ml. Does anyone have objections?

cc @jcrist

jcrist commented 6 years ago

Nominally I'm ok with this. One nice thing about having them split is that dask-searchcv is quite minimal in complexity, scope, and dependencies (just dask (not distributed) and scikit-learn). dask-ml has larger scope, and larger dependencies. Depending on users this may or may not matter.

I consider the functionality in dask-searchcv "done" (minus any scikit-learn updates to the gridsearch/randomizedsearch class interfaces), so I don't think the comment of "stale" in the referring issue is valid, but I also am not actively working with these libraries and understand having a single repo may be more appealing.

A few paths forward (no strong feelings on any):

TomAugspurger commented 6 years ago

Thanks.

I share your opinion that dask-searchcv is done, as of scikit-learn 0.19.x.

My main motivations for merging with dask-ml is that

  1. dask-ml is set up to run against sklearn master
  2. dask-ml sees more activity these days, so we're running CI more often and will catch breaks sooner.
  3. It's less work to release one package instead of two :)

We could solve 1 and 2 by updating dask-searchcv's CI and running a cron job.

TomAugspurger commented 6 years ago

Coming back to this @jcrist,

Of your options, I prefer 3. We could include a note in the documentation and README that further development is happening in the dask-ml repo. I'm not really sure that a deprecation is even necessary, though I'm not sure what best practice is in this situation.

Is that OK by you?

jcrist commented 6 years ago

Fine by me. Do you mind if I do the move to try and maintain some level of attribution?

I'm not really sure that a deprecation is even necessary, though I'm not sure what best practice is in this situation.

I'll just add a comment in the readme saying this functionality has been moved to dask-ml.

mrocklin commented 6 years ago

It's also possible to spoof another author with git

On Thu, Jun 28, 2018 at 12:05 PM, Jim Crist notifications@github.com wrote:

Fine by me. Do you mind if I do the move to try and maintain some level of attribution?

I'm not really sure that a deprecation is even necessary, though I'm not sure what best practice is in this situation.

I'll just add a comment in the readme saying this functionality has been moved to dask-ml.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dask/dask-searchcv/issues/73#issuecomment-401087409, or mute the thread https://github.com/notifications/unsubscribe-auth/AASszKla60R0XNaetvexppA8vXPQyhpuks5uBP7kgaJpZM4UkVDA .

TomAugspurger commented 6 years ago

try and maintain some level of attribution?

Yeah, absolutely. I'm happy to take care of the move as well.

It seems like git subtree add --prefix=dask-searchcv git://github.com/dask/dask-searchcv.git master will preserve the authors and history. We would just need to not squash the commit as it's being merged I think.

jcrist commented 6 years ago

I see we read the same stackoverflow link :). If you want to try and handle the merge in a way that some level of attribution is maintained, be my guest :).

TomAugspurger commented 6 years ago

And the log will look something like

*   0689203 - (HEAD -> master, dask-searchcv) Add 'dask-searchcv/' from commit '86d14bde23a4c0d5021eea461a3acb38045fe551' (22 seconds ago) <Tom Augspurger>
|\
| * 86d14bd - Support deprecation of return_train_scores in 0.19.1 (#68) (7 months ago) <Jim Crist>
| * 5bac3d8 - Test 0.19.1, squash a few warnings (#67) (7 months ago) <Jim Crist>
| * 37fb668 - Flake8 (#66) (7 months ago) <Jim Crist>
| * 260e1be - ENH: Support multiple metrics (#60) (7 months ago) <Tom Augspurger>
| * a2dd24e - Update dask (#63) (8 months ago) <Jim Crist>
| * 2c2056d - Support sklearn 0.19.0 (#55) (10 months ago) <Jim Crist>
| * 63e450d - Remove deprecated `sklearn.utils.fixes.rankdata` (#53) (10 months ago) <Brett Naul>

once merged

I see we read the same stackoverflow link :)

What, you don't think I knew that off the top of my head? :)

TomAugspurger commented 6 years ago

Oh neat, are these bits from dklearn?

| * 7bdf52c - Add support for `error_score` (#15) (1 year, 4 months ago) <Jim Crist>
| *   a331161 - Merge pull request #14 from jcrist/just-the-good-parts (1 year, 4 months ago) <Jim Crist>
| |\
| | * 4e68ccf - Update travis (1 year, 4 months ago) <Jim Crist>
| | * 467983f - Delete some files that snuck through (1 year, 4 months ago) <Jim Crist>
| | * e2dd1eb - Update setup.py (1 year, 4 months ago) <Jim Crist>
| | * 536cd96 - Update readme (1 year, 4 months ago) <Jim Crist>
| | * 116c867 - Add tests for model_selection (1 year, 4 months ago) <Jim Crist>
| | * 1b216ca - A few cleanups (1 year, 4 months ago) <Jim Crist>
| | * d1989c9 - Add flake8 setup (1 year, 4 months ago) <Jim Crist>
| | * f3fec27 - Add derived docstrings (1 year, 4 months ago) <Jim Crist>
| | * 77081e5 - Rewrite (1 year, 4 months ago) <Jim Crist>
| | * 6a7bddf - More rework (1 year, 4 months ago) <Jim Crist>
| | * ecd15bb - Add support for multiple runs (1 year, 4 months ago) <Jim Crist>
| | * 2b8e626 - Reorg (1 year, 4 months ago) <Jim Crist>
| | * 344f25c - Delete even more things (1 year, 4 months ago) <Jim Crist>
| | * 534c9b4 - Delete more things, simplify (1 year, 4 months ago) <Jim Crist>
| | * 6f252db - Delete unused functions (1 year, 4 months ago) <Jim Crist>
| | * e1e3b49 - Delete data-parallelism stuff (1 year, 4 months ago) <Jim Crist>
| |/
|
jcrist commented 6 years ago

Oh neat, are these bits from dklearn?

Yeah, it was the same repo.

TomAugspurger commented 6 years ago

https://github.com/dask/dask-ml/pull/241 is in.

I'll followup with doc updates here later today or tomorrow.