dask / dask-ml

Scalable Machine Learning with Dask
http://ml.dask.org
BSD 3-Clause "New" or "Revised" License

Expand our metrics #213

Open TomAugspurger opened 6 years ago

TomAugspurger commented 6 years ago

We have the basics, without weights.

http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics

Not all of these will be easily doable.
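As a rough illustration of what adding weights could look like for the simpler metrics, here is a minimal pure-Python sketch of a weighted accuracy that is computed block by block and then combined. The function names and the list-of-blocks layout are hypothetical stand-ins for chunked arrays, not dask-ml's API.

```python
def block_partial(y_true, y_pred, weights):
    """Return (weight of correct predictions, total weight) for one block."""
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    total = sum(weights)
    return correct, total


def weighted_accuracy(blocks):
    """Combine per-block partial sums into one weighted accuracy.

    `blocks` is an iterable of (y_true, y_pred, sample_weight) triples,
    one triple per chunk. Only two numbers per block have to be gathered.
    """
    correct = 0.0
    total = 0.0
    for y_true, y_pred, weights in blocks:
        c, t = block_partial(y_true, y_pred, weights)
        correct += c
        total += t
    if total == 0:
        raise ValueError("sum of sample weights is zero")
    return correct / total


# Two hypothetical chunks of labels, predictions and sample weights.
blocks = [
    ([1, 0, 1], [1, 1, 1], [1.0, 2.0, 1.0]),
    ([0, 0], [0, 1], [3.0, 1.0]),
]
result = weighted_accuracy(blocks)
```

Metrics that reduce to sums like this should be the easy ones; the per-block step could in principle be mapped over chunks and the final division done once.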

stsievert commented 6 years ago

Right now we only have support for about 7 metrics in https://github.com/dask/dask-ml/blob/master/dask_ml/metrics/__init__.py

> Not all of these will be easily doable.

Can you expand upon this?

TomAugspurger commented 6 years ago

> Can you expand upon this?

Just a suspicion that I haven't verified. For example, https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/metrics/ranking.py#L39 does some things with ordering (maybe just checks) that look hard to do in parallel.
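If the ordering logic turns out to be only a check (again, unverified), that part may be less of a problem: a monotonicity check decomposes over blocks, since each block only needs to report its first value, its last value, and whether it is sorted internally. A minimal sketch, using plain lists as hypothetical stand-ins for array chunks:

```python
def summarize(block):
    """Return (first, last, is_nondecreasing) for a non-empty block."""
    ok = all(a <= b for a, b in zip(block, block[1:]))
    return block[0], block[-1], ok


def is_nondecreasing(blocks):
    """Check global ordering from per-block summaries only."""
    summaries = [summarize(b) for b in blocks if len(b) > 0]
    # Every block must be sorted internally ...
    if not all(ok for _, _, ok in summaries):
        return False
    # ... and each block must end no later than the next one starts.
    return all(prev[1] <= nxt[0] for prev, nxt in zip(summaries, summaries[1:]))


sorted_blocks = [[1, 2], [2, 5], [7]]
boundary_violation = [[1, 2], [1, 5]]
```

Anything that needs an actual global sort, rather than a check, does not reduce this way and would be the harder case.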

atyamsriharsha commented 5 years ago

@TomAugspurger I want to work on this. Any inputs?

TomAugspurger commented 5 years ago

Thanks.

I think a valuable mini-project would be to see how many of scikit-learn's metrics will work properly on dask arrays with NEP-18: https://www.numpy.org/neps/nep-0018-array-function-protocol.html. I think that dask master implements the protocol now.

If possible, I would prefer to deprecate dask-ml's metrics, and get various projects (dask-ml, cuML) using the same implementation (scikit-learn's presumably).
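One way to structure that mini-project is a small survey harness: run each metric on ordinary in-memory inputs to get a reference value, run it again on the alternative array type, and record whether it matched, differed, or raised. The sketch below is an assumption about how this could be organised, not existing dask-ml code. To keep it self-contained it uses two toy metrics and a hypothetical `IterOnly` container in place of scikit-learn functions and dask arrays; in a real survey, `metrics` would map names to `sklearn.metrics` functions and the candidate inputs would be dask arrays.

```python
import math


def survey(metrics, reference_inputs, candidate_inputs, rel_tol=1e-9):
    """Classify each metric as 'ok', 'mismatch' or 'error: <ExceptionName>'."""
    report = {}
    for name, metric in metrics.items():
        expected = metric(*reference_inputs)
        try:
            actual = metric(*candidate_inputs)
        except Exception as exc:
            report[name] = "error: " + type(exc).__name__
            continue
        try:
            same = math.isclose(float(actual), float(expected), rel_tol=rel_tol)
        except (TypeError, ValueError):
            # The result could not be converted to a scalar for comparison.
            same = False
        report[name] = "ok" if same else "mismatch"
    return report


class IterOnly:
    """Hypothetical stand-in for an array type with a narrower interface."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def toy_mae(y_true, y_pred):
    # Only iterates, so it works on any iterable container.
    pairs = list(zip(y_true, y_pred))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def toy_last_residual(y_true, y_pred):
    # Relies on indexing, which IterOnly does not support.
    return y_true[-1] - y_pred[-1]


y_true, y_pred = [1.0, 2.0, 4.0], [1.5, 2.0, 3.0]
report = survey(
    {"toy_mae": toy_mae, "toy_last_residual": toy_last_residual},
    (y_true, y_pred),
    (IterOnly(y_true), IterOnly(y_pred)),
)
```

The resulting report gives a per-metric list of what already works and what fails, which is roughly the inventory the mini-project is after.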


TomAugspurger commented 5 years ago

https://mail.python.org/pipermail/scikit-learn/2019-May/003118.html asks about incremental cluster scoring.

More generally, https://scikit-learn.org/stable/modules/classes.html#clustering-metrics
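For the label-based clustering metrics, incremental scoring looks feasible in principle, because metrics such as mutual information depend on the data only through the contingency table of (true label, predicted label) counts, and those counts can be summed batch by batch. A minimal sketch, where the class name and its methods are hypothetical and the result is in nats:

```python
import math
from collections import Counter


class IncrementalContingency:
    """Accumulate (true label, predicted label) counts across batches."""

    def __init__(self):
        self.counts = Counter()

    def update(self, labels_true, labels_pred):
        if len(labels_true) != len(labels_pred):
            raise ValueError("label batches must have equal length")
        self.counts.update(zip(labels_true, labels_pred))

    def mutual_information(self):
        """Mutual information (natural log) from the accumulated counts."""
        n = sum(self.counts.values())
        if n == 0:
            raise ValueError("no samples seen")
        row = Counter()  # marginal counts per true label
        col = Counter()  # marginal counts per predicted label
        for (t, p), c in self.counts.items():
            row[t] += c
            col[p] += c
        return sum(
            (c / n) * math.log(n * c / (row[t] * col[p]))
            for (t, p), c in self.counts.items()
        )


# Two batches whose clusters agree perfectly up to relabeling.
table = IncrementalContingency()
table.update([0, 0], [1, 1])
table.update([1, 1], [0, 0])
mi = table.mutual_information()
```

Metrics that need pairwise distances between samples do not reduce to a contingency table, so this approach would not cover them.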

TomAugspurger commented 5 years ago

> I think a valuable mini-project would be to see how many of scikit-learn's metrics will work properly on dask arrays with NEP-18: https://www.numpy.org/neps/nep-0018-array-function-protocol.html.

To clarify: I don't think this should preclude implementing the metrics here in dask-ml first. Getting metrics to work well with a variety of arrays is a big project that will take a while.

DuanBoomer commented 1 year ago

@TomAugspurger @stsievert Hello, can I work on this issue ("Expand our metrics #213")?