vincentvdp opened this issue 5 years ago (status: Open)
You're correct. This should match scikit-learn's behavior.
@TomAugspurger I want to work on this issue. Is there any input you'd like to provide before I tackle it? Thanks
Nope, nothing beyond we should match scikit-learn.
@TomAugspurger as I go through the code, I see that there is no implementation of multinomial logistic regression. Is someone working on it, or should I create a separate issue and work on that?
Is there any update on this issue? I'm having the same problem with the difference in `predict_proba` behaviour between scikit-learn's and dask-ml's logistic regression.
Still open, if you're interested in working on it.
IIUC, there are two issues:

1. For binary classifiers, `predict_proba` returns output with shape `(N,)` instead of `(N, 2)`.
2. Multinomial logistic regression is not implemented.

The first sounds relatively easier, so we might want to start with that. That would be around https://github.com/dask/dask-ml/blob/master/dask_ml/linear_model/glm.py#L248
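A minimal sketch of that change, assuming the binary case where the model currently produces a 1-D vector of positive-class probabilities (the helper name below is hypothetical, not the actual dask-ml function):

```python
import numpy as np

def predict_proba_binary(p_pos):
    """Hypothetical helper: convert a 1-D vector of positive-class
    probabilities with shape (N,) into a scikit-learn-style (N, 2)
    array whose columns are [P(class 0), P(class 1)]."""
    p_pos = np.asarray(p_pos)
    return np.vstack([1 - p_pos, p_pos]).T

p = np.array([0.1, 0.5, 0.9])
probs = predict_proba_binary(p)
print(probs.shape)  # (3, 2)
```

Each row sums to 1, and column 1 is the original vector, so no information is changed; only the shape now matches the documented `[n_samples, n_classes]` contract.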
I'm interested in working on this issue, and noticed there aren't any PRs associated with it currently (just for what's listed in the issue title, not implementing multinomial). I'll move forward unless I hear otherwise or see a PR!
Just like sklearn's `predict_proba(X)`, Dask's documentation says it returns an "array-like, shape = [n_samples, n_classes]". However, for a binary classifier it actually returns an "array-like, shape = [n_samples, 1]". E.g. run this:
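The original snippet was not preserved; a sketch of a reproduction along these lines, using a small synthetic dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small synthetic binary-classification problem
X = np.random.RandomState(0).randn(20, 3)
y = (X[:, 0] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X).shape)  # (20, 2): one column per class

# Fitting dask_ml.linear_model.LogisticRegression on the same data and
# calling predict_proba(X) instead yields shape (20,): only the
# positive-class probability (the behaviour reported in this issue).
```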
Of course, no information is lost, but this does break stuff when used in other code. E.g. in GridSearchCV's scorer I get:
predict_proba(X)'s return value is indeed implemented differently in sklearn and dask-ml: sklearn:
dask-ml:
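The two snippets referenced above were not preserved; roughly, the difference can be sketched with plain NumPy (this is an illustration, not the libraries' actual source):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.random.RandomState(0).randn(5, 3)
coef = np.array([0.5, -1.0, 2.0])

# dask-ml convention: only the positive-class probability, shape (5,)
p_dask = sigmoid(X @ coef)

# sklearn convention: both class columns, shape (5, 2), rows summing to 1
p_sklearn = np.vstack([1 - p_dask, p_dask]).T

print(p_dask.shape, p_sklearn.shape)  # (5,) (5, 2)
```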
Is this intentional and would changing this break code that currently expects Dask to return (n,) and not (n,2)? Or is this an oversight, and should it return (n,2) as per documentation?