Closed — kylejn27 closed this 4 years ago
I haven't looked closely at #62 yet, but I don't think a threshold keyword in predict is the right way to do this. Typically users would use predict_proba and some kind of probability-calibrating meta-estimator.

What is the domain of the class_probs returned by predict(client, self._Booster, X)? It seems like the comparison to 0 is the problem. Most likely predict_proba should always return an (n_samples, n_classes) array, and we take the argmax over that?
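The suggestion above can be sketched as follows (a minimal illustration with made-up probabilities, not the library's actual code):

```python
import numpy as np

# Hypothetical (n_samples, n_classes) output of predict_proba
# for 3 samples and 2 classes; each row sums to 1.
class_probs = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.6, 0.4],
])

# Class labels come from the argmax over the class axis,
# rather than comparing probabilities to a fixed cutoff.
labels = class_probs.argmax(axis=1)
print(labels)  # [0 1 0]
```

This pattern generalizes to any number of classes, whereas a scalar threshold only makes sense for the binary case.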
Hey! Just getting back to this... it's been a while 😂

You're right, a threshold parameter isn't really the right way to do this. It looks like the base xgboost package has a hard-coded 0.5 cutoff, so I think it's best to update the predict threshold from 0 to 0.5:
https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py#L900-L904
Thanks!
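For reference, the hard-coded binary cutoff described above works roughly like this (a hedged sketch of the behavior at the linked lines, not a verbatim copy of xgboost's code):

```python
import numpy as np

# Hypothetical per-sample probabilities of the positive class.
class_probs = np.array([0.2, 0.51, 0.5, 0.8])

# Binary case: probabilities strictly above 0.5 map to class 1,
# everything else stays class 0.
labels = np.zeros(class_probs.shape[0], dtype=int)
labels[class_probs > 0.5] = 1
print(labels)  # [0 1 0 1]
```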
~~Adds a threshold value to the predict function with a default of 0.5~~

Hardcodes the prediction cutoff to > 0.5 and updates test_classifier to use a more complex dataset. The test fails on the original threshold of > 0 and passes with the change in this PR.

Related issue: https://github.com/dask/dask-xgboost/issues/62
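To see why a test dataset would fail on the original cutoff: predicted probabilities are (almost) always strictly positive, so comparing against 0 labels every sample as the positive class. A minimal demonstration with assumed probability values:

```python
import numpy as np

# Hypothetical positive-class probabilities for 3 samples.
class_probs = np.array([0.1, 0.4, 0.9])

# Original cutoff: every strictly positive probability passes,
# so all samples are labeled 1 regardless of the model's output.
labels_bad = (class_probs > 0).astype(int)     # [1 1 1]

# Fixed cutoff: only probabilities above 0.5 are labeled 1.
labels_fixed = (class_probs > 0.5).astype(int)  # [0 0 1]
```

On a trivially separable dataset the all-ones prediction can still score well, which is why a more complex dataset is needed for the test to catch the bug.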