Closed — kylejn27 closed this 4 years ago
I haven't looked closely at #62 yet, but I don't think a threshold keyword in predict is the right way to do this. Typically users would use predict_proba and some kind of probability-calibrating meta-estimator.

What is the domain of the class_probs returned by predict(client, self._Booster, X)? It seems like the comparison to 0 is the problem. Most likely predict_proba should always return an (n_samples, n_classes) array, and we take the argmax over that?
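The suggestion above can be sketched as follows (a minimal illustration with made-up probabilities, not the library's actual code):

```python
import numpy as np

# Hypothetical (n_samples, n_classes) output of predict_proba
# for 3 samples and 2 classes; each row sums to 1.
class_probs = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.6, 0.4],
])

# Class labels come from the argmax over the class axis,
# rather than comparing probabilities to a fixed cutoff.
labels = class_probs.argmax(axis=1)
print(labels)  # [0 1 0]
```

This pattern generalizes to any number of classes, whereas a scalar threshold only makes sense for the binary case.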
Hey! Just getting back to this... it's been a while 😂

You're right, a threshold parameter isn't really the right way to do this. It looks like the base xgboost package has a hard-coded 0.5 cutoff, so I think it's best to update the predict threshold from 0 to 0.5:
https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py#L900-L904
Thanks!
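For reference, the hard-coded binary cutoff described above works roughly like this (a hedged sketch of the behavior at the linked lines, not a verbatim copy of xgboost's code):

```python
import numpy as np

# Hypothetical per-sample probabilities of the positive class.
class_probs = np.array([0.2, 0.51, 0.5, 0.8])

# Binary case: probabilities strictly above 0.5 map to class 1,
# everything else stays class 0.
labels = np.zeros(class_probs.shape[0], dtype=int)
labels[class_probs > 0.5] = 1
print(labels)  # [0 1 0 1]
```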
~~Adds a threshold value to the predict function with a default of 0.5~~

Hardcodes the prediction cutoff to > 0.5 and updates test_classifier to use a more complex dataset. The test fails on the original threshold of > 0 and passes with the change in this PR.

Related issue: https://github.com/dask/dask-xgboost/issues/62
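To see why a test dataset would fail on the original cutoff: predicted probabilities are (almost) always strictly positive, so comparing against 0 labels every sample as the positive class. A minimal demonstration with assumed probability values:

```python
import numpy as np

# Hypothetical positive-class probabilities for 3 samples.
class_probs = np.array([0.1, 0.4, 0.9])

# Original cutoff: every strictly positive probability passes,
# so all samples are labeled 1 regardless of the model's output.
labels_bad = (class_probs > 0).astype(int)     # [1 1 1]

# Fixed cutoff: only probabilities above 0.5 are labeled 1.
labels_fixed = (class_probs > 0.5).astype(int)  # [0 0 1]
```

On a trivially separable dataset the all-ones prediction can still score well, which is why a more complex dataset is needed for the test to catch the bug.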