When you use dxgb.XGBClassifier's predict method, it always generates a prediction of 1 regardless of the predict_proba (sigmoid) output. See the minimal reproducible example below, where I generate targets of all 0. The model learns it should generally predict 0 (low probabilities), yet every prediction comes out as 1.
Note: you also cannot pass a threshold parameter into .predict(), another notable gap.
import dask_xgboost as dxgb
from dask.distributed import Client
import dask.array as da
import numpy as np

client = Client()

# Random features, targets of all 0
X = np.random.randint(1, 5, (10, 2))
y = np.zeros(10)
X = da.from_array(X)
y = da.from_array(y)

model = dxgb.XGBClassifier(n_estimators=5)  # parameter is n_estimators, not n_estimator
model.fit(X, y)
sigmoids = model.predict_proba(X).compute()
preds = model.predict(X).compute()
print(sigmoids, preds)
Output (first list is the sigmoids, second list is the predictions):
This stems from line 537 of core.py, where any generated single-dimensional class probability is evaluated as 1. It's an easy fix: accept a threshold parameter in .predict() that allows the cutoff to be set to any float, defaulting to 0.5.
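In the meantime, the same behavior can be recovered by thresholding the predict_proba output yourself. This is a minimal sketch, not dask_xgboost's API: predict_with_threshold is a hypothetical helper, and it assumes binary classification where predict_proba yields a one-dimensional array of positive-class probabilities.

```python
import numpy as np

def predict_with_threshold(probs, threshold=0.5):
    # probs: 1-D array of positive-class (sigmoid) probabilities
    # Returns 0/1 labels using the given cutoff, which the proposed
    # .predict(threshold=...) parameter would default to 0.5.
    return (np.asarray(probs) >= threshold).astype(int)

# Usage with plain numpy values standing in for the computed dask output:
probs = np.array([0.1, 0.6, 0.4, 0.9])
print(predict_with_threshold(probs))            # -> [0 1 0 1]
print(predict_with_threshold(probs, 0.95))      # -> [0 0 0 0]
```

With a dask array, the comparison could be applied lazily before calling .compute(), so the workaround stays distributed.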