pseudotensor closed this issue 6 years ago
@hcho3 it looks like the hist updater cannot handle the corner case when subsampling picks zero rows. Minimal reprex:
import xgboost as xgb
X = [[1.6, 0.2, 4.8], [5.1, 1.6, 6.0], [5.8, 2.2, 6.5]]
y = [3.4, 2.7, 3.0]
params = {'subsample': 0.2, 'max_depth': 2, 'n_estimators': 100, 'tree_method': 'hist',
          'objective': 'reg:linear', 'random_state': 123, 'silent': 0, 'debug_verbose': 2}
model = xgb.XGBRegressor(**params)
model.fit(X, y)
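To see why this reprex hits the empty-sample case so readily, here is a rough sketch. It assumes rows are picked by an independent coin flip per row with probability `subsample`, which may not match the updater's actual sampling code. The helper names `prob_no_rows_selected` and `count_empty_samples` are made up for illustration and are not part of xgboost.

```python
import random


def prob_no_rows_selected(n_rows, subsample):
    """Probability that independent per-row coin flips select no row at all."""
    return (1.0 - subsample) ** n_rows


def count_empty_samples(n_rows, subsample, n_rounds, seed):
    """Simulate n_rounds of per-row sampling and count rounds with zero rows."""
    rng = random.Random(seed)
    empty = 0
    for _ in range(n_rounds):
        # Assumption: each row is kept independently with probability `subsample`.
        selected = [i for i in range(n_rows) if rng.random() < subsample]
        if not selected:
            empty += 1
    return empty


# Same shape as the reprex: 3 rows, subsample=0.2, 100 boosting rounds.
p_empty = prob_no_rows_selected(3, 0.2)
empty_rounds = count_empty_samples(3, 0.2, 100, seed=123)
print(p_empty, empty_rounds)
```

Under that assumption, roughly half of the boosting rounds would start with no rows at all, so any updater that cannot build a tree from an empty row set would be expected to fail almost immediately on this data.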
Let me take a look at it and get back to you. Thanks!
Yes, it should be an easy fix and should behave like gpu_hist.
@pseudotensor I've submitted a small PR to handle the edge case. See #2817.
Looks good. That's not how it is handled in the GPU algorithms, but hopefully it is sufficient.
I believe the GPU updater uses a different method to keep track of instance sets (i.e. which row belongs to which node).
I just zeroed the gradients for 'not selected' rows to achieve subsampling in the current gpu algos. I will probably do it differently for the next version.
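A minimal sketch of the zeroing approach described above, in plain Python rather than the actual GPU code. The function name `zero_unselected_gradients` is hypothetical, and zeroing the hessian along with the gradient is an assumption on my part; the comment above only mentions gradients.

```python
import random


def zero_unselected_gradients(grad, hess, subsample, seed):
    """Emulate row subsampling by zeroing the gradient pair of unselected rows.

    Every row stays in the data, so the row-to-node bookkeeping is unchanged;
    unselected rows simply contribute nothing to the gradient sums.
    """
    rng = random.Random(seed)
    out_grad, out_hess = [], []
    for g, h in zip(grad, hess):
        # Assumption: independent coin flip per row with probability `subsample`.
        keep = rng.random() < subsample
        out_grad.append(g if keep else 0.0)
        out_hess.append(h if keep else 0.0)
    return out_grad, out_hess


grad = [0.5, -1.2, 0.3]
hess = [1.0, 1.0, 1.0]
sampled_grad, sampled_hess = zero_unselected_gradients(grad, hess, 0.2, seed=123)
print(sampled_grad, sampled_hess)
```

With this scheme, a round in which no row is selected just yields all-zero gradient sums instead of an empty row set, which may be why the GPU path does not hit the same corner case.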
Any ideas? @khotilov @hcho3 ?
This is using current head of master.
If I set the sampling parameters to 1.0, there is no failure.
If I choose gpu_hist instead, there is also no failure.
Otherwise, this gives the failure below. So the sampling logic in hist is flawed somehow.