dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Obscure error while using refresh updater #5480

Open meakbiyik opened 4 years ago

meakbiyik commented 4 years ago

Hi, I have an XGBoost model that I need to fine-tune slightly for the new datasets it encounters, so I want to use the refresh updater (in Python, via an XGBClassifier object). However, I encounter an error that I do not understand:

XGBoostError                              Traceback (most recent call last)
<ipython-input-11-71e9384d395c> in <module>()
     35 experimental_explainer = shap.TreeExplainer(model_experimental)
     36 
---> 37 display(plot_contention(experiment_pred_data_con, experimental_explainer))
     38 running = False

<ipython-input-2-8e41c471912e> in plot_contention(df_to_plot, explainer)
    231     # Normalize the data and drop some columns to calculate the shap values
    232     tmp_df_shap = normalize_data(df_to_plot).drop(['timestamp', 'sid', 'Contention'], axis=1)
--> 233     shap_values = explainer.shap_values(tmp_df_shap)
    234 
    235     # Dropdown menus for the plot to select the features and the System ID

~/.conda/envs/smfpy_dev/lib/python3.7/site-packages/shap/explainers/tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity)
    213                     phi = self.model.original_model.predict(
    214                         X, ntree_limit=tree_limit, pred_contribs=True,
--> 215                         approx_contribs=approximate, validate_features=False
    216                     )
    217                 except ValueError as e:

~/.conda/envs/smfpy_dev/lib/python3.7/site-packages/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features)
   1293                                           ctypes.c_uint(ntree_limit),
   1294                                           ctypes.byref(length),
-> 1295                                           ctypes.byref(preds)))
   1296         preds = ctypes2numpy(preds, length.value, np.float32)
   1297         if pred_leaf:

~/.conda/envs/smfpy_dev/lib/python3.7/site-packages/xgboost/core.py in _check_call(ret)
    176     """
    177     if ret != 0:
--> 178         raise XGBoostError(_LIB.XGBGetLastError())
    179 
    180 

XGBoostError: b'\xd508:53:28\xe5 src/tree/tree_model.cc:300: Check failed: unique_path\xd5i\xe5.pweight == 0 (NaNQ(1) vs. 0) Unique path 4 must have zero weight\xf5\xf3z\xf2\xf8\xbd@\xa2\x99\x83a\xa3'

Here's the code:

from xgboost import XGBClassifier
# `model` is the previously fitted classifier whose booster is passed in below.
model_experimental = XGBClassifier(colsample_bytree=0.3, gamma=0.0, max_depth=8, min_child_weight=7)
# Switch to update mode so the refresh updater runs on the existing trees.
model_experimental.set_params(process_type='update', updater='refresh', refresh_leaf=True)
model_experimental.fit(experiment_x_pred_data_norm, contention_data['Contention'].values.ravel(), xgb_model=model.get_booster())

Several layers may be contributing to the problem, but to be honest I cannot start unraveling them until I understand what the error means. The first layer is that I am working on z/OS (IBM mainframe), so the encoding is EBCDIC (I imagine that is the reason for the garbled bytes in the error message itself).
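As a side note on the garbled bytes, here is a minimal sketch of how an EBCDIC code page mismatch can mangle the square brackets in a log line while leaving letters and digits readable. The two code pages used, cp037 and cp500, are picked only because Python ships codecs for them; the code page actually in use on the z/OS system is not stated in this thread, and the sample message is an abbreviated stand-in, not the original byte string.

```python
# Illustrative only: cp037 and cp500 are assumptions, chosen because both are
# EBCDIC codecs bundled with Python. They agree on letters and digits but
# place a few punctuation characters (including '[' and ']') differently.
message = "[08:53:28] Check failed: unique_path[i].pweight == 0"

# Encode with one EBCDIC code page...
ebcdic_bytes = message.encode("cp037")

# ...then decode with the same page (correct) and with a different one (mismatch).
roundtrip = ebcdic_bytes.decode("cp037")
garbled = ebcdic_bytes.decode("cp500")

print(roundtrip)
print(garbled)
```

If the brackets in a message come out wrong but the words are intact, trying a few candidate code pages on the raw bytes in this way may be a quick check before suspecting anything deeper.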

xgboost version: 0.82

Sorry for the obscurity and thanks a lot in advance!

RAMitchell commented 4 years ago

Please provide a fully reproducible example using the latest xgboost version. Some things may have changed.

meakbiyik commented 4 years ago

@RAMitchell Sorry about that. I tried to reproduce the error with the same configuration (but a different dataset) on Windows, but could not. I therefore imagine it is either a problem with the dataset or with the xgboost build on the system. However, I am unable to use the latest xgboost version on z/OS because I do not manage the package, and compiling xgboost for the mainframe is a real hassle.

Can you explain why that error is raised under normal circumstances, so that I can start debugging it or forward it to people who can? I can post updates here afterwards.

NEW INFO: Okay, I tried the same thing on z/OS with that other dataset (basically iris) and it worked. So I imagine it is a dataset issue and has nothing to do with the platform itself. The main problem persists, though: I have no idea what the error means, so I am unable to debug it (unfortunately, I cannot provide you with the dataset either).

trivialfis commented 4 years ago

@meakbiyik The error is about the path weight, which the check expects to be zero. Could you plot the tree? If you can upgrade XGBoost to 1.0.2, saving the model to JSON would also be very helpful. I can try to reproduce a similar model.

meakbiyik commented 4 years ago

@trivialfis I plotted the tree with ordinal number 0 as below:

tree

And here is the json dump of the model (there was a dump_model() function in 0.82 so I used it):

tree.zip

I hope these may help you resolve the issue.

meakbiyik commented 4 years ago

I realized that I did something slightly illogical: I sent you only the base model, not the failed output (basically the state of the model just before the error is raised). The tree for that makes more sense:

image

and the json is here:

tree_with_error.zip
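
Since the message reports NaN where the check expects 0, one thing that may be worth scanning the JSON dump for is nodes whose `cover` is zero or whose `leaf` value is not finite. Whether such nodes are the cause here is an assumption and is not confirmed anywhere in this thread. Below is a minimal sketch of such a scan; `find_suspect_nodes` is a hypothetical helper, and the toy tree is hand-written in the shape of the JSON text dump with statistics, not taken from the attached model.

```python
import json
import math


def find_suspect_nodes(tree_json):
    """Return (nodeid, reason) pairs for nodes with a zero or non-finite
    cover, or a non-finite leaf value, in one JSON-dumped tree."""
    suspects = []
    stack = [json.loads(tree_json)]  # json.loads accepts the NaN literal
    while stack:
        node = stack.pop()
        cover = node.get("cover")
        if cover is not None and (not math.isfinite(cover) or cover == 0):
            suspects.append((node["nodeid"], f"cover={cover}"))
        leaf = node.get("leaf")
        if leaf is not None and not math.isfinite(leaf):
            suspects.append((node["nodeid"], f"leaf={leaf}"))
        # Leaves have no "children" key, so default to an empty list.
        stack.extend(node.get("children", []))
    return suspects


# Hand-written toy tree (hypothetical values), not the model from this issue.
toy_tree = """
{"nodeid": 0, "depth": 0, "split": "f0", "split_condition": 0.5,
 "yes": 1, "no": 2, "missing": 1, "gain": 1.2, "cover": 10.0,
 "children": [
   {"nodeid": 1, "leaf": 0.1, "cover": 10.0},
   {"nodeid": 2, "leaf": NaN, "cover": 0.0}
 ]}
"""

suspects = find_suspect_nodes(toy_tree)
for nodeid, reason in suspects:
    print(f"node {nodeid}: {reason}")
```

With a real booster, `booster.get_dump(dump_format="json", with_stats=True)` returns one such JSON string per tree, so the helper could be applied to each element of that list in turn.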