gameofdimension / xgboost_explainer

A tool for analyzing the feature importance of an XGBoost model. The idea comes from the R package xgboostExplainer.

Unexpected gain on `check_params` #1

Open jjbrophy47 opened 6 years ago

jjbrophy47 commented 6 years ago

Hello, I like this package a lot and I've been trying to use it with my own models lately, but I keep hitting this assertion: `assert abs(expect_gain - parent['gain']) < 1.e-2`.

I have been training an XGBoost model with 1000 trees, and this assertion fails on about a dozen of those trees on average. I was wondering whether it's something I am doing, or whether the model that XGBoost dumps in text form is not always correct.

Thanks for any input!

gameofdimension commented 6 years ago

This function checks that you passed the params (lambda and eta) exactly as they were set when you trained the XGBoost model. If you are sure they match, you can safely comment out this line. As for the specific error, I can't say anything useful without more details.
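For context on what such a check compares, here is a minimal sketch, not the package's actual code. It assumes the usual XGBoost conventions for a text dump: `cover` is the Hessian sum H, a leaf value equals `-eta * G / (H + lambda)`, the dumped `gain` is the structure-score difference without the 1/2 factor or the gamma penalty, and L1 regularization (`alpha`) is zero. The function names and the relative tolerance are illustrative assumptions.

```python
import math


def leaf_grad_sum(leaf_value, cover, eta, lmbda):
    """Recover the gradient sum G of a leaf from its dumped value.

    Assumes leaf_value = -eta * G / (H + lambda), with H = cover.
    """
    return -leaf_value * (cover + lmbda) / eta


def expected_gain(g_left, h_left, g_right, h_right, lmbda):
    """Gain of a split, assuming gain = score(L) + score(R) - score(parent)."""

    def score(g, h):
        return g * g / (h + lmbda)

    # Gradient and Hessian sums of the parent are the sums over its children.
    g_parent = g_left + g_right
    h_parent = h_left + h_right
    return score(g_left, h_left) + score(g_right, h_right) - score(g_parent, h_parent)


def gains_match(expected, dumped, rel_tol=1e-3, abs_tol=1e-2):
    """Compare gains with a relative tolerance in addition to an absolute one."""
    return math.isclose(expected, dumped, rel_tol=rel_tol, abs_tol=abs_tol)
```

If the params are right but a few trees still fail, one possibility is that a fixed `1.e-2` absolute threshold is too tight for splits with very large gains, since the text dump stores rounded values. Swapping in a relative tolerance, as in `gains_match`, is one way to test that hypothesis.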

hanooka commented 6 years ago

Here are 2 errors I get for no apparent reason (on a private data set that is solid; I can train an XGBoost model on it):

line 56, in model2table
    k,v = p.split('=')
ValueError: not enough values to unpack (expected 2, got 1)

Process finished with exit code 1

line 74, in model2table
    node_lst[node_idx] = d
IndexError: list assignment index out of range

Process finished with exit code 1

2g-XzenG commented 4 years ago

Hi @hanooka, I encountered similar errors, which were caused by white space within my feature names, like "feature 1". Changing the feature name to "feature_1" fixed the error. Hope it helps, thanks.
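The renaming can be done in one pass before training. This is a minimal sketch: `sanitize_feature_names` is a hypothetical helper, not part of this package or of XGBoost, and it assumes that replacing each run of whitespace with a single underscore is acceptable for your feature names.

```python
import re


def sanitize_feature_names(names):
    """Replace every run of whitespace in each feature name with one underscore."""
    return [re.sub(r"\s+", "_", name.strip()) for name in names]


clean = sanitize_feature_names(["feature 1", "age\tyears", "income"])
print(clean)
```

Two names that differ only in whitespace would collide after this step, so it is worth checking that the cleaned list still contains unique names before assigning it to your training data.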

charlie9526 commented 4 years ago

Hi @all,

1) I am trying this explainer on an XGB regressor. The problem is that the sum of the contributions is not the same as the prediction of the XGB regressor model; there is a gap of 0.5 between the two values.

2) The Medium article on the R package behind this library explains that the contributions are log odds. But in regression the contribution is a continuous value, and the contributions sum to the prediction minus 0.5. The graph in the Medium article also shows the 0.5 line, but the calculations there do not mention that 0.5.
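One likely explanation, offered as an assumption rather than something confirmed in this thread: XGBoost adds a global `base_score` to the sum of the tree outputs, and in the XGBoost versions current at the time its default was 0.5. For a squared-error regressor the offset is added directly to the prediction. For a logistic objective it is applied on the log-odds scale, where a `base_score` of 0.5 becomes 0 and so does not show up in the sum. The sketch below uses a hypothetical helper, `prediction_from_contributions`, to show the arithmetic.

```python
import math


def prediction_from_contributions(contributions, base_score=0.5, logistic=False):
    """Rebuild a prediction from summed tree contributions plus base_score.

    Assumes the contributions are raw tree outputs (margins) and that the
    model was trained with the given base_score.
    """
    margin = sum(contributions)
    if not logistic:
        # Regression: base_score is added on the prediction scale.
        return base_score + margin
    # Logistic: base_score enters as log-odds, then the sigmoid is applied.
    offset = math.log(base_score / (1.0 - base_score))
    return 1.0 / (1.0 + math.exp(-(offset + margin)))


contribs = [0.2, -0.1, 0.4]
regression_pred = prediction_from_contributions(contribs)
```

Under that assumption, comparing the model's prediction with `sum(contribs) + base_score` rather than with `sum(contribs)` should close the gap. It is worth checking the `base_score` your model was actually trained with.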