Closed NikEyX closed 5 years ago
I should note that in Python I can use model.classes_
to get the class values. I guess my question boils down to: how can I do this in your library (which seems awesome, btw)?
@NikEyX, thanks for your interest in leaves,
and sorry for the late response.
Unfortunately, you can't obtain this information from xgboost's binary model file, because it simply isn't stored there. Let me explain in detail:
XGBClassifier.fit
in Python performs label encoding on y:
for example, labels 1242, 1152, 1552, 1242
are mapped to 1, 0, 2, 1
by sklearn's LabelEncoder (which assigns indices in sorted label order).
Only encoded labels like 0, 1, 2, ...
reach the xgboost core library, so the resulting model can operate only on those labels.

The documentation for XGBClassifier.save_model / load_model also points this out:
The model is saved in an XGBoost internal binary format which is
universal among the various XGBoost interfaces. Auxiliary attributes of
the Python Booster object (such as feature names) will not be loaded.
Label encodings (text labels to numeric labels) will be also lost.
If you are using only the Python interface, we recommend pickling the
model object for best results.
So, the Python xgboost bindings will also lose the original class labels after a save_model -> load_model round trip.
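To make the encoding concrete, here is a minimal numpy sketch of the mapping that XGBClassifier.fit performs internally. The real implementation uses sklearn's LabelEncoder, but np.unique produces the same sorted-order mapping; the label values are the ones from the example above.

```python
import numpy as np

# Original, non-consecutive class labels as they appear in the training data.
y = np.array([1242, 1152, 1552, 1242])

# np.unique mirrors sklearn's LabelEncoder: it sorts the distinct labels
# and returns each sample's index into that sorted array.
classes, y_encoded = np.unique(y, return_inverse=True)

print(classes)    # [1152 1242 1552]
print(y_encoded)  # [1 0 2 1]

# The inverse mapping recovers the original labels from the encoded ones.
# This is exactly the information that is lost when only the xgboost
# binary model file is saved.
print(classes[y_encoded])  # [1242 1152 1552 1242]
```

Only the encoded 0..K-1 labels ever reach the xgboost core, so `classes` has to be persisted separately (e.g. by pickling the Python model object) if you want to recover the original values.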
btw, util.SigmoidFloat64SliceInplace
is not what you want to use in the case of multiclass classification. There you would apply a softmax transformation to the raw tree values in order to obtain class probabilities. The sum of all class probabilities should be 1.0 (this is a property of the softmax function).
Currently I'm developing an update for leaves
that will make it possible to apply a transformation to the tree results (sigmoid for binary classification, softmax for multiclass classification, lambda rank for ranking problems, and so on). Stay tuned!
good to know, thanks for the updates! Love your work, keep it up!
Hi,
I'm not sure I fully understand the output of the Predict() methods.
I have a fully trained model with 9 classes and 100 estimators. I then run:
That yields:
I understand those are the probabilities for EACH of the 9 classes being the right one. However, how am I able to get the actual value of the class? In Python, if I do
y_pred = model.predict(values)
, it will correctly show me the expected class values. E.g. my class values look like this: 1242, 1152, 1552, 6662, etc. How can I map the prediction output above to the class values? I haven't provided any specific ordering to the model.
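Since the xgboost binary model file (and therefore leaves) keeps no label mapping, one workable approach is to export model.classes_ on the Python side and apply it to the probability vector yourself. A minimal numpy sketch, where both the class values and the probability row are hypothetical 9-class examples:

```python
import numpy as np

# classes_ as exported from the trained Python model (model.classes_);
# hypothetical values here -- yours would come from your own training data.
classes = np.array([1152, 1242, 1552, 2042, 3111, 4242, 5252, 6662, 7777])

# One row of per-class probabilities, e.g. one sample's output from Predict().
probs = np.array([0.01, 0.02, 0.05, 0.10, 0.03, 0.04, 0.05, 0.60, 0.10])

# The predicted label is the class sitting at the position of the
# highest probability, since columns follow the encoded 0..K-1 order.
predicted = classes[np.argmax(probs)]
print(predicted)  # 6662
```

The column order matches sklearn's LabelEncoder output, i.e. the distinct labels in sorted order, which is why exporting model.classes_ once is enough to decode every prediction.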