SauceCat / PDPbox

Python partial dependence plot toolbox
http://pdpbox.readthedocs.io/en/latest/
MIT License
840 stars, 129 forks

logic error with one-hot encoding feature #45

Closed: gowestyang closed this issue 3 years ago

gowestyang commented 5 years ago

In the example provided, 'embarked' has three labels, 'C', 'S', and 'Q', and all three labels are presented to the model as features. In practice, however, encoding every label as a feature when training the model causes a loss of rank: the one-hot columns always sum to 1, so together they are perfectly collinear with an intercept term. To keep full rank, the base label should not be used as a feature. In this case, "Embarked_C" would not be a feature used to train the model, so the PDP should still display correct partial dependence values for feature=['Embarked_S', 'Embarked_Q'], with the base label 'C' corresponding to both of those columns being 0.
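A minimal sketch of the point above, written in plain NumPy rather than with PDPbox. The toy data, the linear `predict` function and the helper `partial_dependence_dummies` are all hypothetical, for illustration only, and assume a model with an intercept term. With an intercept, keeping all three one-hot columns leaves the design matrix rank-deficient, while dropping the base level 'C' restores full rank. The partial dependence for 'C' can then still be computed by setting both remaining dummy columns to 0.

```python
import numpy as np

# Made-up toy data: a categorical 'Embarked' column plus one numeric feature.
embarked = np.array(["C", "S", "Q", "S", "C", "Q", "S", "S"])
other = np.linspace(0.0, 1.0, len(embarked))
levels = ["C", "S", "Q"]

# Full one-hot encoding: one 0/1 column per level.
full = np.column_stack([(embarked == lv).astype(float) for lv in levels])

# With an intercept column the design matrix is rank-deficient, because
# the three one-hot columns always sum to the intercept column.
ones = np.ones(len(embarked))
with_intercept = np.column_stack([ones, full])
print(np.linalg.matrix_rank(with_intercept))  # 3, although there are 4 columns

# Dummy (reference) coding: drop the base level 'C' to keep full rank.
dummies = full[:, 1:]  # columns Embarked_S, Embarked_Q
reduced = np.column_stack([ones, dummies])
print(np.linalg.matrix_rank(reduced))  # 3, one per column

# Model inputs: [Embarked_S, Embarked_Q, other]; there is no Embarked_C column.
X = np.column_stack([dummies, other])


def predict(X):
    """Hypothetical fitted model: a fixed linear function of the inputs."""
    return 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2]


def partial_dependence_dummies(predict, X, dummy_cols, dummy_labels, base_label):
    """Hypothetical helper (not part of PDPbox): mean prediction per level.

    The base level has no column of its own, so it is represented by
    setting every dummy column to 0; each other level sets only its own
    column to 1.
    """
    pdp = {}
    X_mod = X.copy()
    X_mod[:, dummy_cols] = 0.0
    pdp[base_label] = float(np.mean(predict(X_mod)))
    for col, label in zip(dummy_cols, dummy_labels):
        X_mod = X.copy()
        X_mod[:, dummy_cols] = 0.0
        X_mod[:, col] = 1.0
        pdp[label] = float(np.mean(predict(X_mod)))
    return pdp


pdp = partial_dependence_dummies(predict, X, [0, 1], ["S", "Q"], "C")
print(pdp)
```

Because every level, including the dropped base, maps to a concrete setting of the remaining columns, a PDP over ['Embarked_S', 'Embarked_Q'] can still report a value for all three labels.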