Open lopuhin opened 7 years ago
It may be even more subtle. For random forests, individual trees are fit on a subset of features, and we should take into account only the features that are present in these subsets, even if they have zero feature importance.
On the other hand, at prediction time it doesn't matter whether a feature had zero importance or not, or whether it was in a subset or not, so if we're looking at the standard deviation from a "prediction" point of view, it could make sense to keep it as-is.
Zero feature importances should probably not be considered when calculating the std of random forest feature importances.
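To make the difference concrete, here is a minimal sketch comparing the two ways of computing the std. The helper `importance_std` and the `per_tree` matrix are hypothetical and exist only for illustration. With scikit-learn, the matrix could presumably be built from `[tree.feature_importances_ for tree in forest.estimators_]`. Note that a zero importance is only a proxy for "the feature was not available to this tree": a feature can be in the subset and still never be used in a split.

```python
import numpy as np


def importance_std(per_tree, ignore_zeros=False):
    """Std of each feature's importance across trees.

    per_tree: array-like of shape (n_trees, n_features).
    ignore_zeros: if True, trees where a feature has exactly zero
        importance are excluded from that feature's std.
    """
    per_tree = np.asarray(per_tree, dtype=float)
    if not ignore_zeros:
        # The "prediction" point of view: every tree counts.
        return per_tree.std(axis=0)

    stds = np.zeros(per_tree.shape[1])
    for j in range(per_tree.shape[1]):
        column = per_tree[:, j]
        nonzero = column[column != 0]
        # A feature that is zero in every tree keeps std = 0.
        if nonzero.size:
            stds[j] = nonzero.std()
    return stds


# Hypothetical importances for 4 trees and 2 features.
# Feature 1 is unused (zero importance) in half of the trees.
per_tree = np.array([
    [1.0, 0.0],
    [0.5, 0.5],
    [1.0, 0.0],
    [0.5, 0.5],
])

std_all = importance_std(per_tree)
std_nonzero = importance_std(per_tree, ignore_zeros=True)
print(std_all, std_nonzero)
```

In this toy data the second feature has a std of 0.25 when zeros are included and 0 when they are excluded, because it has the same importance in every tree that actually uses it.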