bethatkinson / rpart

Recursive Partitioning and Regression Trees

why there is “surrogate” when no missing value? #38

Open A-Pai opened 2 years ago

A-Pai commented 2 years ago

library(mice)
library(rpart)

md.pattern(iris)  # Missing value survey
mytree <- rpart(Species ~ ., data = iris)
summary(mytree)
rattle::fancyRpartPlot(mytree)

Why does the summary show “surrogate” splits when the data have no missing values?

bgreenwell commented 2 years ago

Surrogate splits are computed by default, even when your training data are complete (you can turn them off via rpart.control if needed, e.g. by setting maxsurrogate = 0). They are there so that you can still obtain predictions on new data that may contain missing values.
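A minimal sketch of both points, assuming only the rpart package and the built-in iris data. The object names (fit_default, fit_nosurr, new_obs) are made up for illustration. It compares the surrogate counts stored in the fitted tree's frame with and without surrogates, then predicts on a row where the primary split variable has been set to NA by hand.

```r
library(rpart)

# Default fit: rpart.control() uses maxsurrogate = 5, so surrogate
# splits are searched for even though iris has no missing values.
fit_default <- rpart(Species ~ ., data = iris)

# Same model with surrogate computation switched off.
fit_nosurr <- rpart(Species ~ ., data = iris,
                    control = rpart.control(maxsurrogate = 0))

# The frame component records how many surrogates each node kept.
n_surr_default <- sum(fit_default$frame$nsurrogate)
n_surr_off     <- sum(fit_nosurr$frame$nsurrogate)
print(c(default = n_surr_default, off = n_surr_off))

# A hypothetical new observation with the primary split variable missing.
new_obs <- iris[1, ]
new_obs$Petal.Length <- NA

# With surrogates available, the row is routed using another variable.
pred <- predict(fit_default, new_obs, type = "class")
print(pred)
```

If you never expect missing values at prediction time, dropping the surrogates mainly shortens the summary() output and saves a little fitting time; the primary splits themselves should be unchanged.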