bethatkinson / rpart

Recursive Partitioning and Regression Trees
43 stars 23 forks source link

what is the Meaning of Surrogate Split? #34

Open A-Pai opened 2 years ago

A-Pai commented 2 years ago

tmp_df <- data.frame( Y = as.factor(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)), weight = 10:1, height = c(10:7, 5, 6, 4:1) )

tmp_df$weight[3] <- NA tm <- rpart(Y ~ weight + height, data = tmp_df, control = rpart.control( minsplit = 1, minbucket = 1, cp = 0, maxdepth = 1 ) ) summary(tm)

image

1、what is the meaning of “Primary splits” and “Surrogate splits”? 2、Where does the “Surrogate splits height < 4.5” come from? 3、what is the meaning of “to the left”? 4、what is the meaning of “agree” and “adj”?

bgreenwell commented 2 years ago

@A-Pai most of what you’re asking is explained in the associated vignette, which you can find on the CRAN landing page here.

A-Pai commented 2 years ago

@A-Pai most of what you’re asking is explained in the associated vignette, which you can find on the CRAN landing page here.

Thank you for your guidance,but is it there? I can not find it,Or please give me a screenshot

bgreenwell commented 2 years ago

@A-Pai Agreement and adjusted agreement are used in the (default) computation of variable importance in rpart. Take a look at section 3.4.

bethatkinson commented 2 years ago

primary splits look at each of the listed variables and describes how well they do at improving the node purity. surrogate splits are for when there are missing values. The "agree" measure looks at how well the surrogate splits would give the same split as the first primary split that is listed, then uses that surrogate instead of the primary variable for the missing values. So in this example, height < 3.5 is used instead of weight < 5.5 for the 1 missing value. Look at a table with (height < 3.5) vs (weight < 5.5).


From: lg21c @.> Sent: Friday, October 15, 2021 9:59 PM To: bethatkinson/rpart @.> Cc: Subscribed @.***> Subject: [EXTERNAL] [bethatkinson/rpart] what is the Meaning of Surrogate Split? (#34)

tmp_df <- data.frame( Y = as.factor(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)), weight = 10:1, height = c(10:7, 5, 6, 4:1) )

tmp_df$weight[3] <- NA tm <- rpart(Y ~ weight + height, data = tmp_df, control = rpart.control( minsplit = 1, minbucket = 1, cp = 0, maxdepth = 1 ) ) summary(tm)

[image]https://user-images.githubusercontent.com/5112397/137571069-05345d10-eb24-447a-9c16-6dd0c21c20ec.png

1、what is the meaning of “Primary splits” and “Surrogate splits”? 2、

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/bethatkinson/rpart/issues/34, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACWQG5YFOKBTMMFEOMED3EDUHDTCNANCNFSM5GDEFMOQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

A-Pai commented 2 years ago

primary splits look at each of the listed variables and describes how well they do at improving the node purity. surrogate splits are for when there are missing values. The "agree" measure looks at how well the surrogate splits would give the same split as the first primary split that is listed, then uses that surrogate instead of the primary variable for the missing values. So in this example, height < 3.5 is used instead of weight < 5.5 for the 1 missing value. Look at a table with (height < 3.5) vs (weight < 5.5). ____ From: lg21c @.> Sent: Friday, October 15, 2021 9:59 PM To: bethatkinson/rpart @.> Cc: Subscribed @.***> Subject: [EXTERNAL] [bethatkinson/rpart] what is the Meaning of Surrogate Split? (#34) tmp_df <- data.frame( Y = as.factor(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)), weight = 10:1, height = c(10:7, 5, 6, 4:1) ) tmp_df$weight[3] <- NA tm <- rpart(Y ~ weight + height, data = tmp_df, control = rpart.control( minsplit = 1, minbucket = 1, cp = 0, maxdepth = 1 ) ) summary(tm) [image]https://user-images.githubusercontent.com/5112397/137571069-05345d10-eb24-447a-9c16-6dd0c21c20ec.png 1、what is the meaning of “Primary splits” and “Surrogate splits”? 2、 — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub<#34>, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACWQG5YFOKBTMMFEOMED3EDUHDTCNANCNFSM5GDEFMOQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

----what is the meaning of “to the left”?

A-Pai commented 2 years ago

the tree growed by the code: image