bethatkinson / rpart

Recursive Partitioning and Regression Trees

Does rpart() alone prune the tree? #35

Closed MathieuMarauri closed 2 years ago

MathieuMarauri commented 2 years ago

Hello,

I have a question about pruning in rpart(), and more specifically about the cp parameter. In the CART method, a maximal tree is built first and then pruned along a sequence of nested subtrees, with the pruning governed by a complexity parameter.

In rpart(), setting cp = 0 builds the maximal tree, and no pruning seems to be done. The cptable shows the cp values and the associated xerror. One can use these values to find the optimal cp (the smallest tree whose xerror is below the minimum xerror + 1 SE) and then use the prune() function to perform the pruning step.
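The procedure described above can be sketched as follows. This is a minimal illustration rather than an official recipe: it assumes the kyphosis data set shipped with rpart, and the variable names (fit_full, cp_table, cp_chosen, fit_pruned) are made up for the example.

```r
library(rpart)

set.seed(42)  # cross-validation folds are random, so fix the seed

# Grow a large tree: cp = 0 switches off the complexity-based stopping rule
# (minsplit, minbucket and maxdepth still apply).
fit_full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class",
                  control = rpart.control(cp = 0, xval = 10))

# Matrix with columns CP, nsplit, rel error, xerror, xstd
cp_table <- fit_full$cptable

# 1-SE rule: threshold = minimum xerror plus its standard error
best <- which.min(cp_table[, "xerror"])
threshold <- cp_table[best, "xerror"] + cp_table[best, "xstd"]

# Rows run from the smallest tree to the largest, so the first row
# at or below the threshold is the smallest qualifying tree.
chosen <- which(cp_table[, "xerror"] <= threshold)[1]
cp_chosen <- cp_table[chosen, "CP"]

# Explicit pruning step
fit_pruned <- prune(fit_full, cp = cp_chosen)
```

Because the folds are drawn at random, the selected cp can change from one seed to another, so it may be worth repeating the selection a few times before settling on a value.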

My question is then: does the rpart() function alone prune the tree, or should we use prune() with a proper cp value on a tree built with cp = 0? My understanding is that the cp parameter is used as a stopping criterion in rpart() and that no pruning is done unless one explicitly calls the prune() function. Am I missing something?

All the best, Mathieu

bethatkinson commented 2 years ago

By default, the trees are built to a cp level of 0.01, which generally is still over-fit. I would still look at the cptable and use prune() to trim the tree further.
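A quick way to see the default in action is to compare a default fit with one grown at cp = 0. This is only a sketch on the kyphosis data set shipped with rpart; n_leaves is a hypothetical helper written for this example, not part of the package.

```r
library(rpart)

# Default complexity parameter used when none is supplied
default_cp <- rpart.control()$cp

# Hypothetical helper: count terminal nodes in a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Same data and settings, only cp differs
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class")
fit_full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", control = rpart.control(cp = 0))

print(default_cp)
print(c(default = n_leaves(fit_default), cp_zero = n_leaves(fit_full)))

# The cptable of either fit is the starting point for choosing a cp to prune at
printcp(fit_default)
```

On a small data set the two trees can turn out identical, since minsplit and minbucket also limit growth; the gap is usually more visible on larger data.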



MathieuMarauri commented 2 years ago

Thank you for the quick reply.

So is it correct to say that cp is only used as a stopping criterion in rpart()? One should then either select a better value by cross-validation or use prune() to build a better tree?

Thank you for your help :)

bethatkinson commented 2 years ago

I'd look at the cross-validated error to inform your selection of CP, then prune the tree accordingly.



MathieuMarauri commented 2 years ago

Thank you very much.