Closed. MathieuMarauri closed this issue 2 years ago.
By default, the trees are grown to a complexity parameter of cp = 0.01, which generally still over-fits. I would still look at the cptable and use prune() to trim the tree further.
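A minimal sketch of that workflow, for illustration only: it assumes the kyphosis data set that ships with rpart, and the names fit, best_cp and pruned are hypothetical. Picking the row with the lowest cross-validated error is just one possible selection rule, not necessarily the one the maintainer has in mind.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random, so fix the seed

# Grow a tree with the default control settings (cp = 0.01).
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# One row per candidate subtree: CP, nsplit, rel error, xerror, xstd.
printcp(fit)

# Pick the cp value of the row with the lowest cross-validated error
# (an illustrative choice) and trim the tree to it.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```

The pruned tree can never be larger than fit, because prune() only collapses existing splits.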
From: maRmat. Sent: Sunday, November 7, 2021 1:13 PM. To: bethatkinson/rpart. Subject: [bethatkinson/rpart] Does rpart() alone prune the tree? (Issue #35)
Hello,
I have a question regarding the pruning in rpart(), and more specifically the cp parameter. When building a tree as described in the CART method, a maximal tree is built first and is then pruned using a sequence of subtrees. The pruning is done using a complexity parameter.
In rpart(), when setting cp = 0 the maximal tree is built and no pruning seems to be done. The cptable shows the values of cp and the associated xerror. One can use these values to find the optimal cp (the smallest tree whose xerror is below the minimum xerror plus one standard error) and then use the prune() function to perform the pruning step.
My question is then: does the rpart() function alone prune the tree, or should we call prune() with a proper cp value on a tree built with cp = 0? My understanding is that the cp parameter is used as a stopping criterion in rpart() and that no pruning is done unless one explicitly calls prune(). Am I missing something?
All the best, Mathieu
Thank you for the quick reply.
So is it correct to say that cp is only used as a stopping criterion in rpart()? And should one either select a better value by cross-validation or use prune() to build a better tree?
Thank you for your help :)
I'd look at the cross-validated error to inform your selection of CP, then prune the tree accordingly.
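The selection rule described in the question (smallest tree whose xerror is within one standard error of the minimum) could be sketched as follows. This is an illustration under assumptions, not the maintainer's code: it uses the kyphosis data set bundled with rpart, and the names full, tab, threshold, cp_1se and pruned are hypothetical.

```r
library(rpart)

set.seed(42)  # cross-validation folds are random, so fix the seed

# With cp = 0 the tree grows until the other stopping rules
# (minsplit, minbucket, maxdepth) prevent further splits.
full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              control = rpart.control(cp = 0))

tab <- full$cptable

# 1-SE rule: threshold = smallest xerror plus its standard error.
i_min     <- which.min(tab[, "xerror"])
threshold <- tab[i_min, "xerror"] + tab[i_min, "xstd"]

# Rows are ordered from the smallest to the largest tree, so the first
# row at or under the threshold is the smallest qualifying tree.
i_1se  <- which(tab[, "xerror"] <= threshold)[1]
cp_1se <- tab[i_1se, "CP"]

pruned <- prune(full, cp = cp_1se)
```

Because the xerror column comes from random cross-validation folds, the selected cp can change from run to run unless the seed is fixed.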
Thank you very much.