bethatkinson / rpart

Recursive Partitioning and Regression Trees

Are `weights` used during cross-validation? #40

Open vgherard opened 2 years ago

vgherard commented 2 years ago

Hello there! Thanks for this very nice package.

I know similar questions have been asked multiple times, but I was not able to get a 100% clear answer from what I read online, so please bear with me.

I am unsure whether the case weights (the `weights` argument of `rpart()`) that are applied when growing the tree are also applied in the cross-validation step performed by the same function. From glancing through the code, I'm fairly confident the answer is yes, but I would kindly ask for your confirmation.

On a related note: is the loss matrix (`parms = list(loss = )`) used in any way for growing the tree? My understanding is that it is not: it is only used for computing the threshold probability of positives and the validation error. Again, I would love a confirmation.

Thanks in advance, and sorry if this sounds extremely redundant to you.

Valerio

bethatkinson commented 2 years ago

Yes, case weights would also be applied to the cross-validation step.

Yes, the loss argument matters.

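```r
library(rpart)  # provides rpart() and the stagec data set

# Default fit, without a loss matrix
fit <- rpart(pgstat ~ age + eet + g2 + grade + gleason + ploidy,
             data = stagec, method = 'class')

# Same model, with a loss matrix supplied through parms
fit2 <- rpart(pgstat ~ age + eet + g2 + grade + gleason + ploidy,
              data = stagec, method = 'class',
              parms = list(loss = matrix(c(0, .8, .2, 0), nrow = 2)))

# Draw the two trees side by side
par(mfrow = c(1, 2))
plot(fit)
plot(fit2)
```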
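The point about case weights can be checked empirically in a similar way. The following is only a minimal sketch, not a statement about rpart's internals: the weight vector `w` is a hypothetical choice made up for illustration, and resetting the seed before each fit assumes that this is enough for both fits to draw the same random cross-validation folds.

```r
library(rpart)  # provides rpart() and the stagec data set

# Hypothetical case weights, for illustration only:
# count each progression case (pgstat == 1) twice.
w <- ifelse(stagec$pgstat == 1, 2, 1)

form <- pgstat ~ age + eet + g2 + grade + gleason + ploidy

# Reset the seed before each fit so that the random
# cross-validation folds should match between the two fits.
set.seed(1)
fit_unweighted <- rpart(form, data = stagec, method = "class")

set.seed(1)
fit_weighted <- rpart(form, data = stagec, weights = w, method = "class")

# The xerror and xstd columns of the complexity table hold the
# cross-validated error estimates for each candidate subtree.
print(fit_unweighted$cptable)
print(fit_weighted$cptable)
```

If the weights enter the cross-validation step, the `xerror` columns of the two tables would generally be expected to differ, although the weighted fit may also grow a different tree, so the comparison is indicative rather than a proof.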


From: Valerio Gherardi
Sent: Friday, January 28, 2022 4:23 AM
To: bethatkinson/rpart
Subject: [bethatkinson/rpart] Are weights used during cross-validation? (Issue #40)

vgherard commented 2 years ago

Apologies for the late response, thank you very much! Feel free to close this one.