bethatkinson / rpart

Recursive Partitioning and Regression Trees

Error in prune.rpart: subscript out of bounds #26

Closed: kurpav00 closed this issue 2 years ago

kurpav00 commented 3 years ago

Hello,

When I try to build an rpart model on my data, the `prune.rpart` function sometimes triggers a "subscript out of bounds" error. This problem has already been reported in https://github.com/bethatkinson/rpart/issues/4. Unlike the author of that issue, I can provide a reproducible example, including the data: see https://pastebin.com/d9jfCnNe. The prune function does not fail every time (you may have to try several times, which is why the code uses a for-loop), but when it does fail, it reports:

```
Error in `[<-`(`*tmp*`, max(keep), 1L, value = cp) : 
  subscript out of bounds
```

Thank you in advance for any help.

bquistorff commented 2 years ago

I was able to reproduce this bug. It happens when `cptable[, 1]` contains duplicate values that are at least as large as the pruning `cp`. In https://github.com/bethatkinson/rpart/blob/39806853bf5b1931e795897d7a8d5cb20b366ca6/R/prune.rpart.R#L9-L12, `match()` returns only the first index of each value of `unique(temp)` within `temp`. Subsetting with `cptable[keep, ...]` therefore removes not only the rows whose cp values fall below `cp` but also some internal rows, so that `max(keep) > length(keep)`. The subsequent assignment to row `max(keep)` then indexes past the end of the pruned table, which produces the error above. I'm submitting PR https://github.com/bethatkinson/rpart/pull/29, which implements the fix noted in issue #4 and fixes the instance of the bug reported here.
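The indexing problem can be reproduced without fitting a tree. The sketch below uses a hypothetical toy `cptable` (not the data from the pastebin example) and assumes that the linked lines compute `temp <- pmax(cptable[, 1], cp)` before calling `match(unique(temp), temp)`. The final line shows one hypothetical way to avoid the out-of-bounds write by indexing by position. It is not necessarily the change made in PR #29.

```r
# Hypothetical toy CP table: cp values in decreasing order, with a tie
# (0.10 appears twice) above the pruning cp.
cptable <- cbind(CP = c(0.50, 0.10, 0.10, 0.02, 0.01),
                 nsplit = c(0, 1, 2, 3, 4))
cp <- 0.05

# Assumed to mirror the linked lines: values below cp are raised to cp.
temp <- pmax(cptable[, 1L], cp)      # 0.50 0.10 0.10 0.05 0.05
keep <- match(unique(temp), temp)    # first index of each value: 1 2 4

pruned <- cptable[keep, , drop = FALSE]
nrow(pruned)   # 3
max(keep)      # 4, one past the last row of 'pruned'

# Writing to row max(keep) reproduces the reported error.
res <- tryCatch({
  pruned[max(keep), 1L] <- cp
  "no error"
}, error = function(e) conditionMessage(e))
print(res)     # "subscript out of bounds"

# Hypothetical adjustment: keep is increasing, so the last kept row is at
# position length(keep) in 'pruned'. Writing there does not fail.
pruned[length(keep), 1L] <- cp
```

Because `temp` is non-increasing, the duplicated 0.10 collapses into a single entry of `unique(temp)`. As a result, `keep` skips index 3 and its maximum no longer equals the number of kept rows.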