bethatkinson / rpart

Recursive Partitioning and Regression Trees

Use Case Weights To Threshold Splits #21

Open mhermher opened 4 years ago

If the weights passed to the model are case weights, shouldn't they be used to decide whether a split happens?

In partition.c, me->num_obs is compared to rp.min_split instead of me->sum_wt.
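To make the difference concrete, here is a minimal standalone sketch. It is not rpart's source: the struct `node_summary` and both function names are hypothetical, and only the idea of comparing a node's size against a minimum split size comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-node fields mentioned above. */
typedef struct {
    int    num_obs; /* number of rows that fall in the node */
    double sum_wt;  /* sum of the case weights of those rows */
} node_summary;

/* Count-based check: eligible to split if the node has enough rows. */
int can_split_by_count(const node_summary *nd, int min_split)
{
    assert(nd != NULL);
    return nd->num_obs >= min_split;
}

/* Weight-based check: eligible to split if the summed case weights
 * reach the same threshold (assumes weights represent case counts). */
int can_split_by_weight(const node_summary *nd, int min_split)
{
    assert(nd != NULL);
    return nd->sum_wt >= (double) min_split;
}
```

With a node of 5 aggregated rows whose weights sum to 40 and a minimum split size of 20, the count-based check rejects the node while the weight-based check accepts it.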

Similarly, in anova.c (I haven't looked at the others), right_n and left_n are compared to edge (rp.min_node) instead of right_wt and left_wt.
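The effect on candidate cut points can be sketched the same way. This is a simplified illustration under stated assumptions, not the code in anova.c: the function `count_valid_cuts` and its `use_weights` flag are hypothetical, the observations are assumed to be already sorted by the predictor, and ties in the predictor are ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the cut positions (between sorted observation i and i + 1)
 * for which both children meet the minimum size `edge`.
 * use_weights == 0: child size is the number of rows.
 * use_weights != 0: child size is the sum of case weights.
 */
int count_valid_cuts(const double *wt, size_t n, double edge, int use_weights)
{
    double total = 0.0; /* size of the whole node */
    double left = 0.0;  /* running size of the left child */
    int valid = 0;
    size_t i;

    assert(wt != NULL || n == 0);

    for (i = 0; i < n; i++)
        total += use_weights ? wt[i] : 1.0;

    /* Sweep the cut point from left to right. */
    for (i = 0; i + 1 < n; i++) {
        left += use_weights ? wt[i] : 1.0;
        if (left >= edge && (total - left) >= edge)
            valid++;
    }
    return valid;
}
```

For four rows that each carry a weight of 10 and a minimum child size of 7, the count-based variant finds no admissible cut, while the weight-based variant admits all three.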

Using case weights to represent the number of cases is really helpful for managing runtime and memory, but the split logic in the C code does not take them into account.

Even writing a custom split function would solve only the latter case (the rp.min_node check), not the former (the rp.min_split check in partition.c).