bethatkinson / rpart

Recursive Partitioning and Regression Trees

A couple of questions (issues?) on the vignette `longintro` #46

Open iago-pssjd opened 1 year ago

iago-pssjd commented 1 year ago

First, section 4.1 (page 13 of the PDF) says:

Using the first result, we can uniquely define $T_\alpha$ as the smallest tree $T$ for which $R_\alpha(T)$ is minimized.

My question is whether this should be the biggest tree rather than the smallest, since, as I understand it, the definition is about a sub tree of the full model. Am I wrong?
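For context, here is a sketch of the definitions as I read them in that section. Writing $R(T)$ for the risk of a tree and $|T|$ for its size is my assumption about the vignette's notation, so please correct me if it differs:

\begin{eqnarray*}
        R_\alpha (T) &=& R(T) + \alpha |T|     \\
        T_\alpha &=& \mbox{smallest sub tree } T \mbox{ of the full model minimizing } R_\alpha (T)
\end{eqnarray*}

If that reading is right, the choice between "smallest" and "biggest" only matters when two or more sub trees attain the same minimal value of $R_\alpha$.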

Second, immediately after that, the intervals are printed as:

\begin{eqnarray*}
        I_1 &=& [0, \alpha_1 ]     \\
        I_2 &=& ( \alpha_1 , \alpha_2 ]      \\
         \vdots \\
        I_m &=&  ( \alpha_{m-1} , \infty]
\end{eqnarray*}

However, the brackets seem to be reversed. Shouldn't they be as follows?

\begin{eqnarray*}
        I_1 &=& [0, \alpha_1 )     \\
        I_2 &=& [ \alpha_1 , \alpha_2 )      \\
         \vdots \\
        I_m &=&  [ \alpha_{m-1} , \infty)
\end{eqnarray*}
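To make the difference between the two conventions concrete, here is a minimal Python sketch that returns the index $k$ of the interval $I_k$ containing a given value. The function name `interval_index` and the breakpoint values are hypothetical, chosen only for illustration; they are not taken from rpart or from the vignette's table.

```python
from bisect import bisect_left, bisect_right

def interval_index(cp, breakpoints, closed="right"):
    """Return the 1-based index k of the interval I_k that contains cp.

    breakpoints: sorted list [alpha_1, ..., alpha_{m-1}] (illustrative values).
    closed="right": I_k = (alpha_{k-1}, alpha_k], as printed in the vignette.
    closed="left":  I_k = [alpha_{k-1}, alpha_k), as in the proposed correction.
    """
    if cp < 0:
        raise ValueError("cp must be non-negative")
    if closed == "right":
        # A value equal to alpha_k stays in I_k.
        return bisect_left(breakpoints, cp) + 1
    if closed == "left":
        # A value equal to alpha_k moves up to I_{k+1}.
        return bisect_right(breakpoints, cp) + 1
    raise ValueError("closed must be 'right' or 'left'")

# Hypothetical breakpoints alpha_1 < alpha_2 < alpha_3 (not the vignette's).
breakpoints = [0.01, 0.02, 0.05]

# The two conventions disagree only exactly at a breakpoint.
print(interval_index(0.02, breakpoints, closed="right"))   # 2
print(interval_index(0.02, breakpoints, closed="left"))    # 3
print(interval_index(0.015, breakpoints, closed="right"))  # 2
print(interval_index(0.015, breakpoints, closed="left"))   # 2
```

So the bracket question only changes the answer when the value passed in coincides with one of the $\alpha_k$; anywhere strictly between two breakpoints both conventions pick the same interval.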

Finally, section 4.3 states:

Looking at the table, we see that the best tree has 10 terminal nodes (9 splits), based on cross-validation.

It then claims:

This sub tree is extracted with a call to prune and saved in fit9.

However, the pruned tree fit9 has 10 splits (and 11 terminal nodes, as in Figure 4), because it uses cp = 0.02 < 0.0222222. So, in the interval notation above (assuming my correction is right), this cp = 0.02 belongs to $I_5 = [0.0166667, 0.0222222)$.

Thanks!