
Claim 1.1 HS improves predictive performance of DT (Fig. 4) #1

Open · do8572 opened this issue 1 year ago

do8572 commented 1 year ago

The prediction performance results for classification and regression are plotted in Fig 4A and Fig 4B, respectively, with the number of leaves, a measure of model complexity, on the x-axis. We consider trees grown using four different techniques: CART, CART with cost-complexity pruning (CCP), C4.5 (Quinlan, 2014), and GOSDT (Lin et al., 2020), a method that grows trees that are optimal with respect to the cost-complexity penalized misclassification loss. To reduce clutter, Fig 4A/B only displays the results for CART and CART with CCP; the results for C4.5 (Fig S3) and GOSDT (Appendix S4.2) are deferred to the appendix.
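For reference when reproducing this claim, the CART and CART-with-CCP arms map directly onto scikit-learn. The sketch below is a minimal illustration under our own assumptions (the dataset, seed, and split are placeholders, not the paper's benchmark suite); C4.5 and GOSDT are not available in scikit-learn and would need their own packages.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Illustrative data only; the paper evaluates its own benchmark data sets.
X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

m = 15  # target number of leaves (one of the paper's choices of m)

# Plain CART grown to at most m leaves.
cart = DecisionTreeRegressor(max_leaf_nodes=m, random_state=0).fit(X_tr, y_tr)

# CART with cost-complexity pruning: walk the pruning path (alphas ascend,
# trees shrink) and keep the first pruned tree with at most m leaves.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas:
    ccp = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    if ccp.get_n_leaves() <= m:
        break

print(cart.get_n_leaves(), cart.score(X_te, y_te))  # leaves, held-out R^2
print(ccp.get_n_leaves(), ccp.score(X_te, y_te))
```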

For each of the four tree-growing methods, we grow a tree to a fixed number of leaves m, for several different choices of m ∈ {2, 4, 8, 12, 15, 20, 24, 28, 30, 32} (in practice, m would be pre-specified by the user or selected via cross-validation). For each tree, we compute its prediction performance before and after applying HS, where the regularization parameter for HS is selected from the set λ ∈ {0.1, 1.0, 10.0, 25.0, 50.0, 100.0} via cross-validation. Results for each experiment are averaged over 10 random data splits.

We observe that HS (solid lines in Fig 4A/B) does not hurt prediction on any of our data sets, and often leads to substantial performance gains. For example, taking m = 15, we observe an average relative increase in predictive performance (measured by AUC) of 6.2%, 6.5%, and 8% for HS applied to CART, CART with CCP, and C4.5, respectively, on the classification data sets. On the regression data sets with m = 15, we observe an average relative increase in R² of 9.8% and 10.1% for CART and CART with CCP, respectively.
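The HS transform itself is a short post-hoc computation on a fitted tree: each prediction becomes the telescoping sum ȳ_root + Σ (ȳ_child − ȳ_parent) / (1 + λ/N(parent)) along the root-to-leaf path, where N(parent) is the number of training samples reaching the parent node. Below is a minimal regression-only sketch under our own assumptions; `apply_hs` and `fit_hs_cv` are illustrative helpers we made up, not the paper's code, and for faithful replication the authors' imodels package (HSTreeRegressorCV / HSTreeClassifierCV) is the reference implementation.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

def apply_hs(tree, lam):
    """Apply hierarchical shrinkage to a fitted regression tree, in place.

    Along every root-to-leaf path, each step (child mean - parent mean) is
    damped by 1 / (1 + lam / N(parent)), where N(parent) is the number of
    training samples that reached the parent node.
    """
    t = tree.tree_
    mean = t.value[:, 0, 0].copy()   # original per-node mean responses
    shrunk = mean.copy()             # root keeps the global training mean
    stack = [0]                      # node 0 is the root in sklearn trees
    while stack:
        parent = stack.pop()
        for child in (t.children_left[parent], t.children_right[parent]):
            if child != -1:          # -1 marks "no child", i.e. parent is a leaf
                damp = 1.0 + lam / t.n_node_samples[parent]
                shrunk[child] = shrunk[parent] + (mean[child] - mean[parent]) / damp
                stack.append(child)
    # Relies on tree_.value being a writable view of the tree's buffer,
    # the same in-place trick the imodels implementation uses.
    t.value[:, 0, 0] = shrunk
    return tree

def fit_hs_cv(X, y, m, lams=(0.1, 1.0, 10.0, 25.0, 50.0, 100.0)):
    """Grow a CART tree with m leaves; pick lambda by cross-validated R^2.

    X and y are assumed to be numpy arrays.
    """
    scores = {lam: [] for lam in lams}
    for tr, va in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
        for lam in lams:
            tree = DecisionTreeRegressor(max_leaf_nodes=m, random_state=0)
            tree.fit(X[tr], y[tr])   # refit per lambda: apply_hs mutates the tree
            scores[lam].append(apply_hs(tree, lam).score(X[va], y[va]))
    best = max(lams, key=lambda lam: np.mean(scores[lam]))
    final = DecisionTreeRegressor(max_leaf_nodes=m, random_state=0).fit(X, y)
    return apply_hs(final, best), best
```

For classification, the same recursion would run on per-node class proportions, with AUC in place of R²; the imodels estimators handle both cases.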