NiklasPfister / adaXT

adaXT: tree-based machine learning in Python
https://niklaspfister.github.io/adaXT/
BSD 3-Clause "New" or "Revised" License

Look into proxy improvement of squared error #47

Closed: svbrodersen closed 6 months ago

svbrodersen commented 8 months ago

The squared error criterion seems to be performing quite a bit slower than sklearn's. After looking into it, it appears they never recalculate the actual squared error once the running sums are available; instead they compare candidate splits using a proxy. From their source:

" The MSE proxy is derived from

        sum_{i left}(y_i - y_pred_L)^2 + sum_{i right}(y_i - y_pred_R)^2
        = sum(y_i^2) - n_L * mean_{i left}(y_i)^2 - n_R * mean_{i right}(y_i)^2

    Neglecting constant terms, this gives:

        - 1/n_L * sum_{i left}(y_i)^2 - 1/n_R * sum_{i right}(y_i)^2

"

Since this expression equals the squared error minus the constant sum(y_i^2), the split that maximizes its negation, 1/n_L * (sum_{i left} y_i)^2 + 1/n_R * (sum_{i right} y_i)^2, also minimizes the squared-error impurity, so the full squared error never has to be recomputed for each candidate split.
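A minimal sketch (not adaXT's or sklearn's actual API; function names here are made up for illustration) checking the identity numerically: for every candidate split point, the full squared error plus the proxy equals the constant sum(y_i^2), so the split minimizing the squared error is the one maximizing the proxy.

```python
import numpy as np

def full_sse(y_left, y_right):
    # Total squared error of a candidate split (the quantity we want to minimize).
    return ((y_left - y_left.mean()) ** 2).sum() + ((y_right - y_right.mean()) ** 2).sum()

def proxy(y_left, y_right):
    # Proxy: (sum_L y)^2 / n_L + (sum_R y)^2 / n_R. Larger is better,
    # since SSE = sum(y^2) - proxy. Only running sums and counts are needed.
    return y_left.sum() ** 2 / len(y_left) + y_right.sum() ** 2 / len(y_right)

rng = np.random.default_rng(0)
y = rng.normal(size=50)

# Evaluate every candidate split position k both ways.
sse = np.array([full_sse(y[:k], y[k:]) for k in range(1, len(y))])
prx = np.array([proxy(y[:k], y[k:]) for k in range(1, len(y))])

# SSE_k + proxy_k is constant across splits, so argmin(SSE) == argmax(proxy).
assert np.allclose(sse + prx, (y ** 2).sum())
assert int(np.argmin(sse)) == int(np.argmax(prx))
```

In an incremental splitter the proxy is cheap to update: moving one sample from the right child to the left only adjusts the two running sums and counts, whereas the full squared error would require revisiting every sample.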