NiklasPfister / adaXT

adaXT: tree-based machine learning in Python
https://niklaspfister.github.io/adaXT/
BSD 3-Clause "New" or "Revised" License

Fine-tuning is_leaf conditions #30

Closed: WilliamHeuser closed this issue 9 months ago

WilliamHeuser commented 10 months ago

The conditions in DepthTreeBuilder that decide whether a node is a leaf can be improved. The min_improvement condition currently works, but if we ever add a criteria function that has no weighted form, this calculation will be incorrect. More information can be found in this PR comment: https://github.com/NiklasPfister/adaXT/pull/23#discussion_r1416858721
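To make the "weighted form" concrete, here is a minimal sketch of a weighted impurity decrease. It is modelled on the formula scikit-learn documents for `min_impurity_decrease`, not on adaXT's own code, and the function names are hypothetical. The step that combines the child impurities by weighting them with their sample counts only makes sense if a criterion's impurity is an average over samples. That is the assumption that would break for a criteria function without a weighted form.

```python
def weighted_impurity_decrease(n_obs: int, n_left: int, n_right: int,
                               impurity: float, imp_left: float,
                               imp_right: float) -> float:
    """Hypothetical helper: weighted impurity decrease of a candidate split.

    Assumes node impurity is a per-sample average, so the children can be
    combined as a size-weighted mean.
    """
    n_samples = n_left + n_right
    # Size-weighted average of the two child impurities.
    child_avg = (n_left * imp_left + n_right * imp_right) / n_samples
    # Scale by the fraction of all training observations reaching this node.
    return n_samples / n_obs * (impurity - child_avg)


def fails_min_improvement(decrease: float, min_improvement: float,
                          eps: float = 1e-9) -> bool:
    # Too small a gain means the node should be a leaf.
    return decrease < min_improvement + eps


# A node holding 20 of 100 observations, split into two pure children.
gain = weighted_impurity_decrease(100, 10, 10, 0.5, 0.0, 0.0)
print(gain, fails_min_improvement(gain, 0.2))
```

Here the node's gain of 0.5 in impurity is scaled down by 20/100, so a min_improvement of 0.2 would stop the split.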

Furthermore, the impurity_tol condition is checked before the split is done, which results in leaf nodes whose impurity is lower than the tolerance. Thus impurity_tol is NOT a lower bound on the training sample error. More information can be found here: https://github.com/NiklasPfister/adaXT/pull/23#discussion_r1416856153
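A toy example shows why a pre-split check cannot give a lower bound. This sketch uses Gini impurity and a hypothetical `is_leaf_pre_split` helper, not adaXT's API. The parent's impurity is above the tolerance, so it is split. The resulting children are pure, so they become leaves with impurity 0, well below impurity_tol.

```python
from collections import Counter


def gini(labels: list) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def is_leaf_pre_split(labels: list, impurity_tol: float) -> bool:
    # Hypothetical pre-split check: a node is a leaf only if its own
    # impurity is already at or below the tolerance.
    return gini(labels) <= impurity_tol


parent = [0, 0, 1, 1]
left, right = [0, 0], [1, 1]
impurity_tol = 0.1

print(is_leaf_pre_split(parent, impurity_tol))  # False: parent impurity is 0.5
print(gini(left), gini(right))                  # 0.0 0.0: both leaves are below the tolerance
```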

NiklasPfister commented 10 months ago

I am also getting the following error message if I set the min_samples_leaf too small. Could you investigate?

    192 # Stopping Conditions - AFTER:
    193 # boolean used to determine wheter 'parent node' is a leaf or not
    194 # additional stopping criteria can be added with 'or'
    195 # statements
--> 196 N_t_L = len(split[0])
    197 N_t_R = len(split[1])
    198 is_leaf = (n_samples /
    199            n_obs *
    200            (impurity -
   (...)
    206             child_imp[1]) < min_improvement +
    207            EPSILON or N_t_L < min_samples_leaf or N_t_R < min_samples_leaf or is_leaf)

IndexError: list index out of range

NiklasPfister commented 10 months ago

@svbrodersen: the fix of checking for len(split) == 0 solves this.
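A minimal sketch of the guarded post-split check, assuming `split` is either `[left_indices, right_indices]` or an empty list when no valid split was found. The function name, the `EPSILON` value and the child weighting by `n_left / n_samples` are assumptions. The middle of the original formula is elided in the traceback, so this does not reproduce adaXT's exact expression.

```python
EPSILON = 1e-9  # hypothetical stand-in for the module's EPSILON constant


def is_leaf_after_split(split, n_samples, n_obs, impurity, child_imp,
                        min_improvement, min_samples_leaf, is_leaf=False):
    """Hypothetical post-split stopping check with an empty-split guard."""
    # No valid split was found: split[0] would raise IndexError, so the
    # node has to be a leaf.
    if len(split) == 0:
        return True
    n_left, n_right = len(split[0]), len(split[1])
    # Assumed weighted impurity decrease (children weighted by size).
    decrease = n_samples / n_obs * (
        impurity
        - n_left / n_samples * child_imp[0]
        - n_right / n_samples * child_imp[1]
    )
    return (decrease < min_improvement + EPSILON
            or n_left < min_samples_leaf
            or n_right < min_samples_leaf
            or is_leaf)


print(is_leaf_after_split([], 10, 100, 0.5, [], 0.0, 1))  # True, no IndexError
```

Returning early keeps the min_samples_leaf comparisons from ever indexing a child that does not exist.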

FIXED now