I have the impression that the objective function implemented in the code might be incorrect.
The returned value of objective_function is var - (left_val + right_val), which is the reduction in variance. According to the paper, the split to be chosen has the LARGEST reduction in variance.
Therefore, I think the comparison in train_recurse should be (objective >= maximum_objective) rather than (objective < minimum_objective), with maximum_objective initialized to 0.
Otherwise, the minimization rewards degenerate splits that put every element of the parent set into one child and leave the other child empty, since such a split yields zero variance reduction.
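To illustrate, here is a minimal sketch of what the corrected selection loop could look like. This is not the repo's actual code: the helper names, the use of NumPy, and the unweighted sum of child variances (matching the described var - (left_val + right_val)) are all assumptions on my part; only the maximize-the-reduction logic and the maximum_objective initialization reflect the fix I'm proposing.

```python
import numpy as np

def objective_sketch(y, left_mask):
    """Hypothetical stand-in for objective_function:
    variance of the parent minus the sum of the child variances,
    i.e. the reduction in variance achieved by the split."""
    left, right = y[left_mask], y[~left_mask]
    return np.var(y) - (np.var(left) + np.var(right))

def best_split(X, y):
    """Pick the split with the LARGEST variance reduction,
    as the paper prescribes (maximize, not minimize)."""
    maximum_objective = 0.0  # initialized to 0, per the proposed fix
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            left_mask = X[:, feature] <= threshold
            if left_mask.all() or not left_mask.any():
                continue  # skip degenerate splits (one empty child)
            objective = objective_sketch(y, left_mask)
            if objective >= maximum_objective:  # was: objective < minimum_objective
                maximum_objective = objective
                best = (feature, threshold)
    return best, maximum_objective
```

With the original minimized comparison, a degenerate split (reduction 0) would always beat any split that genuinely reduces variance; maximizing with maximum_objective starting at 0 rules that out.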