cerlymarco / linear-tree

A python library to build Model Trees with Linear Models at the leaves.
MIT License

Which traversing method does linear tree use to find the left and right node ? #19

Closed akhilkapil closed 2 years ago

akhilkapil commented 2 years ago

Hi all, I am having a hard time figuring out which method linear-tree uses to traverse the tree. Sometimes, when I plot the tree and compare it with the summary, the mapping makes no sense: a node that should be on the left is displayed on the right in the plot, and vice versa. You can compare the summary below with the plot and let me know if I am wrong somewhere.

[image: output]

{0: {'col': 1, 'th': 0.0127, 'loss': 0.1937, 'samples': 160, 'children': (1, 2), 'models': (RidgeClassifier(), RidgeClassifier())},
 1: {'col': 6, 'th': 0.1461, 'loss': 0.1, 'samples': 80, 'children': (3, 4), 'models': (RidgeClassifier(), RidgeClassifier())},
 2: {'col': 0, 'th': 2.6051, 'loss': 0.05, 'samples': 80, 'children': (9, 10), 'models': (RidgeClassifier(), RidgeClassifier())},
 4: {'col': 0, 'th': -0.0708, 'loss': 0.0364, 'samples': 55, 'children': (5, 6), 'models': (RidgeClassifier(), RidgeClassifier())},
 6: {'col': 2, 'th': -0.7986, 'loss': 0.0, 'samples': 32, 'children': (7, 8), 'models': (RidgeClassifier(), RidgeClassifier())},
 9: {'col': 2, 'th': -0.0865, 'loss': 0.0, 'samples': 59, 'children': (11, 12), 'models': (RidgeClassifier(), RidgeClassifier())},
 3: {'loss': 0.08, 'samples': 25, 'models': RidgeClassifier(), 'classes': array([0, 1, 2])},
 5: {'loss': 0.0, 'samples': 23, 'models': RidgeClassifier(), 'classes': array([0, 1, 2])},
 7: {'loss': 0.0, 'samples': 16, 'models': RidgeClassifier(), 'classes': array([0, 1])},
 8: {'loss': 0.0, 'samples': 16, 'models': RidgeClassifier(), 'classes': array([0, 1, 2])},
 11: {'loss': 0.0, 'samples': 32, 'models': RidgeClassifier(), 'classes': array([0, 1, 2])},
 12: {'loss': 0.0, 'samples': 27, 'models': RidgeClassifier(), 'classes': array([0, 1, 2])},
 10: {'loss': 0.0476, 'samples': 21, 'models': RidgeClassifier(), 'classes': array([0, 1, 2])}}

cerlymarco commented 2 years ago

Hi, thanks for your feedback, but the plot you reported seems to map one-to-one to the summary.

Starting from the root, id_node_0 is evaluated and id_node_1 and id_node_2 are created. Then id_node_1 is evaluated and id_node_3 and id_node_4 are created. Then id_node_3 is evaluated, but we find no utility in splitting it. Then id_node_4 is evaluated and id_node_5 and id_node_6 are created, and so on.

What I described is the order used when evaluating the splits. If you mean that id_node_3 and id_node_4 should be swapped in the plot (to match the order in which they are evaluated during fitting), there is nothing I can do about it, since it is pydot that creates the plot. It places the circles/squares of each node according to the space available.
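To check the mapping without relying on the plot layout, the evaluation order can be replayed from the 'children' entries of the summary alone. The snippet below is only a sketch, not code from the library: the dict is hand-copied from the summary posted in this thread (keeping only 'children'), and `walk` is a hypothetical helper. It visits a node, then the first entry of its 'children' tuple, then the second, which is assumed here to match the order in which the splits are evaluated.

```python
# Hand-copied from the summary in this thread; only 'children' is kept.
# Nodes without a 'children' entry are leaves.
summary = {
    0: {"children": (1, 2)},
    1: {"children": (3, 4)},
    2: {"children": (9, 10)},
    4: {"children": (5, 6)},
    6: {"children": (7, 8)},
    9: {"children": (11, 12)},
    3: {}, 5: {}, 7: {}, 8: {}, 10: {}, 11: {}, 12: {},
}


def walk(summary, node_id=0, depth=0, side="root"):
    """Hypothetical helper: yield (depth, side, node_id) in pre-order,
    i.e. the node itself, then its first child, then its second child."""
    yield depth, side, node_id
    children = summary[node_id].get("children")
    if children is not None:
        first, second = children
        yield from walk(summary, first, depth + 1, "first")
        yield from walk(summary, second, depth + 1, "second")


# Order in which the nodes are visited by the walk.
visit_order = [node_id for _, _, node_id in walk(summary)]

# Indented text view of the tree, independent of any plotting layout.
for depth, side, node_id in walk(summary):
    print("  " * depth + f"{side}: node {node_id}")
```

With this summary, each new pair of ids (1 and 2, 3 and 4, 5 and 6, ...) appears exactly when its parent is visited, which is consistent with the order described above. Where a node is drawn in the plot does not affect this.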

If you support the project don't forget to leave a star ;-)