csinva / imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
https://csinva.io/imodels
MIT License
1.35k stars 120 forks source link

Fixed FIGS plotting #134

Closed mepland closed 1 year ago

mepland commented 1 year ago

Fixed Issue 132, FIGS plots not appearing correctly.

The primary bug was in the assignment of node ids here.

            right = next(node_counter)
            left = next(node_counter)

They were being improperly set during the recursion of _update_node(nd). I've fixed this by assigning a new node_num variable after the trees are created during fit() here and using that instead:

        # add node_num to final tree
        for tree_ in self.trees_:
            node_counter = iter(range(0, int(1e06)))
            def _add_node_num(node: Node):
                if node is None:
                    return
                node.setattrs(node_num=next(node_counter))
                _add_node_num(node.left)
                _add_node_num(node.right)

            _add_node_num(tree_)

I also took the opportunity to return a real sklearn DecisionTreeClassifier or DecisionTreeRegressor object, filling the parameters, including tree_, with the __setstate__() method, building on this SO question. In order to do this, I needed the impurity at each node and the "value" as expected by sklearn, i.e. value = np.array([neg_count, pos_count], dtype=float). If we further rewrite the FIGS class to save this 2D "value" along side the current value, perhaps as value_sklearn, I wouldn't need X_train, y_train for the extract_sklearn_tree_from_figs function, and the subsequent plotting functions.

@csinva does my implementation of the impurity variable look correct? I see the impurities are recomputed after I grab my impurity values, so I expect not. Perhaps you could fix this, or let me know the best way to get the final impurity at each node? I'll also wait for the go ahead on adding the value_sklearn variable, and refactoring away the dependence on X_train, y_train in the plotting functions.

mepland commented 1 year ago

You can see the correct FIGS plots in my notebook here. We can remove the notebook from the PR when ready.

Glucose concentration test <= 99.500 65/192 (33.85%)
    ΔRisk = 0.07 4/59 (6.78%)
    Glucose concentration test <= 168.500 61/133 (45.86%)
        #Pregnant <= 6.500 44/112 (39.29%)
            Body mass index <= 30.850 21/76 (27.63%)
                ΔRisk = 0.06 2/31 (6.45%)
                Blood pressure(mmHg) <= 67.000 19/45 (42.22%)
                    ΔRisk = 0.71 10/14 (71.43%)
                    ΔRisk = 0.30 9/31 (29.03%)
            ΔRisk = 0.64 23/36 (63.89%)
        Blood pressure(mmHg) <= 93.000 17/21 (80.95%)
            ΔRisk = 0.86 17/19 (89.47%)
            ΔRisk = -0.01 0/2 (0.0%)

figs

csinva commented 1 year ago

Yes this looks great! Thank you for this lovely fix + nice modifications 🤗 .

Your impurity calculation is correct (at least for the object sklearn is expecting). Our recalculation of the impurity reduction is for downstream use by the FIGS algorithm when considering opening this node but the impurity at this node is exactly as you have it here.

mepland commented 1 year ago

Great! I will leave the impurity calculation alone then, and have moved the values_sklearn creation into fit().

I tried saving the pos and neg counts while creating each node as calculated from y[idxs], y[idxs_left], y[idxs_right] as appropriate, but was getting inconsistent results. Instead I just added values_sklearn after the final tree has been grown by FIGS, at the same time that I add the node_id values.

I also added an optional fig_size parameter to plot() to control the size of the figure, and thus the spacing between nodes when plotted.

mepland commented 1 year ago

dtreeviz is now working for FIGS! For more plots see my demo notebook here. This closes Issue 135 for FIGS trees.

from dtreeviz import trees
from dtreeviz.models.sklearn_decision_trees import ShadowSKDTree
from imodels.tree.viz_utils import extract_sklearn_tree_from_figs

dt = extract_sklearn_tree_from_figs(model_figs, tree_num=0, n_classes=2)
sk_dtree = ShadowSKDTree(dt, X_train, y_train, feat_names, 'y', [0, 1])

figs

csinva commented 1 year ago

Wonderful!

Manuelhrokr commented 1 year ago

Importing imodels.tree.viz_utils import extract_sklearn_tree_from_figs as per indicated in the demo notebook gives the following error:

ImportError: cannot import name 'extract_sklearn_tree_from_figs' from 'imodels.tree.viz_utils' (/home/manuelzl/miniconda3/envs/manuelenv_python/lib/python3.7/site-packages/imodels/tree/viz_utils.py)

I checked viz_utils.py file in the path, and the code there is different than the one shown in EXPAND SOURCE CODE.

Method in my current version of viz_utils.py is: extract_sklearn_tree_from_figs_tree((node, n_classes)

csinva commented 1 year ago

Hi @Manuelhrokr sorry about this and for the delay! Do you know what version of imodels you are using? I think your issue should be fixed if you bump to the new version: pip install --upgrade imodels.

Manuelhrokr commented 1 year ago

Hi @csinva, sorry for my late response. Thank for the feedback, I actually updated as you suggested and it fixed the problem.