Rambatino / CHAID

A python implementation of the common CHAID algorithm
Apache License 2.0

Refactor Graph to avoid cross-platform issues #100

Closed mjpieters closed 4 years ago

mjpieters commented 4 years ago

This uses

This should fix #97.

Caveat: I haven't actually run this code; it's pure dead reckoning and linters. :-)

mjpieters commented 4 years ago

I attempted to get CircleCI running under my own account, but failed. I'm not going to spend any further time on that.

mjpieters commented 4 years ago

For what it's worth, I can run the tests locally and they pass. That's not that revelatory, because the graph module has no test coverage and I didn't touch anything else.

However, I constructed a simple manual test file, based on one of the examples in the documentation, which renders the same graph on master and with this pull request:

import pandas
from CHAID import Tree

tree = Tree.from_pandas_df(
    pandas.read_csv("tests/data/titanic.csv"),
    dict.fromkeys({"sex", "embarked"}, "nominal"),
    "survived",
    max_depth=4,
    alpha_merge=0.05,
    min_parent_node_size=2,
)

tree.render("rendered.dot")

This is essentially the same thing as python -m CHAID tests/data/titanic.csv survived sex embarked --max-depth 4 --min-parent-node-size 2 --alpha-merge 0.05 --export-path rendered.dot but a little easier to tweak and play with.

I note that this reveals a separate problem, as dot issues this warning (both with and without my changes):

Warning: Orthogonal edges do not currently handle edge labels. Try using xlabels.

That's an easy fix so I'll just include that in this pull request next: replacing label with xlabel in the g.edge() call.
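
To illustrate, the change comes down to a single attribute name in the DOT text that dot reads. The sketch below builds that text by hand in plain Python; it does not use the graphviz package or the project's Graph class, and the helper names edge_line and dot_source are made up for this example.

```python
def edge_line(parent, child, text, use_xlabel=True):
    """Return one DOT edge statement (hypothetical helper, for illustration only)."""
    attr = "xlabel" if use_xlabel else "label"
    # Escape double quotes so the text stays a valid quoted DOT string.
    escaped = text.replace('"', '\\"')
    return '{} -> {} [{}="{}"]'.format(parent, child, attr, escaped)


def dot_source(edges, use_xlabel=True):
    """Assemble a small digraph that uses orthogonal edge routing."""
    lines = ["digraph {", "  graph [splines=ortho]"]
    lines += ["  " + edge_line(p, c, t, use_xlabel) for p, c, t in edges]
    lines.append("}")
    return "\n".join(lines)


# Sample edges with made-up labels; xlabel is what the dot warning suggests.
print(dot_source([(0, 1, "(female)"), (0, 2, "(male)")]))
```

Passing use_xlabel=False produces the label form that triggers the warning when splines is set to ortho.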

mjpieters commented 4 years ago

With that last fix, my little test script outputs:

(image: graph test)

Ranji321 commented 4 years ago

Thank you so much for the code. I ran it and I'm getting the error TypeError: sequence item 0: expected str instance, int found, raised at edge_label = " ({}) \n ".format(', '.join(node.choices)). I could not find a solution; please help me fix this.

The code I used is below:

    from CHAID import Tree
    import pandas as pd
    import numpy as np
    import os

    df = pd.read_csv('C:\\Users\\ps\\chaid_pro.csv')

    independent_variable_columns = ['gender', 'grade', 'no_renewals', 'complaint_count']
    dep_variable = 'switch'
    tree = Tree.from_pandas_df(
        df,
        # treat every independent variable as nominal
        dict.fromkeys(independent_variable_columns, 'nominal'),
        dep_variable,
        max_depth=2
    )

    import os
    from datetime import datetime
    import plotly.graph_objs as go
    import plotly.io as pio
    import colorlover as cl
    from graphviz import Digraph
    import tempfile
    try:
        # Python 3.2 and newer
        from tempfile import TemporaryDirectory
    except ImportError:
        # minimal backport of TemporaryDirectory for Python 2.7, sufficient
        # for use with this module.
        import shutil
        from tempfile import mkdtemp
        class TemporaryDirectory(object):
            def __init__(self):
                self.name = mkdtemp()
            def __enter__(self):
                return self.name
            def __exit__(self, *args):
                shutil.rmtree(self.name, ignore_errors=True)
    FIG_BASE = {
        "layout": {
            "margin_t": 50,
            "annotations": [{"font_size": 18, "x": 0.5, "y": 0.5}, {"y": [0, 0.2]}],
        },
    }
    FIG_BASE_DATA = {
        "domain": {"x": [0, 1], "y": [0.4, 1.0]},
        "hole": 0.4,
        "type": "pie",
        "marker_colors": cl.scales["5"]["qual"]["Set1"],
    }
    TABLE_HEADER = ["<i>p</i>", "score", "splitting on"]
    TABLE_CONFIG = {
        "domain": {"x": [0.3, 0.7], "y": [0, 0.37]},
        "header": {"fill_color": "#FFF"},
    }
    TABLE_CELLS_CONFIG = {
        "line_color": "#FFF",
        "align": "left",
        "font_color": "#282828",
        "height": 27,
        "fill_color": ["#EBC1EE", "#EDEAFB"],
    }

    class Graph(object):
        def __init__(self, tree):
            self.tree = tree

        def render(self, path, view):
            if path is None:
                path = os.path.join("trees", "{:%Y-%m-%d %H:%M:%S}.gv".format(datetime.now()))
            with TemporaryDirectory() as self.tempdir:
                g = Digraph(
                    format="png",
                    graph_attr={"splines": "ortho"},
                    node_attr={"shape": "plaintext", "labelloc": "b"},
                )
                for node in self.tree:
                    image = self.bar_chart(node)
                    g.node(str(node.node_id), image=image)
                    if node.parent is not None:
                        edge_label = "   ({})   \n ".format(', '.join(node.choices))
                        g.edge(str(node.parent), str(node.node_id), xlabel=edge_label)

                g.render(path, view=view)

        def bar_chart(self, node):
            fig = dict(
                data=[
                    dict(
                        values=list(node.members.values()),
                        labels=list(node.members),
                        showlegend=(node.node_id == 0),
                        **FIG_BASE_DATA
                    )
                ],
                **FIG_BASE
            )

            if not node.is_terminal:
                fig["data"].append(self._table(node))

            filename = os.path.join(self.tempdir, "node-{}.png".format(node.node_id))
            pio.write_image(fig, file=filename, format="png")
            return filename

        def _table(self, node):
            p = None if node.p is None else format(node.p, ".5f")
            score = None if node.score is None else format(node.score, ".2f")
            values = [p, score, node.split.column]
            return go.Table(
                cells=dict(values=[TABLE_HEADER, values], **TABLE_CELLS_CONFIG),
                **TABLE_CONFIG
            )

    tree.render("rendered.dot")

Ranji321 commented 4 years ago

Thank you so much for the code. I ran it and now I'm getting these warnings:

Warning: No such file or directory while opening C:\Users\RANJIT~1.PS\AppData\Local\Temp\tmpv70rowwm\node-0.png
Warning: No or improper image="C:\Users\RANJIT~1.PS\AppData\Local\Temp\tmpv70rowwm\node-0.png" for node "0"

The same pair of warnings repeats for node-1.png through node-14.png (nodes "1" to "14").

However, I was able to get the dot file by using the following commands:

treeG=tree.to_tree()
treeG.to_graphviz()

But how do I get the nice chart you showed above (with the proportions of "0" and "1")? Is there any way I can pass this dot file and get that tree chart? It would be a great help to me. I have attached the dot file for your reference: digraph tree.docx

mjpieters commented 4 years ago

Right, so the independent variable values can be anything, not just strings. I’ll fix this later, but the quick fix is to add conversion to strings for the edge labels. Replace ', '.join(node.choices) with ', '.join(map(str, node.choices))
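
As a small standalone illustration of why the original join fails and the map(str, ...) version works (the choices list is made-up sample data standing in for node.choices, not values from a real tree):

```python
choices = [0, 1, 2]  # hypothetical non-string values, e.g. integer category codes

try:
    # str.join only accepts strings, so this raises TypeError for integers.
    ", ".join(choices)
except TypeError as exc:
    print("join failed:", exc)

# Converting each value with str() first makes the join safe for any type.
edge_label = "   ({})   \n ".format(", ".join(map(str, choices)))
print(repr(edge_label))
```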

Ranji321 commented 4 years ago

Yes, I fixed that, but now I'm getting the same old warnings:

Warning: No such file or directory while opening C:\Users\RANJIT~1.PS\AppData\Local\Temp\tmpv70rowwm\node-0.png
Warning: No or improper image="C:\Users\RANJIT~1.PS\AppData\Local\Temp\tmpv70rowwm\node-0.png" for node "0"

However, I was able to get the dot file by using the following commands:

treeG=tree.to_tree()
treeG.to_graphviz()

But how do I get the nice chart you showed above (with the proportions of "0" and "1")? Is there any way I can pass this dot file and get that tree chart? It would be a great help to me. I have attached the dot file for your reference: digraph tree.docx

mjpieters commented 4 years ago

That's a GraphViz graph created from the treelib tree, and contains just a subset of the information in CHAID. Your CHAID data has been turned into strings in the labels (([], {0: 2142076.0, 1: 68348.0}, (no_renewal, p=0.0, score=14911.92458184376, groups=[[0], [1]]), dof=1)) for the root, and ([0], {0: 2078360.0, 1: 60578.0}, (compliant_count, p=0.0, score=6385.601378408456, groups=[[0], [1]]), dof=1)) for the first child node, etc.) and is just not very interesting or useful.

You can use the dot command from Graphviz to render that into an SVG or PNG but it won't be nearly as informative as the one that CHAID generates. I'd not bother with it.
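
For completeness, a minimal sketch of that rendering step, driven from Python to match the rest of the thread. It assumes the Graphviz dot executable is on PATH and that the DOT text has been saved to a file. The file names tree.dot and tree.png are placeholders, and dot_command and render_dot are helpers made up for this example.

```python
import shutil
import subprocess


def dot_command(dot_path, out_path, fmt="png"):
    """Build the dot invocation: -T selects the output format, -o the output file."""
    return ["dot", "-T" + fmt, dot_path, "-o", out_path]


def render_dot(dot_path, out_path, fmt="png"):
    """Run dot if it is installed; return True when the command was executed."""
    if shutil.which("dot") is None:
        print("Graphviz 'dot' not found on PATH; skipping render")
        return False
    # check=True raises CalledProcessError if dot exits with an error.
    subprocess.run(dot_command(dot_path, out_path, fmt), check=True)
    return True


# Show the command that would be run for a PNG render of a placeholder file.
print(" ".join(dot_command("tree.dot", "tree.png")))
```

Calling render_dot("tree.dot", "tree.svg", fmt="svg") would produce an SVG instead, provided tree.dot exists.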

Ranji321 commented 4 years ago

Yes sir, you're absolutely right. I am unable to get the tree chart you showed above (with proportions and a nice representation of each node) by passing the dot file to Graphviz. I don't know where I went wrong, and I cannot get the output you showed even though you sent the full code to produce it. If possible, kindly look at the code I pasted above so that I can keep trying to get the output. Sorry to trouble you so much.

Ranji321 commented 4 years ago

I tried a lot but still get the same warnings:

Warning: No such file or directory while opening C:\Users\AppData\Local\Temp\tmpibr_jqhz\node-0.png
Warning: No or improper image="C:\Users\AppData\Local\Temp\tmpibr_jqhz\node-0.png" for node "0"

and I get the tree chart attached below: (image: rendered dot)

Ranji321 commented 4 years ago

Is there any way to run the code with a different directory instead of the temp dir, so that all the related files are saved in one place? Would that solve my issue here?

mjpieters commented 4 years ago

Sorry, this is getting a little too hard for me to debug and troubleshoot. You can try and install the project as a whole, directly from GitHub:

pip install git+https://github.com/mjpieters/CHAID@patch-1#egg=CHAID

That installs the version of the library found in this pull request.

Ranji321 commented 4 years ago

Thank you for your response.

Rambatino commented 4 years ago

Thank you very much for this.

I have tested the output locally and, as you said, it's devoid of any spec coverage anyway, so it would be very hard for CircleCI to fail.

I've switched the Circle CI flag on to permit forked PRs to run CI tests so this shouldn't be an issue anymore.

Again, thanks.

mjpieters commented 4 years ago

A test that checks that a PNG was generated would already go a long way towards detecting regressions across platforms. :-) Many CI platforms let you record files as assets to be kept for later inspection, so you can then at least do a manual spot check as well. Perhaps something to consider! :-)
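
A minimal sketch of such a check, assuming nothing about the project's test suite: it only verifies that a file exists and starts with the standard 8-byte PNG signature. The helper name looks_like_png and the demo file names are made up for this example. In a real test the file would be whatever the render call writes, rather than the placeholder bytes used here.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def looks_like_png(path):
    """Return True if path is an existing file that starts with the PNG signature."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as fh:
        return fh.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


def demo():
    """Exercise the check on a fake PNG, a non-PNG file, and a missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.png")
        bad = os.path.join(tmp, "bad.png")
        missing = os.path.join(tmp, "missing.png")
        with open(good, "wb") as fh:
            fh.write(PNG_SIGNATURE + b"placeholder bytes")
        with open(bad, "wb") as fh:
            fh.write(b"plain text")
        return looks_like_png(good), looks_like_png(bad), looks_like_png(missing)


print(demo())
```

A signature check like this does not prove the image is correct, but it would catch the case reported in this thread where no image file is produced at all.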