Rambatino / CHAID

A python implementation of the common CHAID algorithm
Apache License 2.0

Failed to execute 'getPointAtLength' on 'SVGGeometryElement': The element's path is empty. #132

Closed tomascortes closed 1 year ago

tomascortes commented 1 year ago

Hi!

Awesome library! I'm trying to use the example, but I'm getting this error: ValueError: Transform failed with error code 525: Failed to execute 'getPointAtLength' on 'SVGGeometryElement': The element's path is empty.

It looks like it comes from the library, but I don't know whether I installed something wrong or I'm not passing the parameters correctly.

I installed orca and graphviz, and I'm on macOS Ventura 13.

import numpy as np
import pandas as pd
from CHAID import Tree

# Toy dataset: three nominal predictors and a continuous dependent variable
ndarr = np.array(([1, 2, 3] * 5) + ([2, 2, 3] * 5)).reshape(10, 3)
df = pd.DataFrame(ndarr)
df.columns = ['a', 'b', 'c']
df['d'] = np.random.normal(300, 100, 10)
independent_variable_columns = ['a', 'b', 'c']
dep_variable = 'd'
tree = Tree.from_pandas_df(df, dict(zip(independent_variable_columns, ['nominal'] * 3)), dep_variable, dep_variable_type='continuous')
tree.print_tree()
tree.render(path="./", view=False)

It also says that it is failing at line 96 in graph.py -> pio.write_image(fig, file=filename, format="png")

I would really appreciate any help.

Thanks!

Rambatino commented 1 year ago

Eugh I can't even install orca as I'm on M1 so that's a bit of a nightmare....

Does it not give a reason why it's failing? It could be due to file permissions maybe? Or maybe give it a fully qualified name like ./mypng.png.

Did you also pass in None like in the example in the Readme?

Also, your tree is a bit lacking in nodes. Can you try using one of the trees in the example that gives a nice output?
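
To rule out the file-permission theory raised above, one option is to check whether Python can actually create a file in the target directory. This is only a minimal sketch: the helper name directory_is_writable is hypothetical and not part of CHAID, plotly, or orca.

```python
import tempfile


def directory_is_writable(directory: str) -> bool:
    """Return True if a temporary file can be created inside `directory`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories and permission errors alike.
        return False


# Check the current working directory, i.e. what "./" resolves to.
print(directory_is_writable("."))
```

If this prints False for the directory passed to render, the failure is more likely a filesystem problem than a plotting one.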

tomascortes commented 1 year ago

It worked for me on M1 with Method 4 from the orca guide.

Yes, I tried with None and with ./mypng.png.

Now I tested it with the Titanic dataset:

df = pd.read_csv("./chaid_titanic.csv")
independent_variable_columns = ['sex', 'age', 'cabin']
dep_variable = 'survived'

tree = Tree.from_pandas_df(df, dict(zip(independent_variable_columns, ['nominal'] * 3)), dep_variable)
tree.print_tree()

which returns

([], {0: 809.0, 1: 500.0}, (sex, p=1.4714531016922664e-81, score=365.8869478111205, groups=[['female'], ['male']]), dof=1))
|-- (['female'], {0: 127.0, 1: 339.0}, (cabin, p=2.059196974581312e-16, score=67.54512571990448, groups=[['<missing>', 'G6', 'C55 C57', 'C22 C26', 'A29', 'E77', 'C49'], ['B35', 'C31', 'D20', 'F4', 'E8', 'B61', 'B45', 'C62 C64', 'E36', 'B42', 'F E69', 'D33', 'A16', 'B22', 'B49', 'C54', 'C85', 'C50', 'D11', 'D15', 'C130', 'E39 E41', 'C83', 'E101', 'B18', 'D37', 'C92', 'D30', 'E68', 'C105', 'D28', 'C99', 'B3', 'E34', 'C2', 'C93', 'D35', 'B80', 'C46', 'C90', 'F33', 'C23 C25 C27', 'B26', 'B4', 'B28', 'B39', 'E44', 'C32', 'B20', 'B41', 'E40', 'A34', 'B71', 'D', 'C126', 'B78', 'C97', 'D17', 'B73', 'D10 D12', 'B77', 'E50', 'D47', 'C7', 'E45', 'B5', 'B36', 'B96 B98', 'C86', 'B51 B53 B55', 'C80', 'B57 B59 B63 B66', 'B58 B60', 'C123', 'D19', 'E31', 'C89', 'C125', 'B69', 'E121', 'C78', 'E67', 'E33', 'C103', 'D21', 'C28', 'D9', 'C65', 'B79', 'C116', 'E49', 'A11', 'C45', 'C101', 'C68', 'D7', 'D36']]), dof=1))
|   |-- (['<missing>', 'G6', 'C55 C57', 'C22 C26', 'A29', 'E77', 'C49'], {0: 127.0, 1: 209.0}, <Invalid Chaid Split> - the max depth has been reached)
|   +-- (['B35', 'C31', 'D20', 'F4', 'E8', 'B61', 'B45', 'C62 C64', 'E36', 'B42', 'F E69', 'D33', 'A16', 'B22', 'B49', 'C54', 'C85', 'C50', 'D11', 'D15', 'C130', 'E39 E41', 'C83', 'E101', 'B18', 'D37', 'C92', 'D30', 'E68', 'C105', 'D28', 'C99', 'B3', 'E34', 'C2', 'C93', 'D35', 'B80', 'C46', 'C90', 'F33', 'C23 C25 C27', 'B26', 'B4', 'B28', 'B39', 'E44', 'C32', 'B20', 'B41', 'E40', 'A34', 'B71', 'D', 'C126', 'B78', 'C97', 'D17', 'B73', 'D10 D12', 'B77', 'E50', 'D47', 'C7', 'E45', 'B5', 'B36', 'B96 B98', 'C86', 'B51 B53 B55', 'C80', 'B57 B59 B63 B66', 'B58 B60', 'C123', 'D19', 'E31', 'C89', 'C125', 'B69', 'E121', 'C78', 'E67', 'E33', 'C103', 'D21', 'C28', 'D9', 'C65', 'B79', 'C116', 'E49', 'A11', 'C45', 'C101', 'C68', 'D7', 'D36'], {0: 0, 1: 130.0}, <Invalid Chaid Split> - the max depth has been reached)
+-- (['male'], {0: 682.0, 1: 161.0}, (cabin, p=5.47806812834981e-53, score=234.7539634553321, groups=[['<missing>', 'C132', 'C31', 'C55 C57', 'D46', 'C62 C64', 'C128', 'E52', 'D6', 'B22', 'B10', 'C91', 'A7', 'C85', 'B86', 'E58', 'F G73', 'E38', 'C110', 'C95', 'C83', 'B24', 'D37', 'D34', 'D30', 'A10', 'C39', 'C2', 'D48', 'A32', 'E63', 'B94', 'C46', 'B37', 'A19', 'C23 C25 C27', 'B19', 'B11', 'A36', 'E44', 'C82', 'B102', 'F38', 'B71', 'D', 'C111', 'D43', 'B82 B84', 'B78', 'B30', 'B38', 'C6', 'D22', 'F E46', 'C86', 'C80', 'B58 B60', 'C123', 'C118', 'C124', 'C87', 'E31', 'A21', 'A24', 'F', 'C89', 'B69', 'E67', 'A18', 'E60', 'D21', 'D50', 'A14', 'C65', 'C30', 'A5', 'E46', 'T', 'C68', 'D26', 'F G63', 'C22 C26', 'B51 B53 B55', 'B57 B59 B63 B66', 'C78', 'C106', 'F2'], ['A6', 'F4', 'E8', 'B45', 'B52 B54 B56', 'E25', 'D33', 'C70', 'E24', 'D56', 'A20', 'B49', 'E12', 'C148', 'C92', 'D49', 'D38', 'C51', 'C47', 'E34', 'C93', 'D35', 'A23', 'D45', 'C52', 'D40', 'B20', 'B41', 'A34', 'C104', 'C126', 'F E57', 'C53', 'B101', 'D10 D12', 'E17', 'E50', 'A31', 'B96 B98', 'D19', 'E121', 'A26', 'A9', 'C116', 'E10', 'B50']]), dof=1))
    |-- (['<missing>', 'C132', 'C31', 'C55 C57', 'D46', 'C62 C64', 'C128', 'E52', 'D6', 'B22', 'B10', 'C91', 'A7', 'C85', 'B86', 'E58', 'F G73', 'E38', 'C110', 'C95', 'C83', 'B24', 'D37', 'D34', 'D30', 'A10', 'C39', 'C2', 'D48', 'A32', 'E63', 'B94', 'C46', 'B37', 'A19', 'C23 C25 C27', 'B19', 'B11', 'A36', 'E44', 'C82', 'B102', 'F38', 'B71', 'D', 'C111', 'D43', 'B82 B84', 'B78', 'B30', 'B38', 'C6', 'D22', 'F E46', 'C86', 'C80', 'B58 B60', 'C123', 'C118', 'C124', 'C87', 'E31', 'A21', 'A24', 'F', 'C89', 'B69', 'E67', 'A18', 'E60', 'D21', 'D50', 'A14', 'C65', 'C30', 'A5', 'E46', 'T', 'C68', 'D26', 'F G63', 'C22 C26', 'B51 B53 B55', 'B57 B59 B63 B66', 'C78', 'C106', 'F2'], {0: 682.0, 1: 109.0}, <Invalid Chaid Split> - the max depth has been reached)
    +-- (['A6', 'F4', 'E8', 'B45', 'B52 B54 B56', 'E25', 'D33', 'C70', 'E24', 'D56', 'A20', 'B49', 'E12', 'C148', 'C92', 'D49', 'D38', 'C51', 'C47', 'E34', 'C93', 'D35', 'A23', 'D45', 'C52', 'D40', 'B20', 'B41', 'A34', 'C104', 'C126', 'F E57', 'C53', 'B101', 'D10 D12', 'E17', 'E50', 'A31', 'B96 B98', 'D19', 'E121', 'A26', 'A9', 'C116', 'E10', 'B50'], {0: 0, 1: 52.0}, <Invalid Chaid Split> - the max depth has been reached)

But when I do

tree.render(path=None, view=False)
tree.render(path="/image.png", view=False)

I receive:


ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 tree.render(path=None, view=False)

File ~/.local/share/virtualenvs/glue-python-hfM-ppjm/lib/python3.9/site-packages/CHAID/tree.py:291, in Tree.render(self, path, view)
    290 def render(self, path=None, view=False):
--> 291     Graph(self).render(path, view)

File ~/.local/share/virtualenvs/glue-python-hfM-ppjm/lib/python3.9/site-packages/CHAID/graph.py:72, in Graph.render(self, path, view)
     66 g = Digraph(
     67     format="png",
     68     graph_attr={"splines": "ortho"},
     69     node_attr={"shape": "plaintext", "labelloc": "b"},
     70 )
     71 for node in self.tree:
---> 72     image = self.bar_chart(node)
     73     g.node(str(node.node_id), image=image)
     74     if node.parent is not None:

File ~/.local/share/virtualenvs/glue-python-hfM-ppjm/lib/python3.9/site-packages/CHAID/graph.py:96, in Graph.bar_chart(self, node)
     93     fig["data"].append(self._table(node))
     95 filename = os.path.join(self.tempdir, "node-{}.png".format(node.node_id))
---> 96 pio.write_image(fig, file=filename, format="png")
     97 return filename
...
    165     )
    167 img = response.get("result").encode("utf-8")
    169 # Base64 decode binary types

ValueError: Transform failed with error code 525: Failed to execute 'getPointAtLength' on 'SVGGeometryElement': The element's path is empty.

Rambatino commented 1 year ago

Sooo mine actually worked, surprisingly. I set the print method to: image png

So looks like it's a version issue...

Rambatino commented 1 year ago

I removed cabin too, as it was taking too long...

But anyway

Let me print some versions of things

Rambatino commented 1 year ago

$ pip3 list | grep -i 'numpy\|pandas\|psutil\|chaid\|graphviz'
WARNING: Skipping /opt/homebrew/lib/python3.11/site-packages/six-1.16.0-py3.11.egg-info due to invalid metadata entry 'name'
CHAID                     5.3.0
graphviz                  0.20.1
numpy                     1.24.2
pandas                    2.0.0
psutil                    5.9.4

Running python3.11
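
For comparing environments, the same version information can also be collected from inside Python, which avoids picking up a pip that belongs to a different interpreter. A small sketch using the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and the package list simply mirrors the grep above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("python", sys.version.split()[0])
for name, ver in installed_versions(["CHAID", "graphviz", "numpy", "pandas", "psutil"]).items():
    print(name, ver if ver is not None else "(not installed)")
```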

Rambatino commented 1 year ago

I had to unfortunately modify the chaid files to change:

np.int -> int
np.float -> float
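
For context, np.int and np.float were deprecated aliases of the Python builtins and were removed in NumPy 1.24, which matches the numpy 1.24.2 in the list above. A possible alternative to editing the installed CHAID files, sketched here under the assumption that the installed release only uses these names as plain aliases, is to restore them before importing the library:

```python
import numpy as np

# Assumption: the installed CHAID code still refers to np.int / np.float,
# which NumPy 1.24 removed. Re-create them as aliases of the builtins
# before CHAID is imported, so the library files stay untouched.
for alias, builtin in (("int", int), ("float", float)):
    if alias not in vars(np):
        setattr(np, alias, builtin)

# from CHAID import Tree  # import the library only after the shim has run
```

This is a stopgap for the current process only; a library release that no longer uses the removed aliases would make it unnecessary.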

Rambatino commented 1 year ago

Also, I assumed you meant tree.render(path="./image.png", view=False) as opposed to tree.render(path="/image.png", view=False)
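
The difference between the two paths is easy to overlook: a leading slash points at the filesystem root, which is usually not writable by a normal user, while ./ points at the current working directory. A quick illustration with pathlib (this only shows how the paths resolve and makes no claim about what render does with them):

```python
from pathlib import PurePosixPath

for candidate in ("/image.png", "./image.png"):
    path = PurePosixPath(candidate)
    kind = "absolute (filesystem root)" if path.is_absolute() else "relative (current directory)"
    # str(path.parent) is '/' for the first path and '.' for the second.
    print(f"{candidate!r}: {kind}, parent directory = {str(path.parent)!r}")
```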

tomascortes commented 1 year ago

This worked!! Thanks!!

I updated to Python 3.11 and changed:

np.int -> int
np.float -> float

Thank you so much!! I'm very grateful. Your library is incredible.