Closed diogoalvesderesende closed 3 years ago
Hey thanks for your interest and for the issue you've found!
I have seen that error before.
Have you set up both the graphviz and orca as in here:
graphviz: https://stackoverflow.com/questions/35064304/runtimeerror-make-sure-the-graphviz-executables-are-on-your-systems-path-aft
orca: https://github.com/plotly/orca
I have a similar problem.
For graphviz, I followed this video and got it to work: https://www.youtube.com/watch?v=kOYnlqbZ8K4
For orca, I was able to install it and verify through the command prompt that the orca executable is available on my path.
However, even with those addressed, I am still unable to get the visual to work.
I'm still pretty new to Python, so I was going to ask whether I'm supposed to put something specific in path = ? I tried the path to the folder where my data and .ipynb file are, and it didn't work. I also tried the path to where my orca file is, but that didn't work either.
This is what I get as my error:
tree.to_tree() <treelib.tree.Tree at 0x1fd4f262d90>
OSError Traceback (most recent call last)
So the glaring issue in the colab notebook is:
plotly.io.orca.config.executable = '/path/to/orca'
/path/to/orca is the placeholder from the example rather than the actual path to the orca executable.
I've recreated it locally, will see if fixing the path works.
This might be a silly misunderstanding on my end, but I tried getting the path through the desktop orca properties and through my anaconda folder, and still got errors (even if I take out the quotes, it doesn't work).
So it's not the program I don't think. It's the executable.
For my Mac I installed orca using brew install orca. I then ran orca and it moved the CLI into my /usr/local/bin:
Therefore:
➜ ~ which orca
/usr/local/bin/orca
(I recreated the issue, then installed orca. When I opened the program it then moved the cli into /usr/local/bin)
Because it was in a standard path you don't need to specify the location of orca.
On Unix systems (including Mac), which orca needs to return a path.
On Windows systems, where orca needs to return the correct path.
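The same check can be done from inside the notebook. This is a minimal sketch using the standard library's shutil.which, which looks a name up on PATH in the same spirit as which/where; the helper name find_executable is made up for illustration, and the commented-out plotly lines assume plotly is installed.

```python
import shutil


def find_executable(name="orca"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


orca_path = find_executable("orca")
if orca_path is None:
    print("orca is not on PATH; install it or point plotly at it explicitly")
else:
    print(f"orca found at {orca_path}")
    # Assumption: plotly is installed. The resolved path could then be
    # passed on instead of the '/path/to/orca' placeholder:
    # import plotly.io as pio
    # pio.orca.config.executable = orca_path
```

If this prints that orca is not on PATH, the setting in the notebook needs the real location of the executable, not the folder holding your data.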
This might be a silly misunderstanding on my end, but I tried getting the path through the desktop orca properties and through my anaconda folder, and still got errors (even if I take out the quotes, it doesn't work).
I'm not exactly sure what the issue is there. Have you tried double quotes rather than single?
I have :/
The unicodeescape error occurs because Python interprets \U as the start of a Unicode escape sequence. You need double backslashes to escape a backslash properly, or I think forward slashes work for paths most of the time on Windows systems.
In [1]: print("C:\Users")
File "<ipython-input-16-83edfbd11c98>", line 1
print("C:\Users")
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
In [2]: print("C:\\Users")
C:\Users
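As a small sketch of the options for writing a Windows path in Python: the path below is hypothetical, and the raw-string form (r"...") is an extra option not mentioned above. PureWindowsPath is used only so the comparison also runs on non-Windows machines.

```python
from pathlib import PureWindowsPath

# Hypothetical location of the orca executable, written three ways.
escaped = "C:\\Users\\me\\orca.exe"   # doubled backslashes
raw = r"C:\Users\me\orca.exe"         # raw string: backslashes taken literally
forward = "C:/Users/me/orca.exe"      # forward slashes

# The first two are the same string.
print(escaped == raw)

# The forward-slash form is a different string but names the same Windows path.
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```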
I got the path to work, but now I'm back to the initial problem I had with the invalid argument in trees. -_- I tried to specify my path to somewhere on my computer, then ran into an access problem, which might not be worth investigating since I am just an intern...
Thank you for your responses! The CHAID package works really well for me otherwise and has been helpful.
@soonmi-m this seems to be a different problem to the orca issue above. I've created a new issue for you
@diogoalvesderesende let's carry on the orca issue here. The difficult thing is that you need to get it to install in colab, and also you need to make the visual in-line. I don't think it's a trivial problem, unfortunately.
Hey Mark, I managed to get it working. I'm not able to plot in-line, but I'll take what I can get. Thanks for the help! For Google Colab, you need to install this as well:
!pip install "plotly>=4.0.0"
!wget https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage -O /usr/local/bin/orca
!chmod +x /usr/local/bin/orca
!apt-get install xvfb libgtk2.0-0 libgconf-2-4
Then for the visualization, this worked for me:
#Visualization
import orca
import plotly
import plotly.graph_objects as go
tree.render(path=None, view=True)
Again, great package. I have a few questions that hopefully you could help me to take it the next level:
1) If I factorize the predictors, would it also work?
2) Is there a way to customize the plot? I.e., increase font size.
I believe CHAID is one of the most underrated techniques out there. Thank you for your work!!
Best, Diogo
- Is there a way to customize the plot? I.e., increase font size.
Not currently, no. I haven't really put much thought into the plot. If you want to do a PR with increased font size (and any other cosmetic changes) I'll approve, merge and release.
- If I factorize the predictors, would it also work?
Maybe, I'm not quite sure what you mean
Hey Mark,
I mean if I were to use OneHotEncoder, or the factorize function from Pandas, would it also work? Currently, the examples are only with binary variables.
I would love to do a PR, but my Python, and knowledge of it is not good enough :/
Thanks and best, Diogo
The ChiSquare stats function permits categorical variables with any number of categories (they don't need to be binary), but the results are more difficult to interpret, and the likelihood that some sub-combination is significant also increases. One-hot encoding is really useful for giving easier-to-explain answers because everything is equally weighted as a yes/no binary variable (see my answer here for an easy way to do it in pandas: https://stackoverflow.com/a/52507931/1744107)
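To illustrate what one-hot encoding does to a multi-category predictor, here is a rough sketch in plain Python so it runs without extra packages. The one_hot helper and the region column are made up for illustration; with pandas, pandas.get_dummies produces the equivalent indicator columns.

```python
def one_hot(values):
    """Turn a list of category labels into {label: [0/1, ...]} indicator columns."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical predictor with three categories.
region = ["north", "south", "east", "south"]

columns = one_hot(region)
for name, col in columns.items():
    print(f"region_{name}: {col}")
```

Each resulting column is a yes/no variable, so every split in the tree reads as "is in this category or not".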
If you were to run your notebook locally, then you can edit the config here:
https://github.com/Rambatino/CHAID/blob/master/CHAID/graph.py#L28
And you can play around with these variables and it should change the output in the tree.
To make your changes available you'll need to pip install like here and point to your local modified version (and restart the notebook python kernel between changes):
If it looks better I'll 💯 approve and merge the PR
closing Issue as seems resolved
Hey,
Thanks for the library, it is fantastic to have a python version of CHAID!
I am having issues visualizing the tree model. I get some error about orca and I cannot find a way to solve. Would you have any idea on how to fix it?
Please find here the link to the script: https://colab.research.google.com/drive/1pteueOMAd_QhioL5Kw9FyfqmhaMpHoYi?usp=sharing
Thanks, Diogo