Rambatino / CHAID

A python implementation of the common CHAID algorithm
Apache License 2.0

Why isn't there a predict function ? #72

Closed vsharathchandra closed 6 years ago

vsharathchandra commented 7 years ago

I split the data into train and test sets. I have run CHAID on the train data, and now I would like to use it to predict the output for the test data. I want to do this for classification.

Rambatino commented 7 years ago

Hmm I do remember doing something with this, but at the moment it doesn't appear to be in the main library.

I'll have a look at the work I did a while ago, but it won't be production ready any time soon.

You're of course welcome to build it out yourself.
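
For anyone building this out, here is a minimal sketch of how prediction from a fitted tree could work in principle. It is not the CHAID library's API or internal structure: the nested-dict tree, the `split_on`, `children` and `prediction` keys, and the `predict_one` and `predict` helpers are all hypothetical, and the class labels are made up for illustration. It also assumes each child corresponds to a single category, whereas CHAID can merge several categories into one child.

```python
# Hypothetical fitted tree (NOT the CHAID library's structure): internal nodes
# split on one categorical column, and every node stores the majority class
# seen in training. The labels below are invented for illustration only.
tree = {
    "split_on": "sex",
    "prediction": 0,
    "children": {
        "female": {"prediction": 1},
        "male": {
            "split_on": "embarked",
            "prediction": 0,
            "children": {
                "C": {"prediction": 1},
                "S": {"prediction": 0},
            },
        },
    },
}


def predict_one(node, row):
    """Walk from the root towards a leaf and return the class stored there.

    If the row has a missing value or a category that no child covers,
    stop early and use the current node's majority class.
    """
    while "children" in node:
        value = row.get(node["split_on"])
        child = node["children"].get(value)
        if child is None:
            break
        node = child
    return node["prediction"]


def predict(tree, rows):
    """Predict a class for each row (a dict of column name -> category)."""
    return [predict_one(tree, row) for row in rows]


test_rows = [
    {"sex": "female", "embarked": "S"},
    {"sex": "male", "embarked": "C"},
    {"sex": "male", "embarked": "Q"},  # "Q" was not seen under "male"
]
predictions = predict(tree, test_rows)
print(predictions)
```

The fallback to the parent's majority class is one possible design choice for unseen categories; raising an error or returning a missing value would be equally defensible.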

Rambatino commented 7 years ago

@coderking7 are you able to have a try with 👆?

Rambatino commented 7 years ago

@coderking7 please test on that branch with:

python -m CHAID tests/data/titanic.csv survived sex embarked --max-depth 77 --min-parent-node-size 1 --alpha-merge 0.6 --min-child-node-size 10 --accuracy

jm2909 commented 6 years ago

I have tried a basic implementation of the predict function for my own project. The code is here: https://github.com/jm2909/CHAID-Predict-Function-Testing.git

I am new to Python. Any suggestions or criticism would be appreciated, especially if there are implementation mistakes.

Rambatino commented 6 years ago

@jm2909 I unfortunately don't have too much time to go and read so much code! However, if it does what you need then great! 👍

Feel free to fork CHAID and contribute if it works and has a sufficiently acceptable API.

I'm going to close this due to inactivity now.

VivianMagri commented 5 years ago

I have recently started working with analytics and I came here looking for a model to do the same as vsharathchandra described. When I couldn't find anything like a 'predict' method in the documentation, I came to the issues and discovered the library really wasn't designed for this goal. Pardon my ignorance, but now I really am confused. What was the intended use of this library? I had the idea that the purpose of a decision tree (or supervised machine learning models in general) was always to make predictions on unseen data.

Rambatino commented 5 years ago

The intended use of this library is to run it on a set of data so that it tells you which independent variables have a large effect on an outcome variable. E.g. males under 40 really like drinking, or whatever. It originally wasn't intended to make predictions on unseen data - it's more classical statistics [apply model to data].
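
To make that concrete, here is a rough sketch of the kind of test CHAID is built on: a chi-squared test of independence between each independent variable and the outcome. This is not the library's code. The `chi_squared` helper and the toy data are made up for illustration, and the sketch compares raw statistics only, which is reasonable here because both predictors have two categories. CHAID itself works with p-values and also merges categories, which this skips.

```python
from collections import Counter


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for the contingency table of xs vs ys."""
    n = len(xs)
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    observed = Counter(zip(xs, ys))  # missing cells count as 0
    stat = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            # Expected cell count if x and y were independent.
            expected = x_count * y_count / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


# Toy data, invented for illustration: does each person like the product?
liked = [1, 1, 1, 0, 0, 0, 1, 0]
predictors = {
    "sex": ["m", "m", "m", "f", "f", "f", "m", "f"],
    "region": ["n", "s", "n", "s", "n", "s", "n", "s"],
}

# Rank predictors by how strongly they are associated with the outcome.
ranking = sorted(
    ((name, chi_squared(values, liked)) for name, values in predictors.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, stat in ranking:
    print(f"{name}: chi-squared = {stat:.2f}")
```

In this toy data `sex` separates the outcome perfectly and `region` only weakly, so `sex` ranks first. That ranking is the sort of output the library is meant to give you, rather than predictions for new rows.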

Have a look at XGBoost Classifier and Regressor if that's the functionality you want - they will give more accurate predictions than CHAID will.