Rambatino / CHAID

A python implementation of the common CHAID algorithm
Apache License 2.0

Exhaustive chaid #112

Closed KamilGos closed 3 years ago

KamilGos commented 3 years ago

What would need to change in your code to get an "exhaustive CHAID" version of this algorithm?

Here is the explanation: ftp://ftp.software.ibm.com/software/analytics/spss/support/Stats/Docs/Statistics/Algorithms/13.0/TREE-CHAID.pdf

Rambatino commented 3 years ago

Hi @KamilGos, the only difference is in how the predictor categories are merged, right? That means the change belongs in the best_splits() method, which would exhaustively search for the best pair.

The option would need to be passed down, or it could even live in a new ExhaustiveCHAID class. Either way, it shouldn't be too difficult.
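
For readers following along, here is a minimal sketch of the exhaustive merging step as described in the SPSS document linked above: keep merging the most similar pair of categories until only two remain, and remember the grouping with the smallest p-value seen along the way. It is not the library's code. The names `exhaustive_merge` and `anova_p_value` are hypothetical, the one-way ANOVA F-test is an assumed choice for a continuous dependent variable, every pair is treated as mergeable (a nominal predictor), and the Bonferroni adjustment is left out.

```python
import math
from itertools import combinations


def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            c = 1.0 + num / c
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def anova_p_value(groups):
    """p-value of a one-way ANOVA F-test over lists of numbers (assumed statistic)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df1, df2 = k - 1, n - k
    if ss_within == 0.0:
        return 0.0 if ss_between > 0.0 else 1.0
    f = (ss_between / df1) / (ss_within / df2)
    # Survival function of F(df1, df2) via the incomplete beta function.
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def exhaustive_merge(samples):
    """samples maps category -> list of dependent values.

    Returns (best_p, best_grouping); a grouping is a sorted list of tuples
    of the original categories.
    """
    groups = {(cat,): list(ys) for cat, ys in samples.items()}
    best_p, best_grouping = anova_p_value(list(groups.values())), sorted(groups)
    while len(groups) > 2:
        # The most similar pair is the one with the largest pairwise p-value.
        a, b = max(combinations(groups, 2),
                   key=lambda pair: anova_p_value([groups[pair[0]], groups[pair[1]]]))
        groups[a + b] = groups.pop(a) + groups.pop(b)
        p = anova_p_value(list(groups.values()))
        if p < best_p:  # remember the best grouping along the merge path
            best_p, best_grouping = p, sorted(groups)
    return best_p, best_grouping


samples = {
    "A": [1.0, 1.2, 0.9, 1.1],
    "B": [1.1, 0.8, 1.0, 1.3],
    "C": [5.0, 5.2, 4.9, 5.1],
    "D": [5.1, 4.8, 5.3, 5.0],
}
print(exhaustive_merge(samples))
```

The contrast with regular CHAID is that merging does not stop once the remaining pairs look significantly different. The loop always runs down to two groups and picks the best grouping afterwards.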

KamilGos commented 3 years ago

Yes, that is right.

OK, I will try then. It is just a little difficult for me to understand your _best_consplit() function (I work with a continuous dependent variable), so I don't know exactly where I should put the changes, but I will try. Thanks.

Rambatino commented 3 years ago

So, there are multiple ways to do this. I don't know off the top of my head which would be better (it's been a few years since I've looked into the code, as it's been production grade for a long time).

The best_con_split() function is a little verbose, but I wonder whether ind_var.possible_groupings(exhaustive=True) could be an easy solution (along with finding a way of passing that variable down).

There's also room for using other stats functions that are more appropriate to your use case when comparing two sets of continuous values. Not sure whether you've thought about that side of things?
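
To make the grouping idea concrete, here is a small sketch of how the pairs eligible for merging could be enumerated. It assumes the usual CHAID convention that any two categories of a nominal predictor may merge, while an ordinal predictor may only merge neighbouring categories. `candidate_pairs` is a hypothetical standalone helper, not the library's possible_groupings(), and it does not model the proposed exhaustive flag.

```python
from itertools import combinations


def candidate_pairs(groups, ordinal=False):
    """Return the pairs of category groups that are allowed to merge.

    groups  -- list of category groups, in category order
    ordinal -- if True, only adjacent groups may be merged
    """
    if ordinal:
        return list(zip(groups, groups[1:]))
    return list(combinations(groups, 2))


print(candidate_pairs(["low", "mid", "high"]))
print(candidate_pairs(["low", "mid", "high"], ordinal=True))
```

An exhaustive search would call something like this on every pass and merge the most similar pair it returns, rather than stopping at the first pass where no pair is similar enough.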

KamilGos commented 3 years ago

I'll try to use your idea.

Regarding your question: when I was working with your code using the plain CHAID method, it worked fine (exactly as expected), so I didn't think about using other statistical functions. Now I'm working with exhaustive CHAID, and I thought a small change to the code would be enough, but I'm stuck analyzing the function I mentioned earlier. You may be right that there is a more appropriate stats function for this. For now, I will try with what I have :)

Thanks

Rambatino commented 3 years ago

Yeah, that function could probably be DRYed up a bit.

Let me know how you get on. I may have time to look at some point this week if you aren't able to crack it.

(although the unit tests should be useful for you)

KamilGos commented 3 years ago

Hi, I still can't get it working. If you could try to make this modification, I would be very grateful! Thanks for the thought anyway.

Rambatino commented 3 years ago

@KamilGos can you have a look at: https://github.com/Rambatino/CHAID/pull/113 🙏

KamilGos commented 3 years ago

It works as expected. Thank you so much!!! I really appreciate your effort.