Rambatino / CHAID

A python implementation of the common CHAID algorithm
Apache License 2.0

Feature selection difference with SPSS #123

Closed XinyuanHu closed 2 years ago

XinyuanHu commented 2 years ago

I have a question about the difference between the CHAID implementation in this package and the one in SPSS.

I compared the output from this package with the output from SPSS. The latter appears to have some sort of feature selection built in, because it uses only some of the features from the full data set, whereas this package uses many more.

When I limit the Python test to the same features SPSS selected, it returns the same tree/rules.
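For anyone trying the same comparison, here is a minimal sketch of the bookkeeping involved: it diffs the predictors each tool used and builds a restricted column-to-type mapping. The feature names and both helper functions (`compare_feature_sets`, `restrict_features`) are hypothetical and not part of this package; the mapping is assumed to have the column-to-type shape that the README passes to `Tree.from_pandas_df`, so check the README for the exact call.

```python
# Hypothetical feature lists; replace them with the columns from your own runs.
python_features = ["age", "income", "region", "tenure", "gender"]
spss_features = ["age", "income", "tenure"]


def compare_feature_sets(python_used, spss_used):
    """Return the features shared by both trees and those unique to each."""
    python_set, spss_set = set(python_used), set(spss_used)
    return {
        "shared": sorted(python_set & spss_set),
        "python_only": sorted(python_set - spss_set),
        "spss_only": sorted(spss_set - python_set),
    }


def restrict_features(variable_types, keep):
    """Keep only the predictors named in `keep`, preserving their declared types."""
    return {name: kind for name, kind in variable_types.items() if name in keep}


# Assumption: every predictor is treated as nominal for illustration.
variable_types = {name: "nominal" for name in python_features}

diff = compare_feature_sets(python_features, spss_features)
restricted = restrict_features(variable_types, set(spss_features))

print(diff["python_only"])  # predictors the Python tree used that SPSS did not
print(restricted)           # mapping limited to the SPSS-selected predictors
```

Listing the `python_only` features makes it easier to see which predictors account for the difference when reporting an example.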

Do you know how or why this happens? I need to reproduce the SPSS result as closely as possible.

Thank you!

Rambatino commented 2 years ago

Can you give an example, please? There is also an addendum in the README.