Rambatino / CHAID

A python implementation of the common CHAID algorithm
Apache License 2.0
149 stars 50 forks source link

Slow #125

Closed theinexorable closed 2 years ago

theinexorable commented 2 years ago

This algo works terribly slow and really needs binning of scale inputs. I love this algorithm, but can't really use it as it is. Needs more overall optimization.

Rambatino commented 2 years ago

We were achieving millions of rows with double digit columns in a few seconds. If you're trying to force scale dimensions through an algorithm that presumes categorical inputs then it will be slow as you're using it incorrectly.

Furthermore, the binning of inputs is an incredibly subjective process that requires an understanding of the data in order to bin. Bin outside the algorithm and use the algorithm as intended.