Rambatino / CHAID

A python implementation of the common CHAID algorithm
Apache License 2.0

model_predictions fails with categorical dependent variables #87

Closed rleiva closed 5 years ago

rleiva commented 6 years ago

If the dependent variable is categorical and its categories are strings, the method model_predictions fails. The problem is that the pred array is initialized as:

pred = np.zeros(self.data_size)

which forces predictions to be numeric, so assigning a string category to it fails. To fix this, model_predictions could be rewritten along these lines:

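def model_predictions(self):
    # A plain list can hold string labels, unlike a float array
    pred = [None] * self.data_size
    for node in self:
        if node.is_terminal:
            # Most frequent category among this node's members
            max_val = max(node.members, key=node.members.get)
            for i in node.indices:
                pred[i] = max_val
    return pred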
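
For anyone who wants to reproduce this without building a tree, here is a minimal standalone sketch of the failure and of one possible fix. It does not use the CHAID library: FakeNode and predict_labels are hypothetical names made up for illustration, and the only assumption is that a node exposes members (category to count), indices (row positions) and is_terminal, as in the snippet above. It uses a NumPy object array in place of the list, which keeps the return type an array.

```python
import numpy as np


class FakeNode:
    """Hypothetical stand-in for a tree node; not the CHAID library's class."""

    def __init__(self, members, indices, is_terminal=True):
        self.members = members          # dict: category -> count in this node
        self.indices = indices          # row positions that fell into this node
        self.is_terminal = is_terminal


def float_array_rejects_strings():
    """Show that a float array (as created by np.zeros) cannot store a string."""
    pred = np.zeros(3)
    try:
        pred[0] = "yes"
    except ValueError:
        return True
    return False


def predict_labels(nodes, data_size):
    """Assign each row the most frequent category of its terminal node."""
    # dtype=object lets the array hold arbitrary Python objects, strings included
    pred = np.empty(data_size, dtype=object)
    for node in nodes:
        if node.is_terminal:
            pred[node.indices] = max(node.members, key=node.members.get)
    return pred


nodes = [
    FakeNode({"yes": 4, "no": 4}, [0, 1, 2, 3], is_terminal=False),  # root, skipped
    FakeNode({"yes": 3, "no": 1}, [0, 2]),
    FakeNode({"yes": 1, "no": 3}, [1, 3]),
]
labels = predict_labels(nodes, data_size=4)
```

Whether a list or an object array is the better choice depends on what callers of model_predictions expect back; the object array keeps NumPy indexing available, while the list avoids relying on dtype handling.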

Best regards

Rambatino commented 5 years ago

@rleiva very sorry for the delay, I've been away and forgot to check!

Will have a look at this now :)