Rambatino / CHAID

A python implementation of the common CHAID algorithm
Apache License 2.0
149 stars 50 forks source link

RuntimeWarning: invalid value encountered in cast #137

Closed jihaekor closed 10 months ago

jihaekor commented 10 months ago

It looks like I get this runtime warning on this line in OrdinalColumn:

self._nan = np.array([np.nan]).astype(int)[0]

Right now, it seems like for the majority of the platforms, this will end up returning the smallest possible integer value, but that may not be consistent across all. What is the intended usage of this variable? Should it just be more explicitly set to something like:

np.iinfo(np.int64).min
Rambatino commented 10 months ago

Hi @jihaekor - I'm not so sure. Have asked the original author:

https://github.com/Rambatino/CHAID/pull/45/files#r1315790434

jihaekor commented 10 months ago

@Rambatino @xulaus Thanks for this - based on the runtime warning and what I have read, I think this is working via some sort of a side effect rather than an actual missing value representation; see this issue from numpy: https://github.com/numpy/numpy/issues/21166

I think this should just pick a sentinel value (my suggestion above would be consistent with what is being used today in most platforms) and do an explicit replacement rather than continuing to rely on this.

If you agree, I can push a quick PR to make this update as well - let me know!

xulaus commented 10 months ago

If the value of the cast is no longer de facto reliable then moving to an explicit sentinel value makes a lot of sense. Having missing data represented by 0 is a non starter as that was pretty a common value in ordinals from what I remember, and if that is what is happening on newer macs then it needs to be fixed.

My head is pretty out of CHAID these days, but I think using np.iinfo(np.int64).min directly rather than relying on the cast should be a fairly straightforward swap

jihaekor commented 10 months ago

@xulaus @Rambatino I pushed a quick PR https://github.com/Rambatino/CHAID/pull/138 regarding this. Feel free to have a look and let me know if you see any issues!