epsilon-machine / missingpy

Missing Data Imputation for Python
GNU General Public License v3.0

BUG: Fix identification of deserialized np.nan #15

Open rbowden91 opened 4 years ago

rbowden91 commented 4 years ago

I believe the bug is the same as:

https://github.com/scikit-learn/scikit-learn/issues/11462

After a KNN or Forest imputer is serialized and deserialized, it fails to transform new data and crashes with:

File "/home/rbowden/.local/share/virtualenvs/qls_py-a_jz9n52/lib/python3.7/site-packages/missingpy/missforest.py", line 505, in transform
    force_all_finite=force_all_finite, copy=self.copy)
File "/home/rbowden/.local/share/virtualenvs/qls_py-a_jz9n52/lib/python3.7/site-packages/sklearn/utils/validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
File "/home/rbowden/.local/share/virtualenvs/qls_py-a_jz9n52/lib/python3.7/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

My temporary workaround had been to reset the attribute after loading the imputer:

imputer.missing_values = np.nan

But I believe this patch fixes the issue within missingpy itself (or at least, fixes that particular issue on my end).
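
For anyone trying to reproduce this, here is a minimal sketch of what I assume is going on, based on the linked scikit-learn issue: a NaN that has been through pickle is a new float object, so an identity check against the original NaN fails, while a value-based check still works. It uses only the standard library; NAN stands in for np.nan (which is an ordinary Python float), and both helper names are made up for illustration rather than taken from missingpy or scikit-learn.

```python
import math
import pickle

# Stand-in for np.nan, which is an ordinary Python float.
NAN = float("nan")


def is_nan_by_identity(value):
    """Fragile check: true only for the exact NAN object defined above."""
    return value is NAN


def is_nan_by_value(value):
    """Robust check: true for any float NaN, whatever object it is."""
    return isinstance(value, float) and math.isnan(value)


# Round-trip the marker the way a serialized imputer attribute would be.
restored = pickle.loads(pickle.dumps(NAN))

print("identity check:", is_nan_by_identity(restored))
print("value check:   ", is_nan_by_value(restored))
print("equality check:", restored == NAN)  # NaN never compares equal
```

The value-based check is the one that survives serialization, which is also why reassigning imputer.missing_values = np.nan after loading works around the problem: it puts the original object back so that an identity comparison succeeds again.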