awslabs / datawig

Imputation of missing values in tables.
Apache License 2.0

ValueError: cannot convert float NaN to integer #137

mathiasleroy closed this issue 4 years ago

mathiasleroy commented 4 years ago

Hello, I get this error that I can't solve when using Google Colaboratory. I'm not sure whether it's due to a bad install or conflicting versions; my apologies if it is.

/usr/local/lib/python3.6/dist-packages/datawig/iterators.py in __init__(self, data_frame, data_columns, label_columns, batch_size)
    229         # custom padding for having to discard the last batch in mxnet for sparse data
    230         padding_n_rows = self._n_rows_padding(data_frame)
--> 231         self.start_padding_idx = int(data_frame.index.max() + 1)
    232         for idx in range(self.start_padding_idx, self.start_padding_idx + padding_n_rows):
    233             data_frame.loc[idx, :] = data_frame.loc[self.start_padding_idx - 1, :]

ValueError: cannot convert float NaN to integer
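For reference, the message itself is raised by Python's built-in `int()` when it is handed a NaN. Below is a minimal sketch of how line 231 of the traceback could hit that, assuming (this is not confirmed anywhere in the thread) that the frame reaching the iterator has no rows, so that the maximum of its index is NaN. The helpers `index_max` and `next_padding_index`, and the plain list standing in for the pandas index, are hypothetical and for illustration only; they are not datawig code.

```python
import math


def index_max(index_values):
    """Stand-in for `data_frame.index.max()`: NaN when there are no rows (assumed)."""
    return max(index_values, default=float("nan"))


def next_padding_index(index_values):
    """Mimic `int(data_frame.index.max() + 1)` with an explicit empty check."""
    largest = index_max(index_values)
    if math.isnan(largest):
        raise ValueError("no rows to pad: the index is empty")
    return int(largest + 1)


# Without the guard, the conversion fails with the error from the traceback.
try:
    int(index_max([]) + 1)
except ValueError as err:
    unguarded_message = str(err)

print(unguarded_message)              # cannot convert float NaN to integer
print(next_padding_index([0, 1, 2]))  # 3
```

If this assumption holds, checking how many rows remain after any filtering or splitting would be a reasonable first debugging step.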

My code:

!pip install datawig

import datawig, numpy

import pandas as pd

import sys
from io import StringIO

data="""epiU    epiPV   dsU dsPG    ifrU    ifrPG
874 1125    40  57      
815 1081    48  95      
712 937 39  53      
606 773 45  80      
576 721 38  52      
401 547 28  44  1040    1202
362 479 31  46  986 1139
295 361 29  42  909 1043
253 314 30  57  757 892
292 364 92  150 844 1018
253 311 18  43  765 921
214 263 14  24  681 808
198 248 16  26  645 752
161 199 10  24  562 654
"""

df = pd.read_csv(StringIO(data), sep="\t")
df = df[['epiU', 'dsU', 'ifrU']]

print(df.dtypes)
print(df)

df_imputed = datawig.SimpleImputer.complete(df)

EDIT: One important note: the basic example works fine.

# generate some data with simple nonlinear dependency
df = datawig.utils.generate_df_numeric() 
# mask 10% of the values
df_with_missing = df.mask(numpy.random.rand(*df.shape) > .9)

# impute missing values
df_with_missing_imputed = datawig.SimpleImputer.complete(df_with_missing)

EDIT2

I think the issue is that datawig requires pandas 0.25.3:

!pip install datawig 
ERROR: google-colab 1.0.0 has requirement pandas~=1.0.0; python_version >= "3.0", but you'll have pandas 0.25.3 which is incompatible.
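
The `~=` in that pip message is the "compatible release" operator: `pandas~=1.0.0` accepts `1.0.x` releases at or above `1.0.0` and nothing else, which is why 0.25.3 is reported as incompatible. A simplified sketch of that rule for plain dotted numeric versions follows. `parse_version` and `is_compatible_release` are hypothetical helpers, not part of pip, and they ignore pre-release tags and other suffixes.

```python
def parse_version(text):
    """Split a plain dotted version such as '1.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_compatible_release(installed, required):
    """Simplified `~=` check: same leading components, last one may be higher."""
    have = parse_version(installed)
    want = parse_version(required)
    if len(want) < 2:
        raise ValueError("~= needs at least two version components")
    # Pad the installed version with zeros so '1.0' compares like '1.0.0'.
    have = have + (0,) * (len(want) - len(have))
    prefix = want[:-1]
    return have[:len(prefix)] == prefix and have >= want


print(is_compatible_release("0.25.3", "1.0.0"))  # False
print(is_compatible_release("1.0.3", "1.0.0"))   # True
print(is_compatible_release("1.1.0", "1.0.0"))   # False
```

The pandas version actually loaded in a notebook can be printed with `pd.__version__`.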
felixbiessmann commented 4 years ago

Closing this issue for now. I'm assuming the problem is solved once the required dependency versions are used. Feel free to reopen otherwise.