values not imputed #17

Open nick-youngblut opened 2 years ago

nick-youngblut commented 2 years ago

I'm essentially running the demo code, but with my own input data (all numeric data), and the data frames generated by imputer.generate_samples(m=10).output_list still have the same missing values as in the input.

Example input table:

Feature        feat1  feat2  feat3  ...  feat30  feat31  feat32
ERS2551628      65.0    0.0  101.0  ...   105.0   230.0    27.0
SRS143466       43.0    NaN   34.0  ...    98.0     0.0    26.0
SRS023715        0.0   54.0    0.0  ...    33.0    55.0     NaN
SRS580227        0.0    0.0   10.0  ...    67.0    22.0     0.0
DRS091214   327457.0    0.0    NaN  ...     NaN     0.0    24.0
...              ...    ...    ...  ...     ...     ...     ...
ERS2551594      74.0   15.0   21.0  ...    93.0    40.0     0.0
ERS634957        0.0   12.0    0.0  ...     0.0    45.0     0.0
DRS087574        0.0   80.0   43.0  ...   209.0     NaN    12.0
ERS634952       33.0   56.0   11.0  ...     NaN  1032.0     0.0
SRS1820544      49.0  102.0   12.0  ...    13.0    27.0    49.0

...and the output:

Feature        feat1  feat2  feat3  ...  feat30  feat31  feat32
ERS2551628      65.0    0.0  101.0  ...   105.0   230.0    27.0
SRS143466       43.0    NaN   34.0  ...    98.0     0.0    26.0
SRS023715        0.0   54.0    0.0  ...    33.0    55.0     NaN
SRS580227        0.0    0.0   10.0  ...    67.0    22.0     0.0
DRS091214   327457.0    0.0    NaN  ...     NaN     0.0    24.0
...              ...    ...    ...  ...     ...     ...     ...
ERS2551594      74.0   15.0   21.0  ...    93.0    40.0     0.0
ERS634957        0.0   12.0    0.0  ...     0.0    45.0     0.0
DRS087574        0.0   80.0   43.0  ...   209.0     NaN    12.0
ERS634952       33.0   56.0   11.0  ...     NaN  1032.0     0.0
SRS1820544      49.0  102.0   12.0  ...    13.0    27.0    49.0

Any idea on why the missing values are not imputed?

nick-youngblut commented 2 years ago

Dropping the index of the input dataframe (i.e. resetting it to the default) fixed the issue. It appears that the index must be the standard 0:(nrow-1).
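For anyone else hitting this, here is a minimal pandas-only sketch of the workaround described above. The toy data, the variable names, and the `fillna` stand-in for the imputer's output are all hypothetical and only there to make the snippet runnable; putting the sample IDs back afterwards also assumes the imputed frames keep the same row order as the input, which I have not verified.

```python
import numpy as np
import pandas as pd

# Toy frame with sample IDs as the index, mirroring the table in the report
df = pd.DataFrame(
    {"feat1": [65.0, 43.0, 0.0], "feat2": [0.0, np.nan, 54.0]},
    index=pd.Index(["ERS2551628", "SRS143466", "SRS023715"], name="Feature"),
)

# Keep the original labels so they can be put back after imputation
sample_ids = df.index

# Replace the labelled index with the default 0..nrow-1 RangeIndex
df_reset = df.reset_index(drop=True)

# ... build the imputer on df_reset and generate samples here (not shown) ...

# Hypothetical stand-in for one imputed output frame, NOT the real imputer
imputed = df_reset.fillna(0.0)

# Restore the sample IDs (assumes the row order was preserved)
imputed.index = sample_ids
```

Using `reset_index(drop=True)` discards the old labels rather than moving them into a new column, which matters here because the input is expected to be all numeric.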

ErnestJohnston commented 9 months ago

Thanks, this fix also helped me. I had to reset the index values to default before it would impute the missing values in the pandas dataframe. Can this be added to the documentation?