MIDASverse / MIDASpy

Python package for missing-data imputation with deep learning
Apache License 2.0

values not imputed #17

Open · nick-youngblut opened this issue 2 years ago

nick-youngblut commented 2 years ago

I'm essentially running the demo code, but with my own input data (all numeric data), and the data frames generated by imputer.generate_samples(m=10).output_list still have the same missing values as in the input.

Example input table:

Feature        feat1   feat2   feat3  ...  feat30  feat31  feat32
ERS2551628      65.0     0.0   101.0  ...   105.0   230.0    27.0
SRS143466       43.0     NaN    34.0  ...    98.0     0.0    26.0
SRS023715        0.0    54.0     0.0  ...    33.0    55.0     NaN
SRS580227        0.0     0.0    10.0  ...    67.0    22.0     0.0
DRS091214   327457.0     0.0     NaN  ...     NaN     0.0    24.0
...              ...     ...     ...  ...     ...     ...     ...
ERS2551594      74.0    15.0    21.0  ...    93.0    40.0     0.0
ERS634957        0.0    12.0     0.0  ...     0.0    45.0     0.0
DRS087574        0.0    80.0    43.0  ...   209.0     NaN    12.0
ERS634952       33.0    56.0    11.0  ...     NaN  1032.0     0.0
SRS1820544      49.0   102.0    12.0  ...    13.0    27.0    49.0

...and the output:

Feature        feat1   feat2   feat3  ...  feat30  feat31  feat32
ERS2551628      65.0     0.0   101.0  ...   105.0   230.0    27.0
SRS143466       43.0     NaN    34.0  ...    98.0     0.0    26.0
SRS023715        0.0    54.0     0.0  ...    33.0    55.0     NaN
SRS580227        0.0     0.0    10.0  ...    67.0    22.0     0.0
DRS091214   327457.0     0.0     NaN  ...     NaN     0.0    24.0
...              ...     ...     ...  ...     ...     ...     ...
ERS2551594      74.0    15.0    21.0  ...    93.0    40.0     0.0
ERS634957        0.0    12.0     0.0  ...     0.0    45.0     0.0
DRS087574        0.0    80.0    43.0  ...   209.0     NaN    12.0
ERS634952       33.0    56.0    11.0  ...     NaN  1032.0     0.0
SRS1820544      49.0   102.0    12.0  ...    13.0    27.0    49.0

Any idea on why the missing values are not imputed?
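The symptom can be confirmed by counting the NaNs left in each imputed frame. The sketch below also reproduces, in plain pandas, one plausible mechanism. This is an assumption, not something taken from the MIDASpy source. If imputed values end up in a fresh frame with a default 0..n-1 index and are then merged back by row label (here via DataFrame.fillna), a frame indexed by sample IDs has no matching labels, so nothing gets filled. The toy data, the `predicted` frame and the `count_missing` helper are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy input indexed by sample IDs, like the table above.
data = pd.DataFrame(
    {"feat1": [65.0, 43.0, 0.0], "feat2": [0.0, np.nan, 54.0]},
    index=["ERS2551628", "SRS143466", "SRS023715"],
)

# Hypothetical model output: complete values in a new DataFrame that gets
# a default RangeIndex (0, 1, 2) instead of the sample IDs.
predicted = pd.DataFrame(
    [[65.0, 0.0], [43.0, 12.5], [0.0, 54.0]], columns=data.columns
)


def count_missing(frame: pd.DataFrame) -> int:
    """Total number of NaN cells in a DataFrame."""
    return int(frame.isna().sum().sum())


# fillna(DataFrame) aligns on row labels; no label in `predicted`
# matches one in `data`, so the NaN survives.
misaligned = data.fillna(predicted)
print(count_missing(misaligned))  # 1

# Giving the predictions the same row labels lets the gap be filled.
aligned = data.fillna(
    pd.DataFrame(predicted.to_numpy(), index=data.index, columns=data.columns)
)
print(count_missing(aligned))  # 0
```

Applied to the real output, `[count_missing(df) for df in imputer.generate_samples(m=10).output_list]` should be all zeros when every gap was filled; counts equal to the input's NaN count match the behaviour reported here.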

conda env

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
_tflow_select             2.3.0                       mkl
absl-py                   0.15.0                   pypi_0    pypi
aiohttp                   3.8.1            py39h3811e60_0    conda-forge
aiosignal                 1.2.0              pyhd8ed1ab_0    conda-forge
astor                     0.8.1              pyh9f0ad1d_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     21.4.0             pyhd8ed1ab_0    conda-forge
blas                      1.1                    openblas    conda-forge
blinker                   1.4                        py_1    conda-forge
brotlipy                  0.7.0           py39h3811e60_1003    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2021.10.26           h06a4308_2
cachetools                4.2.4              pyhd8ed1ab_0    conda-forge
certifi                   2021.10.8        py39hf3d152e_1    conda-forge
cffi                      1.15.0           py39h4bc2ebd_0    conda-forge
charset-normalizer        2.0.9              pyhd8ed1ab_0    conda-forge
click                     8.0.3            py39hf3d152e_1    conda-forge
cryptography              36.0.0           py39h9ce1e76_0
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dataclasses               0.8                pyhc8e2a94_3    conda-forge
flatbuffers               1.12                     pypi_0    pypi
freetype                  2.11.0               h70c0345_0
frozenlist                1.2.0            py39h3811e60_1    conda-forge
gast                      0.3.3                    pypi_0    pypi
google-auth               1.35.0                   pypi_0    pypi
google-auth-oauthlib      0.4.1                      py_2    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
grpcio                    1.32.0                   pypi_0    pypi
h5py                      2.10.0          nompi_py39h98ba4bc_106    conda-forge
hdf5                      1.10.6          nompi_h3c11f04_101    conda-forge
idna                      3.3                pyhd3eb1b0_0
importlib-metadata        4.10.0           py39hf3d152e_0    conda-forge
jbig                      2.1               h7f98852_2003    conda-forge
joblib                    1.1.0                    pypi_0    pypi
jpeg                      9d                   h516909a_0    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
kiwisolver                1.3.2            py39h1a9c180_1    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      3.0                  h9c3ff4c_0    conda-forge
libblas                   3.9.0           1_h6e990d7_netlib    conda-forge
libcblas                  3.9.0           3_h893e4fe_netlib    conda-forge
libdeflate                1.8                  h7f98852_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 11.2.0              h1d223b6_11    conda-forge
libgfortran-ng            7.5.0               h14aa051_19    conda-forge
libgfortran4              7.5.0               h14aa051_19    conda-forge
libgomp                   11.2.0              h1d223b6_11    conda-forge
liblapack                 3.9.0           3_h893e4fe_netlib    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.13               h4367d64_0
libpng                    1.6.37               hed695b0_2    conda-forge
libprotobuf               3.19.2               h780b84a_0    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_11    conda-forge
libtiff                   4.3.0                h6f004c6_2    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libwebp-base              1.2.1                h7f98852_0    conda-forge
libzlib                   1.2.11            h36c2ea0_1013    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
markdown                  3.3.6              pyhd8ed1ab_0    conda-forge
matplotlib                3.3.2                         0    conda-forge
matplotlib-base           3.3.2            py39h98787fa_1    conda-forge
midaspy                   1.2.1                    pypi_0    pypi
multidict                 5.2.0            py39h3811e60_1    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.19.5                   pypi_0    pypi
oauthlib                  3.1.1              pyhd8ed1ab_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openblas                  0.3.4             h9ac9557_1000    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   3.0.0                h7f98852_2    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
pandas                    1.3.5            py39hde0f152_0    conda-forge
patsy                     0.5.2              pyhd8ed1ab_0    conda-forge
pillow                    8.4.0            py39ha612740_0    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
protobuf                  3.19.2           py39he80948d_0    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.8                      py_0
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.3.0              pyhd8ed1ab_1    conda-forge
pyopenssl                 21.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.6              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1            py39hf3d152e_4    conda-forge
python                    3.9.9           h543edf9_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
requests                  2.27.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.0              pyh9f0ad1d_0    conda-forge
rsa                       4.8                pyhd8ed1ab_0    conda-forge
scikit-learn              1.0.2                    pypi_0    pypi
scipy                     1.7.1            py39hc65b3f8_2
setuptools                60.2.0           py39hf3d152e_0    conda-forge
six                       1.15.0                   pypi_0    pypi
sqlite                    3.37.0               h9cd32fc_0    conda-forge
statsmodels               0.13.1           py39hce5d2b2_0    conda-forge
tensorboard               2.6.0                      py_0
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.4.1           mkl_py39h4683426_0
tensorflow-addons         0.15.0                   pypi_0    pypi
tensorflow-base           2.4.1           mkl_py39h43e0292_0
tensorflow-estimator      2.4.0                    pypi_0    pypi
termcolor                 1.1.0                      py_2    conda-forge
threadpoolctl             3.0.0                    pypi_0    pypi
tk                        8.6.11               h27826a3_1    conda-forge
tornado                   6.1              py39h3811e60_2    conda-forge
typeguard                 2.13.3                   pypi_0    pypi
typing-extensions         3.7.4.3                  pypi_0    pypi
tzdata                    2021e                he74cb21_0    conda-forge
urllib3                   1.26.7             pyhd8ed1ab_0    conda-forge
werkzeug                  2.0.2              pyhd3eb1b0_0
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
wrapt                     1.12.1                   pypi_0    pypi
xz                        5.2.5                h516909a_1    conda-forge
yarl                      1.7.2            py39h3811e60_1    conda-forge
zipp                      3.6.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h36c2ea0_1013    conda-forge
zstd                      1.5.1                ha95c52a_0    conda-forge
nick-youngblut commented 2 years ago

Dropping the index of the input data frame fixed the issue. It appears that the index must be the standard 0:(nrow-1) range.
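A minimal sketch of that workaround that keeps the sample IDs: reset the index before imputation and reattach it to every returned frame afterwards. `impute_with_default_index` is a hypothetical helper, not part of MIDASpy. It assumes the imputer returns frames with the same rows, in the same order, as its input. The `impute` callable stands in for the build/train/generate_samples steps of the demo; `mean_impute` is a toy substitute used only to make the example runnable.

```python
from typing import Callable, List

import numpy as np
import pandas as pd


def impute_with_default_index(
    df: pd.DataFrame,
    impute: Callable[[pd.DataFrame], List[pd.DataFrame]],
) -> List[pd.DataFrame]:
    """Impute on a copy of `df` whose index is 0..n-1, then restore the labels.

    Assumes `impute` returns frames with the same rows, in the same order,
    as the frame it was given.
    """
    original_index = df.index
    # drop=True discards the old labels instead of inserting them as a column,
    # so the imputer still sees only the feature columns.
    reset = df.reset_index(drop=True)
    restored = []
    for frame in impute(reset):
        frame = frame.copy()
        frame.index = original_index  # reattach sample IDs by position
        restored.append(frame)
    return restored


# Hypothetical stand-in for the MIDASpy model: m=2 copies filled with column means.
def mean_impute(frame: pd.DataFrame) -> List[pd.DataFrame]:
    return [frame.fillna(frame.mean()) for _ in range(2)]


data = pd.DataFrame(
    {"feat1": [65.0, 43.0, 0.0], "feat2": [0.0, np.nan, 54.0]},
    index=["ERS2551628", "SRS143466", "SRS023715"],
)
out = impute_with_default_index(data, mean_impute)
print(out[0])
```

Restoring the labels by position rather than by a merge is deliberate: after the reset, position is the only link between the imputed rows and the original samples.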

ErnestJohnston commented 9 months ago

Thanks, this fix also helped me. I had to reset the index values to the default before it would impute the missing values in the pandas DataFrame. Can this be added to the documentation?
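Until the requirement is documented, a pre-flight check could turn the silent failure into an explicit error. The sketch below is hypothetical (`has_default_index` and `require_default_index` are not part of MIDASpy). It encodes the condition reported in this thread, that the row index must be exactly 0 to n-1 in order. Taking that condition literally, it also flags integer indexes with gaps, such as those left behind by filtering rows.

```python
import pandas as pd


def has_default_index(df: pd.DataFrame) -> bool:
    """True if the row labels are exactly 0, 1, ..., len(df) - 1 in order."""
    return bool(df.index.equals(pd.RangeIndex(len(df))))


def require_default_index(df: pd.DataFrame) -> None:
    """Raise early instead of silently returning un-imputed data."""
    if not has_default_index(df):
        raise ValueError(
            "Input index must be 0..n-1; call df.reset_index(drop=True) first."
        )


ids = pd.DataFrame({"feat1": [65.0, 43.0]}, index=["ERS2551628", "SRS143466"])
print(has_default_index(ids))                         # False
print(has_default_index(ids.reset_index(drop=True)))  # True

# A filtered frame keeps gaps in its integer index (here 0 and 2).
numbered = pd.DataFrame({"feat1": [1.0, 2.0, 3.0]})
print(has_default_index(numbered.iloc[[0, 2]]))       # False
```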