MIDASverse / MIDASpy

Python package for missing-data imputation with deep learning
Apache License 2.0

Impute new data using trained model. #11

Open mabdelhack opened 3 years ago

mabdelhack commented 3 years ago

Looking at the codebase, I could not locate a function that uses the trained model to impute new data after training. There seem to be a couple of functions that could be used to do this indirectly, but I am surprised that it is not included as a separate function.

tsrobinson commented 3 years ago

Hi @mabdelhack -- you can impute data after training the model using the .generate_samples() method, which saves m imputed datasets to the .output_list attribute.
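
A minimal sketch of that in-sample workflow, for illustration only. It assumes `data_in` is a pandas DataFrame with missing values that has already been preprocessed, and that the constructor arguments match those in the package README (they may differ between versions). The wrapper name `impute_in_sample`, its `epochs` parameter and the guard on `m` are hypothetical and not part of MIDASpy.

```python
def impute_in_sample(data_in, m=5, epochs=20):
    """Train MIDAS on data_in and return a list of m completed datasets.

    Hypothetical wrapper: only the md.Midas / build_model / train_model /
    generate_samples calls belong to MIDASpy.
    """
    if m < 1:
        raise ValueError("m must be at least 1")

    # Imported lazily so the guard above works without MIDASpy installed.
    import MIDASpy as md

    # Hyperparameters are assumed from the README, not tuned for any dataset.
    imputer = md.Midas(layer_structure=[256, 256], vae_layer=False,
                       seed=89, input_drop=0.75)
    imputer.build_model(data_in)                  # data the model will impute
    imputer.train_model(training_epochs=epochs)   # denoising-style training
    # generate_samples() stores the m completed datasets on .output_list
    return imputer.generate_samples(m=m).output_list
```

Each element of the returned list is one completed copy of `data_in`, so downstream analyses can be run on every copy and then combined.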

If you are referring to entirely new data (i.e. a completely separate test dataset), we do not currently have this functionality. For the purposes of imputation, we prefer to use denoising and dropout as a means of regularization over conventional test-train splits.

We can consider this as an extension, and I'd be interested to know in what imputation circumstances this might be useful?

muhammad92syahrul commented 2 years ago

I am wondering about that too. It would be useful when applying cross-validation, as is possible with IterativeImputer. I hope you will consider adding that function.

Thank you

tsrobinson commented 2 years ago

Hi @muhammad92syahrul and @mabdelhack!

We still don't have a specific function for your purposes, but I wanted to flag the .change_imputation_target() method (defined in the package source code). This method allows you to fit a model on $X$ as standard, change the imputation target to some new data $X'$, then sample completed datasets for $X'$ by calling .generate_samples() afterwards.
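
A rough sketch of that fit-then-retarget sequence, under some loud assumptions: `.change_imputation_target()` is assumed to take the new DataFrame as its single positional argument, and `new_df` is assumed to need the same columns as the training data. The wrapper `impute_new_data`, its parameters and the column check are hypothetical, and the constructor arguments are assumed from the README.

```python
def impute_new_data(train_df, new_df, m=5, epochs=20):
    """Fit MIDAS on train_df (X), then sample m completed copies of new_df (X').

    Hypothetical wrapper around the method sequence described above.
    """
    # Assumed requirement: X' must share X's columns, in the same order.
    if list(train_df.columns) != list(new_df.columns):
        raise ValueError("new_df must have the same columns as train_df")

    # Imported lazily so the check above works without MIDASpy installed.
    import MIDASpy as md

    imputer = md.Midas(layer_structure=[256, 256], vae_layer=False,
                       seed=89, input_drop=0.75)
    imputer.build_model(train_df)                 # fit on X as standard
    imputer.train_model(training_epochs=epochs)
    imputer.change_imputation_target(new_df)      # swap the target to X'
    return imputer.generate_samples(m=m).output_list
```

Any preprocessing applied to the training data (scaling, category encoding) would need to be applied identically to `new_df` before the swap.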

For cross-validation purposes, this does seem like a reasonable use case and I'll think more about supporting it more widely. But hopefully in the meantime this function may give you the functionality you require (it is, however, only very lightly tested).
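
To make the cross-validation use case concrete, here is a small self-contained sketch of the pattern being asked for: learn the imputation model on the training folds only, then apply it to the held-out fold. It deliberately uses a column-mean imputer as a stand-in, not MIDAS, and every function name in it is hypothetical.

```python
from statistics import mean


def fit_column_means(rows):
    """'Training' step: one mean per column, using observed (non-None) values.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    return [mean(v for v in col if v is not None) for col in zip(*rows)]


def impute_with_means(rows, means):
    """'Imputation' step: fill each None with the mean learned for its column."""
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def cross_validated_imputation(rows, k=2):
    """Impute each fold using a model fitted only on the other k-1 folds."""
    folds = [rows[i::k] for i in range(k)]        # simple interleaved split
    completed = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        means = fit_column_means(train)           # never sees held_out
        completed.append(impute_with_means(held_out, means))
    return completed


data = [[1.0, None], [3.0, 4.0], [None, 6.0], [5.0, 8.0]]
result = cross_validated_imputation(data, k=2)
```

With a MIDAS model in place of the mean imputer, the two steps inside the loop would correspond to training on the training folds and then retargeting the trained model at the held-out fold.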