tammandres closed this issue 2 years ago
Great catch. Here's what's going on: building the training feature set sorts the features by index, but getting the bachelor predictions doesn't.
I will force the variable schema to sort the predictor variables, if they are passed. Good catch though.
I pushed a fix, which you can download from this repo. I plan on pushing it to PyPI later tonight. It fixed the specific problem this code was causing; if you are still having problems in your real code, let me know.
Thank you for your quick reply and help! I was just trying to get around this issue by making the columns in my variable schema have the same order as in the original dataframe. I initially stumbled on this because I used np.setdiff1d to remove the feature itself from the list of predictor variables, but np.setdiff1d sorts its results... I am really glad this issue is cleared up before I continue my analysis of a healthcare dataset! 😅
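For anyone hitting the same thing, here is a minimal sketch of the sorting behaviour described above. The column names and their order are hypothetical, chosen only for illustration; the one thing relied on is that np.setdiff1d returns sorted unique values.

```python
import numpy as np

# Hypothetical column order of the original dataframe (illustration only).
columns = ["f2", "f0", "f1"]
target = "f0"

# np.setdiff1d returns the sorted unique values that are in `columns`
# but not in [target], so the original column order is lost.
sorted_predictors = np.setdiff1d(columns, [target]).tolist()

# A list comprehension removes the target and keeps the original order.
ordered_predictors = [c for c in columns if c != target]

print(sorted_predictors)   # ['f1', 'f2']
print(ordered_predictors)  # ['f2', 'f1']
```

The two lists hold the same predictors in a different order, which is the kind of mismatch that exposed the problem in this thread.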
For future reference, variable_schema will by default use all other columns to predict each variable that has missing values. If you are going this route, it would probably be easier to just leave variable_schema as None.
That is good to know, though I still wanted to use the schema because I wanted to exclude a subset of columns from the imputation models.
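A rough sketch of building such a schema while keeping the dataframe's column order follows. It assumes that variable_schema accepts a dict mapping each variable to impute to a list of predictor column names, which should be checked against the package documentation. The names `columns`, `excluded` and `to_impute` are hypothetical.

```python
# Hypothetical dataframe columns, in their original order.
columns = ["f2", "f0", "f1", "patient_id"]

# Hypothetical columns to keep out of every imputation model.
excluded = {"patient_id"}

# Hypothetical list of variables that have missing values.
to_impute = ["f0"]

# For each variable to impute, use every other non-excluded column as a
# predictor, preserving the original column order.
variable_schema = {
    target: [c for c in columns if c != target and c not in excluded]
    for target in to_impute
}

print(variable_schema)  # {'f0': ['f2', 'f1']}
```

Building the predictor lists with a comprehension rather than a set operation avoids any implicit reordering.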
Hi,
I noticed an unexpected imputation behaviour that can be illustrated with the following example:
The order of f1 and f2 should not matter, because there are no missing values in these features. It should also not matter because f1 and f2 carry no information about f0. I wonder if this could be a bug? I hope I have not overlooked anything obvious!
Thanks, Andres
Code to illustrate this:
Output I get on my computer (without plots):