amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Write columns of the imputed data in the same sequence as the incomplete data (long form) #569

Closed stefvanbuuren closed 1 year ago

stefvanbuuren commented 1 year ago

The call complete(imp, action = "long", include = TRUE) exports the imputed data in long form with (m + 1) * n records. The exported data contains two new variables, named .imp and .id.

Until now, the new variables were written to columns 1 and 2 of the data. The disadvantage is that this shifts the column positions relative to the original data in imp$data.

This PR writes the two new variables as the last two variables. In this way, the columns of the imputed data will have the same positions as in the original data, which is more user-friendly and easier to work with.

Commit cdb8bcf3b7923867226e3ebfc7375b288707f03d solves a problem with complete() that prevented proper transfer of the type of the .id variable (integer or character).

Commit ca1e876bbede3658ad068f2c518920e424f8e049 changes the column order in complete() and adapts functions that silently assumed that .imp and .id would be in columns 1 and 2, respectively (e.g. in plots and tests).

Note that any existing code that assumes that variables ".imp" and ".id" are in columns 1 and 2 will need to be modified. The advice is to refer to these variables by their names, ".imp" and ".id", rather than by column position.
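As an illustration of that advice, here is a minimal base R sketch. The data frame `long` is a hand-built, hypothetical stand-in for the output of complete(imp, "long", include = TRUE), so the example runs without mice; the column names age and bmi are illustrative only.

```r
# Hypothetical stand-in for a long-form imputed data set (not real mice output).
long <- data.frame(
  age  = c(1, 2, 1, 2),
  bmi  = c(NA, 22.5, 21.0, 22.5),
  .imp = c(0L, 0L, 1L, 1L),
  .id  = c(1L, 2L, 1L, 2L)
)

# Fragile: assumes .imp sits in column 1, which no longer holds here.
# imp_no <- long[, 1]

# Robust: address the bookkeeping variables by name.
imp_no <- long[[".imp"]]
row_id <- long[[".id"]]

# Select the data columns by excluding the bookkeeping names,
# independent of where .imp and .id are stored.
data_cols <- setdiff(names(long), c(".imp", ".id"))
first_imputation <- long[long$.imp == 1L, data_cols, drop = FALSE]
```

Code written this way should behave the same whether .imp and .id appear first or last.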

stefvanbuuren commented 1 year ago

Good suggestions:

  1. I have added an order argument to complete() to support old code.

  2. Important point, but unfortunately I cannot change the default action in complete(). The function is widely used in mice and outside, with a history of over 20 years. I programmed the warning, but the result was ugly, with distracting warning messages popping up from various functions. I do not wish to punish savvy users, so I removed it.

  3. The glitch was indeed present in 3.16.0, and I saw it too. Here is what it does now:

library(mice, warn.conflicts = FALSE)
head(rownames(boys))
#> [1] "3"  "4"  "18" "23" "28" "36"
head(attr(boys, 'row.names'))
#> [1]  3  4 18 23 28 36
imp <- mice(boys, printFlag = FALSE)
long <- complete(imp, "long", include = TRUE)
head(long$.id)
#> [1]  3  4 18 23 28 36
imp2 <- as.mids(long)
head(rownames(imp2$data))
#> [1] "3"  "4"  "18" "23" "28" "36"
head(attr(imp2$data, "row.names"))
#> [1]  3  4 18 23 28 36

Created on 2023-07-20 with reprex v2.0.2
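For point 1, a minimal sketch of how the order argument might be used to restore the old layout. It assumes the argument accepts "last" (the new default) and "first" (the previous layout); the nhanes data and the settings m = 2 and maxit = 1 are chosen only to keep the example fast.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, m = 2, maxit = 1, printFlag = FALSE, seed = 1)

# New layout: .imp and .id are appended after the data columns.
long_last <- complete(imp, "long", include = TRUE)

# Old layout (assumed value "first"): .imp and .id precede the data columns.
long_first <- complete(imp, "long", include = TRUE, order = "first")

# Compare the column order of the two layouts.
names(long_last)
names(long_first)
```

Old code that indexes columns 1 and 2 could pass order = "first" as a stopgap, though switching to name-based access is the more durable fix.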

Thanks for the feedback.