amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

mice() Different results depending on column order #541

Closed jay-sf closed 1 year ago

jay-sf commented 1 year ago

Currently, when we impute a dataset with mice(), the pooled results differ depending on the column order of the input data, even with the same seed:

> library(mice)
> summary(pool(with(mice(nhanes2, seed=42, printFlag=FALSE), lm(chl ~ age + hyp + bmi))))
         term  estimate std.error  statistic        df    p.value
1 (Intercept) -7.041069 68.984151 -0.1020679 10.048167 0.92071091
2    age40-59 44.738210 22.610459  1.9786511 12.654316 0.07004070
3    age60-99 71.304537 27.001228  2.6407887  9.457629 0.02581795
4      hypyes -5.289385 22.824540 -0.2317411 10.452068 0.82121333
5         bmi  6.511697  2.560452  2.5431830  8.172416 0.03396548
> summary(pool(with(mice(nhanes2[, c(2, 4, 1, 3)], seed=42, printFlag=FALSE), lm(chl ~ age + hyp + bmi))))
         term  estimate std.error   statistic        df    p.value
1 (Intercept) 41.897815 65.434937  0.64029732 11.175928 0.53488653
2    age40-59 46.541989 22.261161  2.09072602 13.942022 0.05535303
3    age60-99 62.623802 25.442422  2.46139313 11.357257 0.03097931
4      hypyes -3.134964 32.152478 -0.09750303  4.631723 0.92640397
5         bmi  4.621302  2.440856  1.89331180  9.160326 0.09028175

I suspect that the way the seed is used internally is the reason for this. See also the related Q&A on Stack Overflow.

thomvolker commented 1 year ago

See https://stackoverflow.com/a/75694675/21369628 for an explanation. This is expected behavior, so the issue can be closed.
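
To make the seed argument concrete, here is a minimal base R sketch. It does not call mice and does not reproduce its internals. It only assumes that random numbers are consumed from one stream, variable by variable, in column order. The helper `draw_in_order()` and the use of `rnorm()` are hypothetical stand-ins for the actual imputation draws.

```r
# Illustration only: base R, not mice's internals.
# Hypothetical helper: draw `n` values per variable, in the given order,
# from a single random number stream started at `seed`.
draw_in_order <- function(vars, seed, n = 3) {
  set.seed(seed)
  draws <- lapply(vars, function(v) rnorm(n))
  names(draws) <- vars
  draws
}

# Same seed, same variables, different order (mirrors the incomplete
# columns of nhanes2 and of nhanes2[, c(2, 4, 1, 3)]).
a <- draw_in_order(c("bmi", "hyp", "chl"), seed = 42)
b <- draw_in_order(c("bmi", "chl", "hyp"), seed = 42)

identical(a$hyp, b$hyp)  # FALSE: hyp gets a different part of the stream
identical(a$hyp, b$chl)  # TRUE: the second variable in line gets the same draws
```

Under that assumption, the seed fixes the stream of random numbers, not which variable receives which draws, so reordering the columns would change the imputed values and therefore the pooled estimates.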