amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Why does the problem of circularity during passive imputation lead to strange imputations and non-convergence? #472

Closed jeyabbalas closed 2 years ago

jeyabbalas commented 2 years ago

Thank you to this community for building such a comprehensive and useful tool!

I went through the MICE vignettes and have a question about one of them. In vignette 4, "Passive imputation and post-processing", please scroll down to step 8 and look at the figure there. The vignette warns against circumstances that can lead to such strange imputations (BMIs well over 100) and calls this "the problem of circularity", which I have summarized below.

The example introduces a passive imputation rule for imputing BMI values as follows:

ini <- mice(boys, maxit = 0, print = FALSE)  # dry run to obtain the defaults
meth <- ini$meth
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
imp <- mice(boys, meth = meth, print = FALSE)

So the variable bmi now depends on wgt and hgt. However, the prediction matrix shows that wgt and hgt still depend on bmi, which closes the loop. This is described as the problem of circularity.

pred <- ini$pred
pred

##     age hgt wgt bmi hc gen phb tv reg
## age   0   1   1   1  1   1   1  1   1
## hgt   1   0   1   1  1   1   1  1   1
## wgt   1   1   0   1  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0

The suggested fix is simple: break the cycle. Make sure that wgt and hgt no longer depend on bmi, as shown below.

pred[c("hgt", "wgt"), "bmi"] <- 0
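
For context, here is a minimal sketch of how the adjusted predictor matrix might be passed back to mice(). It assumes the mice package is installed and uses the bundled boys data, as the vignette does. The object name imp_fixed and the seed value are illustrative choices, not part of the vignette.

```r
library(mice)

# Dry run (maxit = 0) to obtain the default methods and predictor matrix
ini  <- mice(boys, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix

# Passive imputation: bmi is computed from the imputed wgt and hgt
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

# Break the cycle: bmi no longer predicts hgt or wgt
pred[c("hgt", "wgt"), "bmi"] <- 0

# Re-impute with the adjusted predictor matrix (the seed is an arbitrary choice)
imp_fixed <- mice(boys, method = meth, predictorMatrix = pred,
                  printFlag = FALSE, seed = 123)

# Largest bmi across all completed data sets, as a quick plausibility check
max(complete(imp_fixed, "long")$bmi, na.rm = TRUE)

# Trace plots to inspect convergence of the three variables
plot(imp_fixed, c("hgt", "wgt", "bmi"))
```

Running the same two checks on an imputation that keeps bmi as a predictor of hgt and wgt should make the difference between the two setups visible.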

I am unable to understand why the problem of circularity leads to such strange imputations and non-convergence. When I use the default PMM imputation method, I always get realistic values for hgt and wgt, because they are sampled directly from similar cases in the data. How, then, can I get bizarre BMIs in the 300s? And why does this problem not show up when there are cyclic dependencies but no passive imputation?

gerkovink commented 2 years ago

Let's assume that you first impute a large hgt based on the predictors. Since hgt and wgt are strongly correlated, a large wgt is likely to be imputed. Then, bmi is deductively calculated, and the cycle reiterates. A large BMI will lead to a specific hgt/wgt ratio, and so on. Because there are too many degrees of freedom, the system can in theory go into outer space for one or more chains.

jeyabbalas commented 2 years ago

Thank you for your prompt reply @gerkovink !

How is it even possible to get a BMI > 300 using PMM? In the above scenario, PMM imputes real values from the data for hgt and wgt. In the boys dataset, the maximum values are hgt = 198 and wgt = 117.4. So, in theory, the largest possible value for BMI would be: bmi = (wgt / (hgt / 100)^2) = (117.4 / (198 / 100)^2) = 29.95. How do we get such large values imputed for bmi?

gerkovink commented 2 years ago

Passive imputation does calculate the imputed value for BMI in your example deductively, not by PMM.

jeyabbalas commented 2 years ago

Right, but wgt and hgt are still being imputed using PMM, right? In my previous reply, I computed BMI deductively using the formula for BMI.

gerkovink commented 2 years ago

But hgt and wgt can be imputed such that the highest wgt and the lowest hgt end up in the same case, which leads to an unrealistically large BMI, simply because their imputations are influenced by the unrestricted BMI from the previous iteration.

jeyabbalas commented 2 years ago

Oh I see what you mean! So, due to the large degrees of freedom of bmi, a large bmi sends hgt in the other direction. The min value of hgt is 50. In that case, bmi = (wgt / (hgt / 100)^2) = (117.4 / (50 / 100)^2) = 469.6. This is consistent with the numbers I am seeing.

This makes sense. Thank you so much!
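
The two BMI figures worked out in this thread can be reproduced with a few lines of base R. This is only a sketch of the arithmetic: the extremes are the values quoted above rather than recomputed from the boys data, and the helper name bmi() is illustrative.

```r
# Extremes of the observed boys data as quoted in this thread (assumed, not recomputed)
wgt_max <- 117.4  # kg
hgt_max <- 198    # cm
hgt_min <- 50     # cm

# BMI = weight (kg) / height (m)^2, the same formula as the passive imputation rule
bmi <- function(wgt, hgt) wgt / (hgt / 100)^2

# Heaviest weight paired with the tallest height
bmi_tall <- bmi(wgt_max, hgt_max)

# Heaviest weight paired with the shortest height
bmi_short <- bmi(wgt_max, hgt_min)

round(c(tall = bmi_tall, short = bmi_short), 2)
```

The first pairing gives roughly 29.95, while the second gives 469.6. Each value on its own is an observed, realistic donor value; it is the combination within a single case that is implausible.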