amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Warning message "A ridge penalty had to be used to calculate the inverse crossproduct of the predictor matrix. " #118

Closed markushaun closed 6 years ago

markushaun commented 6 years ago

Hello,

I am struggling with a problem which apparently has not been discussed online yet (Stack Overflow etc.). Therefore, I would kindly ask you to have a look at it.

The syntax reads:

esther_processed3 <- esther_processed2 %>%
  select(P02SEX, EDUC, age_T1, income_T1, longtermillness_T1, selfratedhealth_T1, livealone_T1, everdepression_T1, phq_dep_T1, phq_gad_T1, phq_dep_T2)
esther.imp <- mice(esther_processed2, m = 5, seed = 23145)

The error message then reads:

image

When conducting a detailed check of esther.imp$loggedEvents, I come across the following warning (example, appears at multiple points):

image

There are no duplicate variables or respondent names/numbers in the dataset. Beyond that, I am not able to make any sense of this warning. I have tried leaving the variables unlabelled and skipping the dplyr manipulation, but neither altered the outcome.

Your response would be of great help to me.

Best regards from Heidelberg, Markus Haun

gerkovink commented 6 years ago

You have created an exactly singular system, which prevents ordinary parameter estimation. To still allow the parameters to be estimated, mice has automatically added a ridge penalty. In theory this ridge penalty may introduce a very small amount of bias, but it effectively avoids the parameter estimation problem. The previous version of mice used ridge regression as its default; the current algorithm only falls back to ridge regression when it is needed.
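As a rough illustration of the idea described above (not mice's internal code), the following base R sketch builds a predictor matrix with one exactly redundant column and then adds a small penalty to the diagonal of the crossproduct so that it can be inverted. The variable names, the simulated data, and the penalty size `ridge` are all assumptions made for this example.

```r
# Sketch only: simulated data, hypothetical names, not the mice implementation.
set.seed(1)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                          # exact linear combination of x1 and x2
X  <- cbind(intercept = 1, x1, x2, x3)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

xtx <- crossprod(X)                    # X'X, the crossproduct of the predictors
xty <- crossprod(X, y)                 # X'y
rank_X <- qr(X)$rank                   # fewer than ncol(X) for a singular system

# Ordinary least squares: solving with a singular X'X typically raises an error.
plain <- tryCatch(solve(xtx, xty), error = function(e) NULL)

# Ridge fallback: inflate the diagonal slightly so the system becomes solvable.
ridge      <- 1e-5                     # assumed penalty size, for illustration
xtx_ridge  <- xtx + diag(ridge * diag(xtx))
beta_ridge <- drop(solve(xtx_ridge, xty))

print(rank_X)
print(beta_ridge)
```

The penalised estimates are finite but slightly shrunken, which is the small bias mentioned above.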

The warning printed is to inform you of this process. It does not imply that your imputations are invalid; just that standard least squares parameters could not be obtained. Most often this is because of multicollinearity issues or a large set of factors in the model.
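One possible way to look for the source of such a singularity before imputing is to check the rank of the design matrix. The sketch below uses only base R and simulated data; the column names (`age`, `score1`, `score2`, `total`) are hypothetical and do not come from the data set discussed in this thread.

```r
# Sketch only: simulated data with one deliberately redundant column.
set.seed(3)
n <- 30
dat <- data.frame(
  age    = rnorm(n, mean = 60, sd = 8),
  score1 = rnorm(n),
  score2 = rnorm(n)
)
dat$total <- dat$score1 + dat$score2   # exact sum of two other columns

# Design matrix as a regression would see it (intercept plus predictors).
X <- model.matrix(~ ., data = dat)
q <- qr(X)

# Rank below the number of columns means the system is singular.
is_deficient <- q$rank < ncol(X)

# Columns that qr() pivoted to the end are the redundant ones.
redundant <- if (is_deficient) {
  colnames(X)[q$pivot[(q$rank + 1):ncol(X)]]
} else {
  character(0)
}

print(redundant)
print(round(cor(dat), 2))              # pairwise correlations for comparison
```

Note that pairwise correlations alone may not reveal the problem: a column can be an exact combination of several others while each pairwise correlation stays moderate.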

All the best,

Gerko

gerkovink commented 6 years ago

Dear Markus,

Would you agree that this issue can be closed as the message you experience seems to be related to your data and not to the code in mice?

If you need further help with identifying the problem, I'd be more than happy to help.

All the best,

Gerko

markushaun commented 6 years ago

Dear Gerko, thank you so much for your clarifying answer above. Concerning multicollinearity, I have already tried the imputation after removing a highly correlated predictor (r = .66). There are indeed a lot of factor variables in the data set (6 out of 11). Would you advise me to reduce the number of factor variables considered for imputation? I would be grateful for your perspective. Otherwise, this issue can be closed. All the best, Markus

gerkovink commented 6 years ago

Yes. The more factors, the sparser the usable cases for the estimation procedures.
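The sparsity point can be sketched with simulated data in base R. The factor names, level counts, and sample size below are assumptions chosen for illustration only. Each factor with k levels adds k - 1 dummy columns to the model, and the number of cells formed by crossing the factors grows multiplicatively.

```r
# Sketch only: three simulated factors and one numeric variable.
set.seed(2)
n <- 60
dat <- data.frame(
  f1 = factor(sample(letters[1:4], n, replace = TRUE), levels = letters[1:4]),
  f2 = factor(sample(letters[1:5], n, replace = TRUE), levels = letters[1:5]),
  f3 = factor(sample(letters[1:3], n, replace = TRUE), levels = letters[1:3]),
  x  = rnorm(n)
)

# Dummy coding: intercept + 3 + 4 + 2 factor columns + 1 numeric column.
X <- model.matrix(~ ., data = dat)
n_cols <- ncol(X)

# Cross-classification of the three factors: 4 * 5 * 3 cells.
cells   <- table(dat$f1, dat$f2, dat$f3)
n_cells <- length(cells)
n_empty <- sum(cells == 0)

cat("model columns:", n_cols, "\n")
cat("cells:", n_cells, " empty cells:", n_empty, "\n")
cat("cases per model column:", round(n / n_cols, 1), "\n")
```

Dropping a factor or collapsing rare levels reduces both the column count and the number of empty cells, which is one way to act on the advice above.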