OHDSI / StudyProtocolSandbox

This repository is for developing study packages for OHDSI studies. Once completed, they can be moved to the StudyProtocols repository.

High correlation error (a different problem from the one below) #43

Open SungJun9212 opened 6 years ago

SungJun9212 commented 6 years ago

I was analyzing tevofovirVsEntecavir in RStudio.

I encountered an error while running execute.

This is the error:

High correlation between covariate(s) and treatment detected. Perhaps you forgot to exclude part of the exposure definition from the covariates?" when using argument(s): list(cohortMethodDataFolder = "/home/user..... ... ... Error in ParallelLogger::clusterApply(cluster, modelsToFit, fitSharedPsModel, : Error(s) when calling function 'fun', see earlier messages for details

How can I solve this? Please help.

schuemie commented 6 years ago

I'm not familiar with that particular study, but the error message is telling you there are baseline characteristics that (nearly) perfectly predict which treatment a person is going to get. There are two possible reasons:

  1. Something went wrong when excluding the exposures themselves from the propensity model. For example, you could have drug codes in your data that, because of issues in the vocabulary, do not roll up to the ingredient concept IDs. Or, the exposures require some other event, for example if one of your treatments is an injectable it may always be accompanied by an injection procedure code.

  2. There is a real difference between the target and comparator group.

If it's reason 1, you may want to fix the problem by additionally excluding the offending covariates from the propensity model. If it's reason 2, you are probably out of luck; your propensity score is telling you (at an early stage) that the two groups are incomparable, and no amount of adjustment will make them similar enough. You could still try excluding the offending covariates if you're for some reason convinced they're not confounders. In my experience that usually just leads to other covariates becoming new problems, because fundamentally you really shouldn't do the comparison.

The way to find out more is to look at those covariates. Currently you can't see them because the error occurred while using multi-threading. If the study package had logging turned on, you can see the list of covariates in the log file. If not, you can either turn on logging (see here how to do that), or turn off multi-threading (by setting the number of cores to use to 1), and run again.
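To make the idea behind the error concrete, here is a minimal Python sketch of what "a covariate that (nearly) perfectly predicts treatment" looks like. It is not the code the study package or CohortMethod runs: the function names, the toy data, the covariate names, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Returns 0.0 when either list is constant, since the correlation is
    undefined there and a constant covariate cannot predict treatment.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def flag_high_correlation(covariates, treatment, threshold=0.9):
    """Return (name, correlation) pairs whose |correlation| exceeds threshold.

    The threshold is a hypothetical value for this sketch, not the one
    used by any OHDSI package.
    """
    flagged = []
    for name, values in covariates.items():
        r = pearson(values, treatment)
        if abs(r) > threshold:
            flagged.append((name, r))
    # Strongest offenders first.
    return sorted(flagged, key=lambda item: -abs(item[1]))


# Toy data: 1 = target cohort, 0 = comparator cohort.
treatment = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical baseline covariates, one value per person.
covariates = {
    # Always recorded together with the target drug (reason 1 above).
    "injection_procedure": [1, 1, 1, 1, 0, 0, 0, 0],
    # Unrelated to treatment choice.
    "age_over_65": [1, 0, 1, 0, 1, 0, 1, 0],
    # Present in everyone, so it carries no information.
    "prior_hepatitis_dx": [1, 1, 1, 1, 1, 1, 1, 1],
}

flagged = flag_high_correlation(covariates, treatment)
for name, r in flagged:
    print(f"{name}: correlation with treatment = {r:.2f}")
```

In this toy example only the injection procedure is flagged. In a real study, a covariate like that would be a candidate for exclusion from the propensity model if it turns out to be part of the exposure itself, whereas a flagged covariate that reflects a genuine difference between the groups points to reason 2.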