OHDSI / FeatureExtraction

An R package for generating features (covariates) for a cohort using data in the Common Data Model.
http://ohdsi.github.io/FeatureExtraction/

CVX Codes Excluded from Covariates #101

Open cukarthik opened 4 years ago

cukarthik commented 4 years ago

We've noticed that vaccines coded with CVX codes are not included as covariates. I'm still trying to understand the code, but I'm wondering whether it is due to this query or this one.

cukarthik commented 4 years ago

@schuemie, if you think this is correct, I can try to create a PR.

schuemie commented 4 years ago

The first query you referenced is for emulating the HDPS algorithm, which I hope nobody uses outside of method evaluation. The second query is indeed the query used to construct the drug covariates, but I'm not sure what the issue is.

What specifically is the problem with the vaccines? Are they not standard concepts? Are the concepts not in the Drug domain? Are they not in the Drug_era table?
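The three questions above can be checked directly against the CDM. The following is only a minimal sketch, assuming the standard OMOP table and column names (`concept.standard_concept`, `concept.domain_id`, `drug_era.drug_concept_id`); the in-memory SQLite database, the concept IDs, the concept names, and the `diagnose` helper are hypothetical and exist only for illustration.

```python
import sqlite3

# Toy stand-ins for two OMOP CDM tables; every ID and name here is made up.
TOY_CDM = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    domain_id TEXT,
    vocabulary_id TEXT,
    standard_concept TEXT
);
CREATE TABLE drug_era (
    drug_era_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    drug_concept_id INTEGER
);
INSERT INTO concept VALUES
    (1, 'toy HPV vaccine', 'Drug', 'CVX', 'S'),
    (2, 'toy ingredient', 'Drug', 'RxNorm', 'S');
INSERT INTO drug_era VALUES (10, 100, 2);
"""

# One row per concept: is it standard, is it in the Drug domain,
# and does it appear anywhere in drug_era?
DIAGNOSE_SQL = """
SELECT COALESCE(c.standard_concept = 'S', 0),
       COALESCE(c.domain_id = 'Drug', 0),
       EXISTS (SELECT 1 FROM drug_era de
               WHERE de.drug_concept_id = c.concept_id)
FROM concept c
WHERE c.concept_id = ?
"""


def diagnose(conn, concept_id):
    """Answer the three questions for one concept (hypothetical helper)."""
    row = conn.execute(DIAGNOSE_SQL, (concept_id,)).fetchone()
    if row is None:
        return None  # concept_id not present in the concept table
    return {
        "is_standard": bool(row[0]),
        "in_drug_domain": bool(row[1]),
        "in_drug_era": bool(row[2]),
    }


conn = sqlite3.connect(":memory:")
conn.executescript(TOY_CDM)
print(diagnose(conn, 1))
```

Against a real database, the same three checks would be run in that database's SQL dialect for the CVX concept IDs of interest.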

cukarthik commented 4 years ago

The issue is that CVX codes are standard codes; however, they are not part of the ATC hierarchy, so, based on my understanding of the query, none of the CVX-coded vaccines in the drug table are included as covariates in feature extraction. Most of our vaccines are coded with CVX codes, and I would imagine the same is true of many EHR-based databases in the US, at least for vaccines prior to 2018.
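One way to test this claim is to list the CVX concepts that occur in `drug_exposure` but have no ATC ancestor in `concept_ancestor`. The sketch below assumes the standard OMOP column names; it is not the FeatureExtraction query itself. The SQLite database, concept IDs, concept names, and the `find_cvx_without_atc` helper are hypothetical.

```python
import sqlite3

# Toy stand-ins for OMOP CDM tables; all concept IDs and names are made up.
SCHEMA_AND_DATA = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    vocabulary_id TEXT
);
CREATE TABLE concept_ancestor (
    ancestor_concept_id INTEGER,
    descendant_concept_id INTEGER
);
CREATE TABLE drug_exposure (
    drug_exposure_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    drug_concept_id INTEGER
);
INSERT INTO concept VALUES
    (1, 'toy vaccine A', 'CVX'),
    (2, 'toy vaccine B', 'CVX'),
    (3, 'toy vaccine class', 'ATC');
-- Only vaccine B rolls up to an ATC class in this toy vocabulary.
INSERT INTO concept_ancestor VALUES (3, 2);
INSERT INTO drug_exposure VALUES (10, 100, 1), (11, 101, 1), (12, 102, 2);
"""

# CVX concepts used in drug_exposure that have no ancestor in the ATC vocabulary.
CVX_WITHOUT_ATC = """
SELECT c.concept_id, c.concept_name, COUNT(*) AS record_count
FROM drug_exposure de
JOIN concept c ON c.concept_id = de.drug_concept_id
WHERE c.vocabulary_id = 'CVX'
  AND NOT EXISTS (
      SELECT 1
      FROM concept_ancestor ca
      JOIN concept a ON a.concept_id = ca.ancestor_concept_id
      WHERE ca.descendant_concept_id = c.concept_id
        AND a.vocabulary_id = 'ATC'
  )
GROUP BY c.concept_id, c.concept_name
ORDER BY record_count DESC, c.concept_id
"""


def find_cvx_without_atc(conn):
    """Return (concept_id, concept_name, record_count) rows (hypothetical helper)."""
    return conn.execute(CVX_WITHOUT_ATC).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
for row in find_cvx_without_atc(conn):
    print(row)
```

Any concept returned by this query would be invisible to a covariate definition that relies on ATC ancestry, which is the behaviour described above.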

@cgreich can correct me on CVX not being under ATC.

mattspotnitz commented 4 years ago

I would like to add that vaccine data on some claims databases are not being included because of this issue.

cukarthik commented 4 years ago

Hi @schuemie,

We did some digging and looked at the volume of CVX codes in our databases (both claims and EHR), as seen below. CVX codes are used heavily for vaccines (for RxNorm, we searched for concepts containing the word "vaccine"; thanks @aostropolets). Given how frequent CVX codes are in these databases, I would expect them to show up as covariates in the propensity model; however, in one of our studies, HPV vaccines are not showing up as a covariate when we would expect them to. The only other place where I see CVX codes potentially being excluded is here, but I'm not sure whether that is used in the propensity model. Anyway, if you can point me to the query that needs to be adjusted, I can make the change and create a pull request, assuming it's a SQL issue.

| record_count | vocabulary_id | database |
| ---: | --- | --- |
| 14,342,170 | RxNorm | CCAE |
| 165,986,759 | CVX | CCAE |
| 4,155,058 | RxNorm | MDCD |
| 67,567,343 | CVX | MDCD |
| 1,217,238 | RxNorm | MDCR |
| 8,339,518 | CVX | MDCR |
| 402,354 | RxNorm | 2018q4 |
| 3,297,009 | CVX | 2018q4 |
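The thread does not include the query behind these counts. The following is a sketch of how per-vocabulary counts of this kind could be computed, assuming the counts come from `drug_exposure` joined to `concept`, with RxNorm restricted to concept names containing "vaccine" as described above. The SQLite database, concept IDs, concept names, and the `count_vaccine_records` helper are hypothetical.

```python
import sqlite3

# Toy stand-ins for OMOP CDM tables; all concept IDs and names are made up.
TOY_DATA = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    vocabulary_id TEXT
);
CREATE TABLE drug_exposure (
    drug_exposure_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    drug_concept_id INTEGER
);
INSERT INTO concept VALUES
    (1, 'toy HPV vaccine', 'CVX'),
    (2, 'toy influenza vaccine injection', 'RxNorm'),
    (3, 'toy aspirin tablet', 'RxNorm');
INSERT INTO drug_exposure VALUES
    (10, 100, 1), (11, 101, 1), (12, 102, 1), (13, 103, 2), (14, 104, 3);
"""

# Count exposure records per vocabulary: every CVX record, plus RxNorm
# records whose concept name contains the word "vaccine".
COUNT_BY_VOCABULARY = """
SELECT COUNT(*) AS record_count, c.vocabulary_id
FROM drug_exposure de
JOIN concept c ON c.concept_id = de.drug_concept_id
WHERE c.vocabulary_id = 'CVX'
   OR (c.vocabulary_id = 'RxNorm' AND LOWER(c.concept_name) LIKE '%vaccine%')
GROUP BY c.vocabulary_id
ORDER BY c.vocabulary_id
"""


def count_vaccine_records(conn):
    """Return (record_count, vocabulary_id) rows (hypothetical helper)."""
    return conn.execute(COUNT_BY_VOCABULARY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(TOY_DATA)
for record_count, vocabulary_id in count_vaccine_records(conn):
    print(record_count, vocabulary_id)
```

A name-based filter like this is only a rough proxy for "vaccine" on the RxNorm side, so the RxNorm counts it produces should be read as approximate.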

@pbr6cornell @aostropolets

dimshitc commented 3 years ago

We are still working on the CVX hierarchy. We expect that all vaccines will roll up to some ATC code (through RxNorm or via a direct hierarchical relationship to ATC), so these ATC or RxNorm concepts would be treated as features.