bcbi / PreprocessMD.jl

Medically-informed data preprocessing for machine learning
MIT License

Add function to generate table of descriptive statistics #196

Open AshlinHarris opened 1 year ago

AshlinHarris commented 1 year ago

Input: a concept group c

Output:

AshlinHarris commented 1 year ago

The function should be run on

What if individuals have received more than one vaccine type?
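One possible answer, sketched in Python purely for illustration: count each person once per distinct vaccine type, and report the number of people with more than one type as a separate figure. The record layout (`person_id`, `vaccine_type`) and the type labels are hypothetical and are not taken from the package.

```python
from collections import Counter, defaultdict


def vaccine_type_summary(records):
    """Summarise vaccine types per person.

    `records` is a hypothetical iterable of (person_id, vaccine_type) pairs.
    Returns (per_type, multiple), where per_type counts persons per distinct
    type and multiple is the number of persons with more than one type.
    """
    types_by_person = defaultdict(set)
    for person_id, vaccine_type in records:
        types_by_person[person_id].add(vaccine_type)

    per_type = Counter()
    for types in types_by_person.values():
        # Each person contributes once per distinct type, so repeat doses
        # of the same type are not double counted.
        per_type.update(types)

    multiple = sum(1 for types in types_by_person.values() if len(types) > 1)
    return per_type, multiple


# Made-up example data: person 2 has received two different types.
records = [(1, "type A"), (1, "type A"), (2, "type A"), (2, "type B"), (3, "type B")]
per_type, multiple = vaccine_type_summary(records)
```

With this approach the per-type counts can sum to more than the number of individuals, so the table would need to state that the categories are not mutually exclusive.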

AshlinHarris commented 1 year ago

Quote from https://www.ohdsi.org/web/wiki/doku.php?id=documentation:vocabulary:gender:

The Gender domain captures all concepts about the sex of a person, denoting the biological and physiological characteristics. In fact, the Domain (and field in the PERSON table) should probably be called “sex” rather than “gender”, as gender refers to behaviors, roles, expectations, and activities in society.

The domain contains only two standard concepts: FEMALE (concept_id=8532) and MALE (concept_id=8507). Many data sources contain other codes, such as “Unknown”, “Refused to tell”, “Hermaphrodite”, as well as transgender constellations (“male to female”, etc.). For the current purposes of the OMOP CDM, the gender concepts are used to stratify patients by their biological make-up or to adjust analytical results for the influence of the biological sex. Therefore, all those other genders are denoted as concept_id=0 (unknown information).
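A minimal sketch of how the gender row of the table could follow this rule, written in Python for illustration. The concept IDs are the ones quoted above; the function names and the list-of-dicts layout are assumptions, with `gender_concept_id` standing in for the field in the PERSON table.

```python
from collections import Counter

# Standard OMOP gender concepts named in the quoted documentation.
FEMALE = 8532
MALE = 8507
UNKNOWN = 0  # all other source codes are recorded as concept_id=0

GENDER_LABELS = {FEMALE: "FEMALE", MALE: "MALE"}


def gender_label(concept_id):
    """Return a label for a gender concept ID; anything non-standard is UNKNOWN."""
    return GENDER_LABELS.get(concept_id, "UNKNOWN")


def gender_counts(persons):
    """Count persons per gender label.

    `persons` is a hypothetical iterable of dicts with a `gender_concept_id` key.
    A missing key is treated the same way as concept_id=0.
    """
    return Counter(gender_label(p.get("gender_concept_id", UNKNOWN)) for p in persons)


# Made-up example rows.
persons = [
    {"person_id": 1, "gender_concept_id": 8532},
    {"person_id": 2, "gender_concept_id": 8507},
    {"person_id": 3, "gender_concept_id": 0},
    {"person_id": 4, "gender_concept_id": 8532},
]
counts = gender_counts(persons)
```

Keeping an explicit UNKNOWN bucket means the counts always sum to the number of individuals, which makes the table easier to check.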

Quote from https://www.ohdsi.org/web/wiki/doku.php?id=documentation:vocabulary:ethnicity:

The race field contains races and ethnic backgrounds, while for Ethnicity there are only two categories for data on ethnicity: “Hispanic or Latino” (concept_id=38003563) and “Not Hispanic or Latino” (concept_id=38003564). This means, the two categories are orthogonal to each other, and both Latinos and non-Latinos can have any racial or ethnic background.

This is a very US-centric solution, and hence the terminology might be confusing to non-US data owners. If you belong to the latter group, you can probably ignore this field entirely.

There are no relationships defined for Ethnicity.
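Because race and ethnicity are orthogonal, one option is to tabulate them jointly instead of merging them into a single column. The following Python sketch illustrates the idea. The ethnicity concept IDs are the two quoted above; the race values are placeholders and not real OMOP concept IDs, and the function names and row layout are assumptions.

```python
from collections import Counter

# Ethnicity concepts named in the quoted documentation.
HISPANIC = 38003563
NOT_HISPANIC = 38003564

ETHNICITY_LABELS = {
    HISPANIC: "Hispanic or Latino",
    NOT_HISPANIC: "Not Hispanic or Latino",
}


def ethnicity_label(concept_id):
    """Return a label for an ethnicity concept ID; anything else is Unknown."""
    return ETHNICITY_LABELS.get(concept_id, "Unknown")


def race_by_ethnicity(persons):
    """Cross-tabulate race against ethnicity.

    `persons` is a hypothetical iterable of dicts with `race_concept_id`
    and `ethnicity_concept_id` keys. Returns a Counter keyed by
    (race_concept_id, ethnicity label).
    """
    table = Counter()
    for p in persons:
        key = (p["race_concept_id"], ethnicity_label(p["ethnicity_concept_id"]))
        table[key] += 1
    return table


# Made-up rows; 1001 and 1002 are placeholder race values, not OMOP concept IDs.
persons = [
    {"person_id": 1, "race_concept_id": 1001, "ethnicity_concept_id": 38003563},
    {"person_id": 2, "race_concept_id": 1001, "ethnicity_concept_id": 38003564},
    {"person_id": 3, "race_concept_id": 1002, "ethnicity_concept_id": 38003563},
    {"person_id": 4, "race_concept_id": 1002, "ethnicity_concept_id": 0},
]
table = race_by_ethnicity(persons)
```

For data sources where the ethnicity field is not meaningful, every row would land in the Unknown column, and the table could simply omit the ethnicity breakdown.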

Relevant data fields:

Identifying COVID-positive individuals requires a concept set. How should concept sets be handled in general?
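One simple possibility is to treat a concept set as a plain set of concept IDs and flag every individual with at least one matching record. The Python sketch below illustrates this under stated assumptions: the concept IDs are placeholders (not real COVID concepts), the function name is hypothetical, and the rows are assumed to carry `person_id` and `condition_concept_id` fields.

```python
def persons_in_concept_set(records, concept_set, concept_field="condition_concept_id"):
    """Return the set of person IDs with at least one record in `concept_set`.

    `records` is a hypothetical iterable of dicts with a `person_id` key and
    a concept ID stored under `concept_field`.
    """
    return {r["person_id"] for r in records if r[concept_field] in concept_set}


# Placeholder concept IDs for illustration only; a real concept set would be
# curated from the vocabulary.
covid_positive = frozenset({9000001, 9000002})

# Made-up example rows.
records = [
    {"person_id": 1, "condition_concept_id": 9000001},
    {"person_id": 1, "condition_concept_id": 9000002},
    {"person_id": 2, "condition_concept_id": 1234567},
    {"person_id": 3, "condition_concept_id": 9000002},
]
positive = persons_in_concept_set(records, covid_positive)
```

Passing the concept set as an argument keeps the statistics function independent of any particular condition, so the same code path could serve other concept groups.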