OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as independent release.
The Unlicense

Afghanistan is not available in Race Domain #534

Closed usmanafzal89 closed 2 years ago

usmanafzal89 commented 3 years ago

Hi. The country Afghanistan is not available under the RACE vocabulary/domain. Although we can use (8515 - Asian), it would be better to add a new value, Afghani, under the race vocabulary/domain.

cgreich commented 3 years ago

@usmanafzal89:

These ethnicities are not geographically determined, but "racially" (whatever that means). So, your Afghani would be White, and "Middle Eastern or North African". We don't have any further granularity in the vocabulary.

What's your use case? Do you want to study the biological effects of an ethnicity, or the political/cultural ones? This stuff is horrible to standardize. The Afghani would probably define themselves by tribe (Pashtun, Tadjik, Uzbek, etc.).

usmanafzal89 commented 3 years ago

@cgreich we are converting our patient data to the OMOP CDM for a COVID-specific study. Most of our patients are from Pakistan and Afghanistan. For Pakistan we find the RACE_CONCEPT_ID (38003589 - Pakistani) in the Race vocabulary, but nothing for Afghanistan.

usmanafzal89 commented 3 years ago

As per the OMOP CDM documentation, the link for accepted race_concept_id values is https://athena.ohdsi.org/search-terms/terms?domain=Race&standardConcept=Standard&page=1&pageSize=15&query=

cgreich commented 3 years ago

I am increasingly coming to the conclusion that making ethnicities hierarchical descendants of races is impossible to maintain. We should have some basic racial concepts (maybe black, white, Asian (racial, not geographical), American native, maybe a few more). No mixtures of those. And have a list of ethnicities based on today's countries. And that's it.

I'll start another debate in the Forum. But whatever we do it will be ugly and break for some use cases. I cannot help that. I wish we were all the same, irrespective of some random disequilibria in our genomes.

Urgh.

cgreich commented 3 years ago

In your example, you have the ethnicity Pashtun, which is at home in both Afghanistan and Pakistan. And then you have Punjabi, Sindhi, Saraiki, Muhajir, Baloch, etc. And in Afghanistan you have Tadjik, Uzbek, etc. This is not a solvable problem, @usmanafzal89. I would suggest you either add another field on the side, or use the location table, or create two different databases for patients from Afghanistan and Pakistan.
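
A minimal sketch of the "field on the side" option, in Python. It is an illustration under stated assumptions, not an OHDSI-provided mapping: the `PersonRow` class, the `to_person_row` function, and the `country_source_value` side field are hypothetical names, and using 0 for "no matching concept" is assumed here as the fallback. The only concept ID taken from this thread is 38003589 (Pakistani).

```python
from dataclasses import dataclass

# Concept ID quoted in this thread; the lookup table itself is hypothetical.
RACE_CONCEPT_BY_COUNTRY = {
    "Pakistan": 38003589,  # 38003589 - Pakistani
}

# Assumption: 0 stands for "no matching concept" when no standard concept fits.
NO_MATCHING_CONCEPT = 0


@dataclass
class PersonRow:
    """Hypothetical, trimmed-down row; not the full PERSON table."""
    person_id: int
    race_concept_id: int
    race_source_value: str      # original value, kept verbatim
    country_source_value: str   # the "field on the side" (could live in a location record)


def to_person_row(person_id: int, country: str) -> PersonRow:
    """Map a source country to a race concept if one is known, else fall back to 0.

    The source country is always preserved, so patients from Afghanistan and
    Pakistan stay distinguishable even without a dedicated race concept.
    """
    concept_id = RACE_CONCEPT_BY_COUNTRY.get(country, NO_MATCHING_CONCEPT)
    return PersonRow(
        person_id=person_id,
        race_concept_id=concept_id,
        race_source_value=country,
        country_source_value=country,
    )


if __name__ == "__main__":
    for pid, country in [(1, "Pakistan"), (2, "Afghanistan")]:
        print(to_person_row(pid, country))
```

The sketch deliberately does not pick a race concept for Afghanistan, since the thread does not settle on one; it only keeps the country available for stratifying the study cohort.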

mik-ohdsi commented 2 years ago

@usmanafzal89, with your permission I will close this issue now that Christian Reich has provided his explanation.