Open PRijnbeek opened 3 years ago
(I am a representative of the data partner in question) Here is a link to the list of "ETHNIC_CATEGORY" values supported by NHS Digital data in the UK: https://datadictionary.nhs.uk/data_elements/ethnic_category.html
These seem to be fairly standard, so other UK data sets are likely to include these too.
Most of the descriptions map to the existing OMOP race concepts in a straightforward manner, so they shouldn't be a problem. However, we would like to be able to stratify our analysis to separate out mixed race patients to determine whether their Covid19 and associated vaccine/drug/immunity data behaves differently from the total population.
Friends:
I would strongly suggest not going into all sorts of details, and especially not into arithmetic with race fractions. Races are a mixture of biological and social attributes, cannot be harmonized internationally, and have no definition other than "self-assigned". Any discussion is very ideological. We discussed this at length in the Forum. I was waiting for the subject to calm down to propose the final solution, which will have to be ultra simple.
Hi Christian. I have already begun to see what you mean about how poorly standardised race data can be at source. The second dataset I looked at contained the value "Indian, Pakistani, Bangladeshi or other South Asian ethnic group", which is clearly an aggregate of several other races. This is the opposite problem: the race specification is not specific enough, rather than too specific as with the mixed races I originally spoke about.
Perhaps we can make use of the race_source_value field when it comes to running analysis against these fields and decide how to aggregate them at that point.
Alternatively, there is an observation_concept_id for "race", and the supported value_as_concept_id values for this are much more specific than those which go into the race_concept_id field on person. We might be able to get something working using these.
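The first option above (deciding how to aggregate on race_source_value at analysis time) could look roughly like the sketch below. It is only an illustration under stated assumptions: the four "Mixed race" labels are taken from the data partner's list in this issue and are assumed to be stored verbatim in race_source_value; the example rows, the "White British" label, and the helper names `stratum` and `count_strata` are hypothetical. Concept ID 0 is used for unmapped values and 8527 for the standard "White" race concept; both should be checked in Athena before relying on them.

```python
from collections import Counter

# Assumption: these labels (from the data partner's list in this issue)
# are stored verbatim in person.race_source_value.
MIXED_SOURCE_VALUES = {
    "Mixed race - White and Black Caribbean",
    "Mixed race - White and Black African",
    "Mixed race - White and Asian",
    "Mixed race - Any other mixed background",
}


def stratum(race_source_value):
    """Return a coarse analysis stratum for a single person row."""
    if race_source_value is None or not race_source_value.strip():
        return "unknown"
    if race_source_value.strip() in MIXED_SOURCE_VALUES:
        return "mixed"
    return "other"


def count_strata(person_rows):
    """Count persons per stratum, keyed on race_source_value only."""
    return Counter(stratum(row.get("race_source_value")) for row in person_rows)


# Hypothetical person rows, for illustration only.
people = [
    {"person_id": 1, "race_concept_id": 0,
     "race_source_value": "Mixed race - White and Asian"},
    {"person_id": 2, "race_concept_id": 8527,
     "race_source_value": "White British"},
    {"person_id": 3, "race_concept_id": 0,
     "race_source_value": None},
]

print(count_strata(people))
```

Note that this grouping lives entirely in local analysis code, so it is not portable across data partners whose source labels differ.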
> Perhaps we can make use of the race_source_value field when it comes to running analysis against these fields and decide how to aggregate them at that point.
We could, but then what's the point of the CDM and the standardization?
> Alternatively, there is an observation_concept_id for "race", and the supported value_as_concept_id values for this are much more specific than those which go into the race_concept_id field on person. We might be able to get something working using these.
That's even worse. We don't need to "cheat" the system. If we can't standardize something because it essentially is not defined by medical criteria we shouldn't use it. If people want to do sociological studies on race - all for it, but not as part of clinical patient data.
Hi @cgreich, I see your point about not cheating the system. However, we are currently facing a similar challenge as @PRijnbeek where we need to store race, ethnicity and ancestry values. Is there a way we could form a working group to discuss this issue? Right now, the NHGRI GWAS catalogue seems like a good ontology to use when it comes to genetic ancestry (https://www.ebi.ac.uk/ols/ontologies/hancestro). We could potentially expand the definitions to self-identified or assigned race-ethnicity categories.
The categories mentioned by @shorban-uod are probably from the UK census, and in many observational studies that is the standard (alongside the US census categories). @cgreich should we reach out to other consortia that are also facing this issue? I think we could benefit from their expertise.
@arturolp:
The problem is that these categories are not defined by objective criteria, but are self-identified and relative to a societal context, which means it is impossible to standardize them on a global level. I am increasingly leaning toward the conclusion that we should keep only the 5 standard races, which are not attached to ethnicities, and treat everything else as local. With ethnicities we may fare better, but I haven't found a good hierarchy yet. We should bring it up in the next (first) Vocab WG. @mik-ohdsi?
We have a data partner in EHDEN that has the following race information they would like to use for COVID research:
Mixed race - White and Black Caribbean
Mixed race - White and Black African
Mixed race - White and Asian
Mixed race - Any other mixed background
Currently these are not part of the race vocabulary. Is there a process to get these added? I think this is an OMOP vocabulary, correct? If so, adding them should not be a problem.
By the way, we noticed that there is a recent addition, https://athena.ohdsi.org/search-terms/terms/35827397, but that one is in the Observation domain?
Any suggestions? Thanks