OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as independent release.
The Unlicense

UCUM missing "U/L" and "kU/L" #814

Closed XcrigX closed 1 year ago

XcrigX commented 1 year ago

I am converting some Synthea-generated FHIR data to OMOP, and have discovered (so far) two UCUM codes missing from the OMOP CONCEPT download available from Athena: U/L and kU/L.

select * from CONCEPT where vocabulary_id = 'UCUM' and concept_code = 'U/L';
select * from CONCEPT where vocabulary_id = 'UCUM' and concept_code = 'kU/L';

These show to be valid UCUM codes in: https://ucum.nlm.nih.gov/ucum-lhc/demo.html

don-torok commented 1 year ago

| Concept id | description | UCUM value |
| -- | -- | -- |
| 8645 | unit per liter | [U]/L |
| 8810 | kilounit per liter | 10*3.[U]/L |

aostropolets commented 1 year ago

Love how fast Don is! Much appreciated.

XcrigX commented 1 year ago

Hmmm. It's good that there are concepts for them, but that doesn't solve the problem of mapping source data to OMOP. Perhaps UCUM is too variable for such mappings to be possible/reliable?

XcrigX commented 1 year ago

I'll also add that neither of those codes passes the UCUM checker here: https://ucum.nlm.nih.gov/ucum-lhc/demo.html I'm not a UCUM expert, so I'm unsure if that's definitive or not, but...

XcrigX commented 1 year ago

A little more research: UCUM defines a grammar, so the set of valid codes is potentially infinite, but LOINC previously published a list of common UCUM codes, which now appears to be curated by UCUM themselves. It is available here: https://github.com/ucum-org/ucum/tree/main/common-units (linked from LOINC here: https://loinc.org/usage/units/). This list contains 'U/L' and 'kU/L', not the bracketed codes pointed out above.

It would make sense to me if this list were the list (or the basis of the list) of discrete concepts in OMOP, as presumably these will be the values that systems are most likely to actually have.

cgreich commented 1 year ago

Here is the deal:

U - is the standard unit of enzymatic activity, i.e. catalyzing one μmol of substrate per minute.
[U] - is an arbitrary unit, meaning it has a different definition from the above.

The problem we have is that it depends on the measurement whether it should be

So, we really need both U and [U] depending on what we are measuring. But it's too much to ask the ETL wretch to figure that out; we would have to assign a UCUM code to every Measurement. We actually have some plan. But till then let's just have one unit: Unit [U].

The LOINC list unfortunately does not provide that, ie. it does not link LOINC codes to UCUM codes.

XcrigX commented 1 year ago

I'm slightly out of my depth on the topic, but I thought an arbitrary unit would use curly braces in UCUM? E.g. '{U}' or '{UNIT}' or simply '1'.

'[U]' doesn't appear to be valid UCUM, whereas 'U' is, as you rightly point out, an 'enzyme unit', which is what I'm looking for.

FWIW, I am trying to map a value associated with LOINC code: https://loinc.org/6768-6 - LOINC lists the example unit as "U/L".

For now I'll have to map it to concept_id = 0 (no matching code) I suppose.

But it still makes sense to me that if LOINC/UCUM are saying "these are the common codes that are actually used" - that having those in OMOP/OHDSI would likely head off a lot of problems.

cgreich commented 1 year ago

@XcrigX:

Nope. According to the UCUM definition:

> §12 curly braces: Curly braces may be used to enclose annotations that are often written in place of units or behind units but that do not have a proper meaning of a unit and do not change the meaning of a unit.
>
> §14 square brackets: Square brackets enclose suffixes of unit symbols that change the meaning of a unit stem.

> For now I'll have to map it to concept_id = 0 (no matching code) I suppose.

Why? Use the ones we have. We will split them up in future, but I don't see a reason to not use the ones we have.

Alexdavv commented 1 year ago

> U - is the standard unit of enzymatic activity, ie. catalyzing one μmol of substrate per minute.

I thought the regular U is what you described here. It becomes an arbitrary U if there's no good path to measuring the μmol or the time. And if laboratories have agreed on an approach and follow it, they call it an international U. And this is exactly what's used in most of the trivial tests.

But we put all three into square brackets, which actually mean and change nothing.

> The LOINC list unfortunately does not provide that, ie. it does not link LOINC codes to UCUM codes.

Well, they have top-something possible units assigned to each test.

> For now I'll have to map it to concept_id = 0 (no matching code) I suppose.

You pick up exactly what Don mentioned:

| Concept id | description | UCUM value |
| -- | -- | -- |
| 8645 | unit per liter | [U]/L |
| 8810 | kilounit per liter | 10*3.[U]/L |

XcrigX commented 1 year ago

I didn't think you could put arbitrary things in square brackets like you can in curly braces, but I could very easily be wrong.

Here's what I know.

cgreich commented 1 year ago

The short shrift is:

cgreich commented 1 year ago

@XcrigX:

Because Unit is not the same as Unit. The enzyme activity unit is not the same thing as the arbitrary Unit they invented for the tumor marker, which is not an enzyme and isn't measured as such. In other words: Regenstrief is half right and half wrong, violating the syntax they invented when Gunther was still there. NLM is just parroting what Regenstrief says. {mymadeupunit} is not a unit, the unit here is '1'.

Let me use the standard killer argument for discussions that are going on indefinitely: :-) Do you have a use case where this matters?
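For readers following the curly-brace point, here is a rough Python sketch of how an ETL might treat annotations: an annotation standing alone as a term becomes '1', and an annotation attached to a unit is dropped. The function name `drop_annotations` and the regular expressions are assumptions made for illustration; this is not a UCUM parser and does not validate the result.

```python
import re

# Matches one curly-brace annotation such as {U} or {cells}.
_ANNOTATION = r"\{[^{}]*\}"


def drop_annotations(code: str) -> str:
    """Reduce curly-brace annotations in a UCUM-style string (illustrative only)."""
    # An annotation at the start, or right after '.', '/' or '(', stands in
    # place of a unit, so it is replaced by the unity '1'.
    code = re.sub(r"(^|(?<=[./(]))" + _ANNOTATION, "1", code)
    # Any remaining annotation trails a unit and does not change its meaning.
    return re.sub(_ANNOTATION, "", code)


print(drop_annotations("{U}"))          # annotation alone
print(drop_annotations("{cells}/uL"))   # annotation in place of a unit
print(drop_annotations("mL{total}"))    # annotation behind a unit
```

Note that square brackets are left untouched here, since they are part of the unit symbol rather than an annotation.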

XcrigX commented 1 year ago

Here's the use-case. I want to map/convert (lots of) FHIR data to OMOP. I want to use a generic "code resolver" for all coded data that simply maps the FHIR Codesystem URL to an OMOP vocabulary and looks up the matching OMOP concept to fill in concept_id. (Side note: please add a column to Vocabulary to include the FHIR codesystem URL ;) )

Even though it's technically not, for practical purposes I want to treat UCUM the same way I treat LOINC, RxNorm, etc.: as a discrete vocabulary in which I can look up codes in a generic way.

It appears I can't do that today without special logic or (at least some) custom mappings I'd have to come up with.
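A minimal Python sketch of the kind of generic resolver described in this comment. The CodeSystem URLs are the usual FHIR identifiers for UCUM, LOINC and RxNorm; the `FHIR_SYSTEM_TO_VOCABULARY` table, the in-memory `CONCEPTS` stand-in for the CONCEPT table, and the name `resolve_concept_id` are assumptions for illustration only. The two concept rows are the ones quoted earlier in the thread.

```python
# Assumed mapping from FHIR CodeSystem URL to OMOP vocabulary_id.
FHIR_SYSTEM_TO_VOCABULARY = {
    "http://unitsofmeasure.org": "UCUM",
    "http://loinc.org": "LOINC",
    "http://www.nlm.nih.gov/research/umls/rxnorm": "RxNorm",
}

# Stand-in for the CONCEPT table: (vocabulary_id, concept_code) -> concept_id.
# A real ETL would query the database instead.
CONCEPTS = {
    ("UCUM", "[U]/L"): 8645,
    ("UCUM", "10*3.[U]/L"): 8810,
}


def resolve_concept_id(system_url: str, code: str) -> int:
    """Return the matching concept_id, or 0 when nothing matches."""
    vocabulary_id = FHIR_SYSTEM_TO_VOCABULARY.get(system_url)
    if vocabulary_id is None:
        return 0  # unknown code system
    return CONCEPTS.get((vocabulary_id, code), 0)


# The exact-match lookup finds the bracketed code but not the plain one.
print(resolve_concept_id("http://unitsofmeasure.org", "[U]/L"))
print(resolve_concept_id("http://unitsofmeasure.org", "U/L"))
```

With an exact match only, 'U/L' falls through to concept_id 0, which is the gap this issue is about.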

cgreich commented 1 year ago

Got it. Sounds like a very good idea.

But you will inevitably run into issues and you will need heuristics, like trying out [U] if you can't find U. Because these definitions are flimsy at best. Why is it [IU] and U according to Regenstrief? Both have some definition based on canonical units.
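As a rough illustration of the "try [U] if you can't find U" heuristic, here is a Python sketch. The `CONCEPTS` dictionary holds only the two rows quoted earlier in the thread; the `ALIASES` table, the regular expression, and the function names are assumptions made for this example, not anything shipped with the vocabularies.

```python
import re

# UCUM concept_code -> concept_id, from the two rows quoted in this thread.
CONCEPTS = {"[U]/L": 8645, "10*3.[U]/L": 8810}

# Assumed hand-curated aliases for codes the bracket heuristic cannot reach.
ALIASES = {"kU/L": "10*3.[U]/L"}

# A bare 'U' that is not part of a longer symbol and not already bracketed.
_BARE_U = re.compile(r"(?<![A-Za-z\[])U(?![A-Za-z\]])")


def candidate_codes(code):
    """Yield the code as given, then progressively looser variants."""
    yield code
    if code in ALIASES:
        yield ALIASES[code]
    bracketed = _BARE_U.sub("[U]", code)
    if bracketed != code:
        yield bracketed


def lookup_ucum(code: str) -> int:
    """Return the first matching concept_id, or 0 when no candidate matches."""
    for candidate in candidate_codes(code):
        if candidate in CONCEPTS:
            return CONCEPTS[candidate]
    return 0


print(lookup_ucum("U/L"), lookup_ucum("kU/L"), lookup_ucum("mg/dL"))
```

The exact match is always tried first, so a future release that adds a plain 'U/L' concept would take precedence over the fallback.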

You will need heuristics for the other vocabularies as well, for a variety of reasons, among them different versions, reuse of codes, butchering with code syntax (e.g. removing the dots from ICD codes) etc. But the idea is nice.
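The dot-stripping case could be handled with a similar fallback. The sketch below assumes ICD-10-style codes, where the dot follows the third character; other code systems place it differently. The function names and the tiny `KNOWN_CODES` set are hypothetical and only serve the example.

```python
from typing import Optional

# Hypothetical stand-in for the codes present in the target vocabulary.
KNOWN_CODES = {"E11.9", "I10"}


def restore_icd10_dot(code: str) -> str:
    """Re-insert the dot after the third character of an ICD-10-style code."""
    code = code.strip().upper()
    if "." in code or len(code) <= 3:
        return code  # already dotted, or a bare three-character category
    return code[:3] + "." + code[3:]


def find_icd10_code(source_code: str) -> Optional[str]:
    """Try the code as given, then with the dot restored; None if neither matches."""
    for candidate in (source_code, restore_icd10_dot(source_code)):
        if candidate in KNOWN_CODES:
            return candidate
    return None


print(find_icd10_code("E119"))
print(find_icd10_code("I10"))
```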