biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

ICD9/ICD10: can we determine a good canonical URI prefix? #251

Open matentzn opened 2 years ago

matentzn commented 2 years ago

We were wondering what would be a suitable canonical URI prefix for ICD-9 codes. Does anyone have any suggestions/opinions on that? @dhimmel @cthoyt

I guess the main tradeoffs to consider here are:

is https://icd.codes/... suitable?


See also https://github.com/biopragmatics/bioregistry/issues/256

sierra-moxon commented 2 years ago

I took an ICD9 dbxref from Mondo, ICD9:784.7, and tried to resolve it using the links above, but I couldn't. I can, however, find a page that redirects ICD9:784 to its ICD10 equivalent: https://icd.codes/icd9cm/784. In the synonyms list for ICD9:784, I see 784.7, which is linked like this: https://icd.codes/icd9cm/7847
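Going only by the two links above, icd.codes appears to build ICD-9-CM URLs by dropping the decimal point from the code. A minimal sketch of that pattern, assuming it holds beyond the two codes checked here (784 and 784.7); `icd9cm_url` is a hypothetical helper, not part of any existing tool:

```python
def icd9cm_url(code: str) -> str:
    """Build an icd.codes URL for an ICD-9-CM code.

    Assumption: icd.codes uses the code with its decimal point removed,
    as observed for 784 -> /icd9cm/784 and 784.7 -> /icd9cm/7847.
    """
    return "https://icd.codes/icd9cm/" + code.replace(".", "")


# The two examples observed in this thread.
print(icd9cm_url("784"))    # https://icd.codes/icd9cm/784
print(icd9cm_url("784.7"))  # https://icd.codes/icd9cm/7847
```

Dropping the dot is only safe if no two codes differ solely in where the dot sits; that has not been verified here.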

sierra-moxon commented 2 years ago

I also have some references to ICD0 (but can't find a resolvable identifier for that).

dhimmel commented 2 years ago

For the record, there currently aren't any resolvers for ICD9 in the Bioregistry, as per https://bioregistry.io/reference/icd9:277.9.


For ICD10, the Bioregistry uses resolution URLs like http://apps.who.int/classifications/icd10/browse/2010/en#/C34. This URL is hosted by the World Health Organization, which creates ICD codes. But the WHO doesn't provide a browser for ICD9 codes, right? And that is why we need to pick a secondary provider?

https://icd.codes looks good. I would prefer it if its source code were open. Still, it's probably better to have something to resolve ICD9 codes than nothing.

joeflack4 commented 2 years ago

I can't remember if it was ICD10 or ICD11, but Chris Chute was explaining to me that there is a base "ICD", which I think might be a polyhierarchy, and then they go through "linearization" to make mono-hierarchies ICDxxCM and ICDxxPS. If I find the GitHub thread I'll link it. I also don't know if this applies to just ICD10, just ICD11, or both ICD10 and ICD11. And I don't know if this also applies to pre-ICD10.

matentzn commented 2 years ago

I would like to propose using ICD10WHO, ICD10CM, and ICD11 as canonical prefixes (or, if it must be, their lowercase counterparts) to avoid the nightmare @joeflack4 and some of our team just had to sort through, where occurrences of ICD10 could not be clearly attributed to the ICD10WHO or ICD10CM variants. It is important to note that the codes in the two differ: even the exact same code can refer to different things. ICD10 as a prefix invites too much speculation and just cost us two developer work weeks.

As for resolution, I am starting to think that in these kinds of cases, we could potentially use bioregistry; unfortunately though, the : in the URI could cause some problems for some parsers, but we could take a look.
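One concrete way the colon can bite: if a Bioregistry-style resolver URL such as `https://bioregistry.io/icd10who:H00-H06` (using the `icd10who` prefix proposed in this thread) is ever written as a relative reference against `https://bioregistry.io/`, standard URI parsing (RFC 3986) reads `icd10who:` as a URI scheme. The sketch below uses Python's standard-library `urljoin` to show this, plus percent-encoding the colon as one possible workaround; whether the resolver accepts the encoded form is an assumption that has not been checked.

```python
from urllib.parse import quote, urljoin, urlsplit

BASE = "https://bioregistry.io/"
curie = "icd10who:H00-H06"

# Treated as a relative reference, the prefix is parsed as a URI scheme,
# so the base URL is discarded entirely.
print(urlsplit(curie).scheme)  # icd10who
print(urljoin(BASE, curie))    # icd10who:H00-H06

# Percent-encoding the colon keeps the CURIE inside the path.
encoded = quote(curie, safe="")
print(urljoin(BASE, encoded))  # https://bioregistry.io/icd10who%3AH00-H06

# Plain string concatenation sidesteps the problem for absolute URLs.
print(BASE + curie)            # https://bioregistry.io/icd10who:H00-H06
```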

Someone like Chris Chute, or someone else with standing, should lobby the ICD team to get them to maintain a PURL system.

cthoyt commented 2 years ago

Reminder that there's an ICD collection in the bioregistry - https://bioregistry.io/collection/0000004. Happy to accept any actionable suggestions (but note we'll need more details than what's in the previous comment to make any changes)

matentzn commented 2 years ago

Super cool. What is your take on my argument though? icd10who: instead of icd10: to avoid the kind of mishap we just went through in Mondo?

cthoyt commented 2 years ago

I still don't understand what the difference is between icd10who, icd10cm, and icd10. Can you give me examples of local identifiers for each, the regular expression describing identifiers for each, and a URL to resolve them?

joeflack4 commented 2 years ago

@cthoyt I saved this answer from Chris Chute, my boss, who worked on ICD11 and has some deep knowledge on previous versions of ICD as well:

Q (Joe Flack): Thanks, Chris! I checked bioportal, but I only see "ICD10", "ICD10CM", and "ICD10PCS". And the browser you linked, though hosted by WHO, only says "ICD10". Nico and I might be confused as to what "ICD10WHO" is exactly.

A (Chris Chute): Ah, that is easy. ICD10, not otherwise specified, is WHO ICD10. It is the root form, created by WHO. All other versions, such as CM, are modifications of the base; in CM's case it stands for Clinical Modification, but it is really the US version. There are:

  • ICD10AM – Australia
  • ICD10CA – Canada
  • ICD10GM – Germany

And a few more. They are ALL based on ICD10, which is one and the same with the WHO version.


FYI, for ICD11, things are a little different. From what I understand, what is called plainly "ICD11" with no suffix is a polyhierarchy, and is not meant to be used as a code system, but to instantiate mono-hierarchy code systems. And different countries / variations of ICD11 go through a process called "linearization" (you can see it mentioned in the WHO ICD API interface) to create the instantiated version. It doesn't look like there's an "ICD11WHO".

matentzn commented 2 years ago
| prefix | alt prefix | example id | example expansion |
|---|---|---|---|
| icd10who | ICD10WHO | H00-H06 | https://icd.who.int/browse10/2010/en#/H00-H06 |
| icd10cm | ICD10CM | H00-H59 | https://www.icd10data.com/ICD10CM/Codes/H00-H59 |

Regexes: https://www.johndcook.com/blog/2019/05/05/regex_icd_codes/
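For illustration, a rough pattern that accepts both the single codes and the block ranges (like `H00-H06`) used as example ids above. This is an approximate regex written for this sketch, not the one from the linked post or from any official specification, and `parse_local_id` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Approximate shape of an ICD-10 code: letter, digit, digit-or-letter,
# then an optional dot with up to four more characters (e.g. C34, H00.1,
# S52.521A). A local id is either one code or a "start-end" block range.
_CODE = r"[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?"
LOCAL_ID = re.compile(rf"(?P<start>{_CODE})(?:-(?P<end>{_CODE}))?")


def parse_local_id(local_id: str) -> Tuple[str, Optional[str]]:
    """Split a local id into (start, end); end is None for a single code."""
    match = LOCAL_ID.fullmatch(local_id)
    if match is None:
        raise ValueError(f"not an ICD-10 code or block: {local_id!r}")
    return match.group("start"), match.group("end")


print(parse_local_id("H00-H06"))  # ('H00', 'H06')
print(parse_local_id("C34"))      # ('C34', None)
```

Note that the same pattern accepts both WHO and CM shapes, so syntax alone cannot tell the two variants apart; it does reject ICD-9 codes such as `784.7`.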

IMO there should not be a prefix for ICD10; it's total madness. If we have to have it, it should be an alternative prefix for icd10who.

joeflack4 commented 2 years ago

We've been using ICD10WHO for a while now, and it's working great for our use case!

cthoyt commented 2 years ago

I'm glad that's working for you, but like I mentioned, not really an actionable comment for the Bioregistry. If someone really wants to push meaningful discussion forward, they can fill out a new prefix request form where we can have pointed discussion about if the prefix overlaps with something that exists or not, what the URI format string is, etc.

https://github.com/biopragmatics/bioregistry/issues/new?assignees=biopragmatics%2Fbioregistry-reviewers&labels=New%2CPrefix&template=new-prefix.yml&title=Add+prefix+ICD10WHO

joeflack4 commented 2 years ago

Done: #485! I don't know how important it is to support; I don't use the Bioregistry that often. But I probably will in the future, and I imagine I would be happier if this were done.

dhimmel commented 1 year ago

To summarize where we are at with representing ICD-10 variants:

There's a problem where using icd10 instead of icd10who as the canonical prefix hides the distinction, such that most users will blissfully continue using icd10 identifiers unaware of the potential distinction. It sounds like icd10cm codes are actually more prevalent than icd10who codes in the wild. I tend to agree with @matentzn that we should bring the distinction to the forefront by making icd10who the canonical form.

I'd like to know what percent of icd10who codes are actual icd10cm codes? Have most users been able to ignore the distinction because icd10cm is mostly just a superset of icd10who?

References:

joeflack4 commented 1 year ago

This summary sounds pretty good to me.

I'd like to know what percent of icd10who codes are actual icd10cm codes? Have most users been able to ignore the distinction because icd10cm is mostly just a superset of icd10who?

@matentzn Do you know the answer to this? I feel like we (or I) may have done an analysis that could answer this question, but it's been a while. My guess is that 5-10% of codes that appear in ICD10CM don't appear in ICD10WHO and vice versa, but I could be totally wrong.
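The core arithmetic of such an analysis is two set differences. A minimal sketch, assuming each variant's codes have already been extracted into a set of plain strings; `overlap_report` is a hypothetical helper, and the inputs below are toy sets that only demonstrate the arithmetic, not real extracts. Comparing codes purely as strings is the naive version of the analysis; real code lists would need normalization first.

```python
def overlap_report(who_codes: set, cm_codes: set) -> dict:
    """Count shared codes and the percent of each set missing from the other."""
    if not who_codes or not cm_codes:
        raise ValueError("both code sets must be non-empty")
    return {
        "shared": len(who_codes & cm_codes),
        "pct_who_not_in_cm": 100 * len(who_codes - cm_codes) / len(who_codes),
        "pct_cm_not_in_who": 100 * len(cm_codes - who_codes) / len(cm_codes),
    }


# Toy inputs, for illustration only.
who = {"A00.0", "A00.1", "A00.9", "B99"}
cm = {"A00.0", "A00.1", "A00.9", "B99.8", "B99.9"}
print(overlap_report(who, cm))
# {'shared': 3, 'pct_who_not_in_cm': 25.0, 'pct_cm_not_in_who': 40.0}
```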

I'd be happy to do this analysis, but I'm pretty backed up at the moment.

matentzn commented 1 year ago

The question is the wrong way around (WHO is the original; CM is a derived product), but answering it is not trivial. You would have to handle:

@joeflack4 I share your intuition about prevalence! But like you, I don't have numbers.

dhimmel commented 1 year ago

Thanks @joeflack4 and @matentzn for the thoughts. My main goal is to identify the proper way ICD codes should be encoded as CURIEs in resources that have ICD cross-references. If a resource is aware of whether it's using the WHO or CM variant, it should probably just apply the preferred prefix: icd10/icd10who or icd10cm. If it's unaware of which variant is being referenced, then perhaps an automated analysis should check for term existence in ICD10WHO and ICD10CM. There is then the hard question of whether resources should provide mappings to both ICD10WHO and ICD10CM when a term is shared across the two variants. Is it worth duplicating xrefs to make this distinction more explicit?
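The automated existence check could look roughly like the sketch below, which takes the "duplicate the xref" option: an ambiguous `ICD10:` CURIE is rewritten to every variant the code exists in. It assumes the WHO and CM code lists are available as sets of strings; `reprefix` and the toy sets are hypothetical.

```python
def reprefix(curie: str, who_codes: set, cm_codes: set) -> list:
    """Rewrite an ambiguous ICD10 CURIE to the variant(s) containing its code.

    Returns one CURIE per matching variant (possibly both), or an empty
    list if the code exists in neither. Non-ICD10 CURIEs pass through.
    """
    prefix, _, code = curie.partition(":")
    if prefix.lower() != "icd10":
        return [curie]  # already specific, leave untouched
    variants = (("icd10who", who_codes), ("icd10cm", cm_codes))
    return [f"{name}:{code}" for name, codes in variants if code in codes]


# Toy code sets, for illustration only.
who = {"A00.0", "B99"}
cm = {"A00.0", "B99.9"}
print(reprefix("ICD10:A00.0", who, cm))  # ['icd10who:A00.0', 'icd10cm:A00.0']
print(reprefix("ICD10:B99", who, cm))    # ['icd10who:B99']
print(reprefix("ICD10:Z99.99", who, cm)) # []
```

A code existing in both sets does not prove the two entries mean the same thing, which is exactly the ambiguity raised earlier in this thread, so duplicated xrefs would still need review.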

matentzn commented 1 year ago

Is it worth duplicating xrefs to make this distinction more explicit?

I think it is worth it.

We need a way for resources to link the context used for the cross reference prefix interpretation. But even without that, I would say ICD10: prefix should be discouraged moving forward.

dhimmel commented 1 year ago

I would say ICD10: prefix should be discouraged moving forward.

Well, the obvious way to do that is to make ICD10WHO the preferred prefix and keep ICD10 as a synonym prefix. I'm +1 to this, noting that it will cause major downstream changes for users who abide by Bioregistry preferred prefixes (probably very few). But it will start the ball rolling on the reckoning, where resources will have to handle the multiple ICD variants explicitly and might even start losing mappings as some resources use CM versus WHO.
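What "preferred prefix plus synonym" means for a consumer can be sketched as a lookup table. The entries below encode the proposal under discussion (icd10who canonical, icd10 demoted to a synonym); they are hypothetical, not current Bioregistry content, and `to_preferred` is a toy stand-in rather than the bioregistry package's own API.

```python
# Hypothetical registry entries: preferred prefix -> synonym prefixes.
SYNONYMS = {
    "icd10who": {"icd10"},
    "icd10cm": set(),
}

# Case-insensitive lookup from any known prefix to its preferred form.
_LOOKUP = {
    name.lower(): preferred
    for preferred, synonyms in SYNONYMS.items()
    for name in synonyms | {preferred}
}


def to_preferred(curie: str) -> str:
    """Rewrite a CURIE so that it uses the preferred prefix."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        return f"{_LOOKUP[prefix.lower()]}:{local_id}"
    except KeyError:
        raise ValueError(f"unknown prefix: {prefix!r}") from None


print(to_preferred("ICD10:C34"))        # icd10who:C34
print(to_preferred("ICD10CM:H00-H59"))  # icd10cm:H00-H59
```

This illustrates the downside mentioned above: every legacy `icd10:` CURIE is silently read as WHO, which is only correct if the source really meant WHO.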

We need a way for resources to link the context used for the cross reference prefix interpretation

I don't understand what you mean here.

matentzn commented 1 year ago

I think having the Bioregistry document preferred vs. non-preferred prefixes is not enough in practice, despite it being a great aspiration. We should encourage resources to actively link to a context like https://github.com/biopragmatics/bioregistry/blob/main/exports/contexts/obo.context.jsonld to say: "This is the prefix map to use for interpreting my data." That way, there is no ambiguity in the ICD10 case either.
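A minimal sketch of how a consumer might use such a linked context: load the JSON-LD `@context` as a prefix map and expand CURIEs against it, failing loudly for any prefix the resource did not declare. The context below is a hypothetical fragment (its URI prefixes reuse the example expansions from the table earlier in this thread), not the contents of the linked obo context, and `expand_curie` is a hypothetical helper rather than a full JSON-LD processor.

```python
import json

# Hypothetical context a resource might publish alongside its data.
CONTEXT = json.loads("""
{
  "@context": {
    "icd10who": "https://icd.who.int/browse10/2010/en#/",
    "icd10cm": "https://www.icd10data.com/ICD10CM/Codes/"
  }
}
""")


def expand_curie(curie: str, context: dict) -> str:
    """Expand a CURIE using only the prefixes declared in the context."""
    prefix_map = context["@context"]
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise KeyError(f"no prefix mapping for {curie!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("icd10cm:H00-H59", CONTEXT))
# https://www.icd10data.com/ICD10CM/Codes/H00-H59
```

Because the context deliberately declares no bare `ICD10` prefix, an ambiguous `ICD10:H00-H06` raises an error instead of being guessed.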

cgchute commented 1 year ago

Just left a meeting where HL7 is struggling with the same thing. Davera would know more. Chris