IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

ECL for healthcare records with historic (inactive) content #68

Closed ghost closed 2 years ago

ghost commented 5 years ago

When attempting a query with activeFilter=false and an ECL query, the following error is returned:

{"error":"BAD_REQUEST","message":"ECL search can not be used on inactive concepts."}

This appears to come from: https://github.com/IHTSDO/snowstorm/blob/601b10e31c2e46e0c2a3f3d7ff9a7432773f4f0e/src/main/java/org/snomed/snowstorm/rest/ConceptController.java#L91-L93

We are hoping to use Snowstorm to find the SNOMED concepts that match an ECL query, including inactive concepts. We will use these concept ids to filter historic datasets, many of which include concepts that are now inactive but that we still want to include.

danka74 commented 5 years ago

The ECL standard (http://snomed.org/ecl) is agnostic to what substrate an ECL expression is applied to, i.e. the substrate could consist of active and inactive components, although it would default to active components (and refset members) of the edition of SNOMED at hand (or at the path). The use case described above seems fully reasonable, so I'd support an improvement of snowstorm to allow ECL to be applied to a substrate with inactive components (given that this needs prioritization).

ghost commented 5 years ago

I wonder if this is an overzealous check or if the check is here to prevent calling unsupported behaviour.

Could somebody confirm: if I were to simply delete the check on lines 91-93 in ConceptController.java, would I be able to reliably query inactive codes, or is the substrate of active codes currently hard-coded?

pgwilliams commented 5 years ago

The issue here is that when a concept is made inactive, all of its stated and inferred attributes (now axioms in the stated view) get inactivated also. Inactive concepts stop being children of the parents they had when they were active, so something like << 363787002 |Observable entity (observable entity)| would not naturally include any inactive concepts. Because inactive concepts do not have parents, making them appear in an ECL result would require us to - behind the scenes - temporarily reinstate their last known hierarchical positions. The check in the controller is really there to avoid confusing users who ask for inactive concepts but do not receive them!
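A toy model may make this concrete. The sketch below is not how Snowstorm stores or queries data; the single-letter concept ids and the `descendants_or_self` helper are hypothetical and only show why a descendant-or-self query built on active is-a relationships cannot reach an inactivated concept.

```python
from collections import defaultdict

# Hypothetical toy data: (child, parent, active) is-a relationships.
# The ids are made up for illustration and are not real SNOMED CT concepts.
RELATIONSHIPS = [
    ("B", "A", True),
    ("C", "B", True),
    ("D", "A", False),  # D was inactivated, so its is-a relationship is inactive too
]


def descendants_or_self(root, relationships):
    """Return root plus everything reachable through active is-a relationships."""
    children = defaultdict(set)
    for child, parent, active in relationships:
        if active:
            children[parent].add(child)

    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children[node] - seen:
            seen.add(child)
            stack.append(child)
    return seen


result = descendants_or_self("A", RELATIONSHIPS)
print(sorted(result))  # D is missing because nothing active links it to A
```

To include D, the model would need either the relationships as they stood before inactivation or some other link from D back to an active concept.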

danka74 commented 5 years ago

Very good point @pgwilliams, I was not considering this basic fact :( ! I would assume that anyone needing to query over inactive content would need to query over the version of SNOMED CT at a time the concept was active to give reasonable results, as other things the concept's classification depends on might have changed as well. @nicholasudell, would querying for semantic tag of inactive concepts be a good-enough solution for your use case?

ghost commented 5 years ago

Our use-case is perhaps a little more complex (although I'd not have thought uncommon) than can be solved with semantic tags:

If a patient changes practice to one of our customers, our patient record extraction and reporting software extracts all SNOMED codes within a set definition (for example their vaccination history, or their flu history) that is provided to us as an ECL query (usually more complex than simply a refset or list of concept ids). As the majority of this data is historical, we need to also extract any codes that would historically match our ECL query at any point in time, including codes that are now inactive, so as not to miss a portion of the patients' history.

Currently we plan to generate a list of concept ids to extract by running the ECL query through Snowstorm, however without the inactive concepts we risk missing historic data. Instead, we could potentially run the ECL queries against each version of SNOMED (and then incrementally with each new version) by creating a branch for each release, then combine the list of codes to produce our extract list, however I'm unsure if this is the best way to go about it as, at the least, it would take a very long time to import all these releases, and likely require a very large database.
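The per-release idea in the paragraph above could be sketched roughly as follows. This is only an outline under stated assumptions: `collect_across_releases` and `fake_fetch` are hypothetical names, the branch paths and concept ids are made up, and the fetch function stands in for whatever call actually runs the ECL against one release branch.

```python
from typing import Callable, Iterable, Set


def collect_across_releases(
    ecl: str,
    branches: Iterable[str],
    fetch_ids: Callable[[str, str], Iterable[str]],
) -> Set[str]:
    """Union the concept ids that match `ecl` on each release branch."""
    matched: Set[str] = set()
    for branch in branches:
        matched.update(fetch_ids(branch, ecl))
    return matched


# Stand-in for a real terminology-server call.
# Branch paths and concept ids below are invented for illustration.
FAKE_RESULTS = {
    "MAIN/2018-07-31": {"101", "102"},
    "MAIN/2019-01-31": {"101", "103"},
}


def fake_fetch(branch: str, ecl: str) -> Set[str]:
    return FAKE_RESULTS[branch]


extract_list = collect_across_releases(
    "<< 195967001 |Asthma|", FAKE_RESULTS.keys(), fake_fetch
)
print(sorted(extract_list))
```

Because the result is a plain set union, a new release only needs one extra query whose ids are merged into the existing extract list.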

I would be surprised if the SNOMED spec didn't have some consideration supporting historical reporting like this, so I assume I'm missing something here.

danka74 commented 5 years ago

You're right, that is complex ;) I do not have experience with this: we're only starting to use SNOMED CT in live systems and have yet to see this problem in the wild. We'll be there in a year or two, so preparedness is important. What has been discussed previously (either in SLPG or SNOMED on FHIR, can't remember) is to provide mappings between SNOMED concepts in different versions, i.e. that this inactive concept corresponds to this (or these) active concept today. This works when there are "simple" history relationships (for duplicates mainly, but SAME AS and REPLACED BY) but less well for ambiguous concepts (1:* mappings, POSSIBLY EQUIVALENT TO) and not well at all for Non-conformance concepts. I guess there are limits to what can be achieved without manual review.

kaicode commented 5 years ago

Hi @nicholasudell,

I agree that this must be a common use case!

Currently Snowstorm removes concepts and relationships/attributes from the semantic index which is used to run ECL queries when they become inactive. This is because when they are marked inactive we consider them to no longer be true.

To get your use case working in the short term you would have to run the ECL against the release when each concept was last active. If you run an RF2 FULL import in Snowstorm all the required release branches will be created for you. More release branches can be added as releases are published using a DELTA import with the createRelease flag set to true.

Steps to run ECL for a set of concepts with an unknown active status:

In the short term I recommend creating that logic on your side, but in the next six months or so it would be nice to have a Snowstorm endpoint capable of doing all that in one request. I'll give this some thought.

I hope that helps a little.

Kind regards, Kai

danka74 commented 5 years ago

Interesting @kaicode, but this would only work under the condition that the concepts used in the ECL expressions are not themselves added at a stage after the first use of a would-be subsumed concept. Also, for an <<X query this will give you everything that was once believed to be a kind of X, which might be what you want, or not. So there seem to be two strategies: either map all data (i.e. all that can be mapped) to the latest version OR collect all meanings over time. The pros and cons of each likely need more elaboration. /Daniel

kaicode commented 5 years ago

Yes, this problem does have complexities, doesn't it! The organisation of concepts, the attributes in use and the editorial guidance change over time, which means that an ECL query which works well now may not be effective on an older release.

To me using complete published points in time when everything you are interested in was published as active seems like the safest strategy. That may mean the ECL query needs to be tweaked before running against an older release. This is your "meanings over time" strategy I think. This can probably only be done manually on a case by case basis because of the lack of metadata around changes in editorial guidance. So maybe providing an automatic historical matching query feature would be misleading, or if it was produced it should be labeled with a warning.

I agree that this needs more thought, discussion and examples to work through.

danka74 commented 5 years ago

For reference, David Markwell did an analysis of some of the complexities involved in using history relationships: https://confluence.ihtsdotools.org/display/mag/Working+with+Historical+Associations

kaicode commented 2 years ago

We can finally close this ticket with the introduction of ECL 2.0 and the History Supplement feature, which is implemented in the latest Snowstorm release (7.9.3)!

ECL History Supplement Example

All types of Asthma can be fetched, including inactive concepts that have a historical association of type same-as, replaced-by, was-a or partially-equivalent-to, using the following ECL:

<< 195967001 |Asthma| {{ +HISTORY-MOD }}
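
As a rough sketch of how such a query might be sent from client code, the snippet below appends a history supplement to an ECL expression and URL-encodes it. The helper names, the `http://localhost:8080` base URL, the `MAIN` branch and the `/{branch}/concepts?ecl=` path are assumptions for illustration; check them against the API documentation of your Snowstorm version. HISTORY-MIN, HISTORY-MOD and HISTORY-MAX are the history profiles named in ECL 2.0.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# History supplement profiles named in ECL 2.0.
HISTORY_PROFILES = ("HISTORY-MIN", "HISTORY-MOD", "HISTORY-MAX")


def with_history(ecl: str, profile: str = "HISTORY-MOD") -> str:
    """Append a history supplement such as {{ +HISTORY-MOD }} to an ECL expression."""
    if profile not in HISTORY_PROFILES:
        raise ValueError(f"unknown history profile: {profile}")
    return f"{ecl} {{{{ +{profile} }}}}"


def concept_search_url(base_url: str, branch: str, ecl: str) -> str:
    """Build a concept search URL (assumed endpoint shape) with the ECL safely encoded."""
    query = urlencode({"ecl": ecl})
    return f"{base_url.rstrip('/')}/{quote(branch)}/concepts?{query}"


ecl = with_history("<< 195967001 |Asthma|")
url = concept_search_url("http://localhost:8080", "MAIN", ecl)
print(ecl)
print(url)
```

Encoding matters here because the expression contains characters such as `<`, `|`, `{` and `+` that are not safe to place in a query string unescaped.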

We expect this will be widely used in the wild!