IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Classification of postcoordinated expression #122

Open mertenssander opened 4 years ago

mertenssander commented 4 years ago

Hi all,

I am unsure if this is the right project for this question, but cannot find a better location. We are working with several parties in the Netherlands that want to increase their use of postcoordination - for obvious reasons. We are looking for a solution that allows users to submit a postcoordinated expression, after which I imagine a classifier would run and the ancestors/parents and children/descendants would be returned - or an equivalent concept if it exists. Do you provide tooling for this, or are you aware of solutions that can help us in this? The MRCM endpoint of SNOMED can help us validate the expression, but the practical value of investing in postcoordination is dependent on this tooling.

If I can provide a more thorough explanation of the required functionality, please let me know.

Thanks in advance! Best regards, Sander

danka74 commented 4 years ago

Hi Sander, did some work on this ages ago based on OWL API and OWL representation of SNOMED CT snapshots, so it's doable but it was kind of a hack. By chance I got a question from a user and we investigated if snowstorm code could be used/amended/etc. to achieve this. I ran out of time :(, but this test case would provide useful information for anyone interested: https://github.com/IHTSDO/snowstorm/blob/123f6872c55b2527dd7b298a28b57a2fb7e062a1/src/test/java/org/snomed/snowstorm/core/data/services/classification/ClassificationServiceTest.java Cheers, Daniel

bcarlsenca commented 4 years ago

Just for an extra bump, I've also been long interested in this, and have talked with Kai about it from time to time. Thanks Daniel for the link, I'll try to take some time to look into that.

Brian

--

Brian Carlsen West Coast Informatics, LLC www.westcoastinformatics.com https://www.linkedin.com/in/bcarlsenca/

kaicode commented 4 years ago

It is on our roadmap to produce tooling for an expression repository this year based on Snowstorm with functionality for normalisation, classification, equivalent identification and ECL. We have an agreement to work with an organisation providing data analytics services for the NHS. They are manually creating SNOMED post-coordinated content as a target for mapping legacy codes into their platform.

Discussions are ongoing within SI regarding best practice in this area, which will feed into the implementation. I have also reached out to a vendor who has some experience in this area to learn from their experience. I am starting to discover that an Expression Repository means different things to different people, but classification of expressions is a common theme.

@mertenssander yes, any information you can provide on known intended use case would be very helpful.

mertenssander commented 4 years ago

Hi @kaicode, thanks for your reply! I have a meeting with the group working on this later this week. I will make a point of defining the use case, and get back to you. Shall I reply on this issue, or do you have an alternative channel where I should contact you?

Best regards, Sander

kaicode commented 4 years ago

Let's keep it public as there are a few interested parties. Here is good for me. In the coming weeks we will create an area in the SNOMED International Confluence for post-coordination implementation requirements and discussion.

dmarkwell commented 4 years ago

Just to add the revised DRAFT version of the SNOMED TSG currently mentions this requirement. So it would be great if Snowstorm could address it. However, I am aware there are some complications to getting this right.

The tricky part is the first step: ensuring that the expression is valid when checked against the concept model. In some cases, minor discrepancies can be resolved (e.g. laterality applied to a finding or procedure ... rather than to an anatomical structure). However, in other cases a refinement that is permitted by the concept model may still result in ambiguity. Here are a few examples that have been considered in the past ...

Easiest cases:

A little more difficult but soluble:

Example of a difficult case:

BTW: The draft revision of the Terminology Services Guide mentioned above is something I am contributing to in a contractor role at the moment. It is not yet publicly available.

mertenssander commented 4 years ago

As noted earlier, we have several parties interested in using postcoordination for their registration. I have read a study where clinicians could postcoordinate a concept if there was no usable existing concept, after which the generated expression would be sent to a terminologist for authoring. We believe this would still create an immense load on the terminologists, and limit the practical application of postcoordination in smaller centers or even among general practitioners.

Allowing third parties to use postcoordination (guided with templates and a usable interface) would greatly lighten the load on our terminologists, as we do not have the manpower to create every requested concept within a practical deadline. This is a possible downside to using SNOMED over a local codesystem. I imagine other countries have even fewer trained staff on hand. Postcoordination seems the way to go, but has some caveats.

I have tried to summarize some of the larger questions I have received in this area:

• What is the vision of SI in the case of a recipient of a postcoordinated expression? Should it be classified? What would be the term in this case? As a broader question: what is the vision of SI for postcoordination?
• If a diagnosis or imaging is recorded using a postcoordinated expression, what would be the value of the expression / how would/should it be used? It would not be possible to perform analytics using ECL queries on this expression, as Snowstorm would not know of its existence.
• When a postcoordinated expression is recorded in a patient file, one use case could be checking for specific ancestors, i.e. 'if this expression/equivalent concept has parent X, advise treatment Y'. Checking for ancestors of a postcoordinated expression is, as far as I know, not supported by Snowstorm.
• Checking for equivalency: Rewriting the postcoordinated expression to an ECL query could lead to 1 concept, which could be equivalent. A separate endpoint specific to this, with additional failsafes, would probably be needed.
• I could imagine recording each postcoordinated expression as a new concept, with a new (local) ID, and using the expression as its term. Subsequent equivalent postcoordinated expressions could then be changed to this local concept ID, in order to allow for analysis/aggregation and to prevent duplicates. This could work, until you need to exchange the concept with a third party; you would then need the expression.

I hope you can provide us with answers to some of these questions. Please let me know if I can clarify anything! Best regards, Sander

kaicode commented 4 years ago

Thank you for your comprehensive questions @mertenssander. We are getting our heads together to provide a response so will be a few days.

kaicode commented 4 years ago

@mertenssander, In response to your questions:

What is the vision of SI in the case of a recipient of a postcoordinated expression? Should it be classified? What would be the term in this case? As a broader question: what is the vision of SI for postcoordination?

Yes, SNOMED International does envisage organisations being able to send and receive postcoordinated expressions. Expressions should be sent in the 'stated' form (ie using the focus concepts and attributes stated by the user), which should 'ideally' be created using a template that complies with the SNOMED concept model. The sender should also provide a 'term' with the expression - which could (a) be as simple as the expression itself with preferred terms (e.g. 182201002 |Hip joint| : 272741003 |Laterality| = 24028007 |Right|), (b) the expression itself with concept ids excluded (e.g. |Hip joint| : |Laterality| = |Right|), (c) a term created using an automated description template (e.g. [laterality] [body structure] → "Right hip joint"), (d) a user interface term that is mapped to the given expression (e.g. Right hip), or (e) a manually curated term. The sending system should decide how to safely create terms for their given use case.
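As an illustration of term options (a) and (b), the second can be derived mechanically from the first. This is only a minimal sketch, assuming the expression is written in the `id |term|` form shown above; the function name `term_without_ids` is hypothetical and not part of Snowstorm.

```python
import re

# A SNOMED CT identifier (6 to 18 digits) that is directly followed by a
# pipe-delimited term. The lookahead keeps the term and removes only the id.
_CONCEPT_ID = re.compile(r"\b\d{6,18}\s*(?=\|)")


def term_without_ids(expression: str) -> str:
    """Option (b): keep the |term| parts of an expression, drop the concept ids."""
    return _CONCEPT_ID.sub("", expression).strip()


# Option (a) is the expression itself, including ids and preferred terms.
option_a = "182201002 |Hip joint| : 272741003 |Laterality| = 24028007 |Right|"
option_b = term_without_ids(option_a)
print(option_b)
```

Options (c) to (e) need a description template, a UI mapping or a human, so they are not covered by this sketch.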

Once the expression is received, there is no need to classify it if you are just displaying the record. However, if you need the content to be available for querying, decision support, or other types of analytics, then the recipient should:

  1. Validate the expression (including checking for syntax compliance, active concept references, and validity against the MRCM).
  2. Normalize the expression into an MRCM-compliant ‘proximal-primitive form’ (e.g. applying refinements to the role groups in the definition of the focus concept, where appropriate). (Optional Step)
  3. Translate into the respective OWL representation.
  4. Classify the OWL representation against the precoordinated content to find subsumption and equivalence relationships.
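
To make step 3 more concrete, here is a rough sketch of translating a very simple expression into an OWL axiom in functional syntax. It assumes a single focus concept with ungrouped `attribute = value` refinements only; role groups, nested expressions and multiple focus concepts are deliberately left out. The function names and the expression identifier `EXPR1` are hypothetical, and the output assumes the default prefix resolves to the SNOMED CT URI namespace.

```python
import re

# Terms between pipes are for humans; only identifiers matter for OWL.
_TERM = re.compile(r"\|[^|]*\|")


def parse_simple_expression(expression: str) -> tuple[str, list[tuple[str, str]]]:
    """Parse 'focus : attr = value, attr = value' (no role groups, no nesting)."""
    bare = _TERM.sub("", expression)
    focus, _, refinement = bare.partition(":")
    pairs = []
    for part in refinement.split(","):
        if part.strip():
            attribute, _, value = part.partition("=")
            pairs.append((attribute.strip(), value.strip()))
    return focus.strip(), pairs


def to_owl(expression_id: str, expression: str) -> str:
    """Build an EquivalentClasses axiom for the expression (simplified)."""
    focus, pairs = parse_simple_expression(expression)
    if not pairs:
        return f"EquivalentClasses(:{expression_id} :{focus})"
    restrictions = " ".join(f"ObjectSomeValuesFrom(:{a} :{v})" for a, v in pairs)
    return (
        f"EquivalentClasses(:{expression_id} "
        f"ObjectIntersectionOf(:{focus} {restrictions}))"
    )


axiom = to_owl("EXPR1", "182201002 |Hip joint| : 272741003 |Laterality| = 24028007 |Right|")
print(axiom)
```

The resulting axiom would then be handed to an OWL reasoner together with the axioms of the precoordinated content (step 4); that part is not shown here.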

If a diagnosis or imaging is recorded using a postcoordinated expression, what would be the value of the expression / how would/should it be used? It would not be possible to perform analytics using ECL queries on this expression, as snowstorm would not know of its existence.

A terminology server should classify the expression, store it with an assigned identifier and allow ECL search. This will allow both precoordinated and postcoordinated content to be used in reporting and analytics tasks. SI plans to implement this functionality in Snowstorm.

When a postcoordinated expression is recorded in a patient file, one use case could be checking for specific ancestors, i.e. 'if this expression/equivalent concept has parent X, advise treatment Y'. Checking for ancestors of a postcoordinated expression is, as far as I know, not supported by Snowstorm.

You are right, this is not yet supported but will be by the end of the year.

Checking for equivalency: Rewriting the postcoordinated expression to an ECL query could lead to 1 concept, which could be equivalent. A separate endpoint specific to this, with additional failsafes would probably be needed.

I agree that there should be API endpoints dedicated to this functionality. Because OWL Property Chains and Transitive Properties are now present in the International Edition, using 'structural subsumption testing' over normalized expressions is no longer a valid way to test subsumption or equivalence. Instead the approach mentioned above should be used (i.e. to normalize the expression and classify using an OWL reasoner to find inferred parents and equivalent concepts). This functionality is in the immediate roadmap for Snowstorm. There has been some delay starting this work but it should be completed this year.
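
The point about property chains can be illustrated with a toy example. The sketch below uses made-up concept and attribute names, not real SNOMED CT content, and hand-codes a single hypothetical chain (`direct_substance o is_modification_of => direct_substance`). It only shows why a purely structural comparison of attribute values can miss a subsumption; a real implementation would delegate this to an OWL reasoner rather than hard-code chains.

```python
# Hypothetical miniature ontology (illustrative names only).
PARENTS = {
    "morphine": {"opioid"},
    "morphine_sulfate": {"opioid_salt"},
}
# morphine_sulfate is a modification of morphine, but not a subtype of it.
MODIFICATION_OF = {"morphine_sulfate": {"morphine"}}


def ancestors_or_self(concept: str) -> set[str]:
    """Transitive closure over is-a edges, including the concept itself."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def structurally_subsumed(candidate_value: str, query_value: str) -> bool:
    """Structural test: the candidate's value must be a subtype-or-self of the query's value."""
    return query_value in ancestors_or_self(candidate_value)


def subsumed_with_chain(candidate_value: str, query_value: str) -> bool:
    """Same test, but also following is_modification_of edges, as the chain implies."""
    seen, stack = set(), [candidate_value]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
            stack.extend(MODIFICATION_OF.get(current, ()))
    return query_value in seen


# Is "procedure : direct_substance = morphine_sulfate" subsumed by
# "procedure : direct_substance = morphine"?
print(structurally_subsumed("morphine_sulfate", "morphine"))
print(subsumed_with_chain("morphine_sulfate", "morphine"))
```

In this toy setup the structural test answers no while the chain-aware test answers yes, which is the kind of discrepancy that motivates classifying with a reasoner.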

I could imagine recording each postcoordinated expression as a new concept, with a new (local) ID, and using the expression as its term. Subsequent equivalent postcoordinated expressions could then be changed to this local concept ID, in order to allow for analysis/aggregation and to prevent duplicates. This could work, until you need to exchange the concept with a third party; you would then need the expression.

Yes, assigning an identifier and auto-generated terms to the postcoordinated content and including it for analytics is in line with what we envisage. However, we would recommend assigning a different expression identifier for each substantively different expression, because equivalence of expressions can be different when tested against different versions of SNOMED. By giving each expression a different id, you can retain the "close-to-user" expression (i.e. the stated intent) and reclassify your expressions with each new version of SNOMED CT. Please note that it is not necessary to assign the same id to 'equivalent' expressions in order to do analysis/aggregation. Instead, the classifier should be able to classify all expressions in the appropriate place in the hierarchy, so that appropriate grouping and subsumption testing (e.g. as required by an ECL implementation) can be performed.

And yes, we agree that assigning identifiers to expressions for use within local patient records is a good idea. These identifiers can be used and shared with other systems that share the same expression repository (and therefore can look up the matching expression). As you suggest, when the data is exchanged with a third party (who does not have access to the expression repository), then the expression itself needs to be shared.
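
A possible shape for such a repository is sketched below. It is only an illustration under stated assumptions: the class and method names, the `local-N` identifier scheme and the `classify` callback are all hypothetical, and the callback stands in for whatever reasoner-backed service does the real classification. The point is that the stated ("close-to-user") form is stored unchanged and the inferred parents are recomputed per SNOMED CT version.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class StoredExpression:
    stated_form: str  # close-to-user expression, never rewritten
    parents_by_version: Dict[str, Set[str]] = field(default_factory=dict)


class ExpressionRepository:
    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._id_by_stated_form: Dict[str, str] = {}
        self.expressions: Dict[str, StoredExpression] = {}

    def add(self, stated_form: str) -> str:
        """Assign one identifier per distinct stated form (hypothetical id scheme)."""
        if stated_form in self._id_by_stated_form:
            return self._id_by_stated_form[stated_form]
        expression_id = f"local-{next(self._counter)}"
        self._id_by_stated_form[stated_form] = expression_id
        self.expressions[expression_id] = StoredExpression(stated_form)
        return expression_id

    def reclassify(self, version: str, classify: Callable[[str, str], Set[str]]) -> None:
        """Recompute inferred parents of every stored expression for a release."""
        for stored in self.expressions.values():
            stored.parents_by_version[version] = classify(stored.stated_form, version)


def stub_classifier(stated_form: str, version: str) -> Set[str]:
    # Placeholder: returns the focus concept id as the only parent.
    return {stated_form.split(":")[0].split("|")[0].strip()}


repo = ExpressionRepository()
hip_id = repo.add("182201002 |Hip joint| : 272741003 |Laterality| = 24028007 |Right|")
repo.reclassify("release-A", stub_classifier)
repo.reclassify("release-B", stub_classifier)
```

Because results are kept per version, two expressions can be equivalent under one release and not under another without either identifier having to change.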

mertenssander commented 4 years ago

Hi @kaicode and colleagues - thank you for your response! It is very useful and interesting to hear your vision on this somewhat complicated subject. We will relay this to the interested parties we are in contact with. I'm sure we can continue our work using your reply. I think it's going to be a very involved process for vendors to implement and exchange these expressions, but with sufficient guidance from SI your envisioned approach seems very viable. We're very interested in seeing the planned enhancements to snowstorm! Best regards, Sander

lawley commented 4 years ago

Hi @kaicode Following this, I am wondering about whether and to what extent you plan to support close-to-user form?

Also, with respect to the <<< part of the grammar, two expressions <<<12345678 and <<<12345678 are (technically) not equivalent, so every time you map these to an (internal) identifier, it needs to be different. This creates a whole world of pragmatic issues (because sometimes you know/want them to be "the same"/"equal"). Have you attempted to deal with this issue?

kaicode commented 4 years ago

Hi @lawley, welcome. I see your point that two "subtype of" expressions should perhaps not be considered equal, because in SNOMED CT there can be many primitive concepts with the same SubClassOf axiom which are not equivalent. In precoordinated authoring the description terms can be used to distinguish between a set of concepts with the same axiom expression, so there is value there. However, in a postcoordinated repository with no terms, having multiple classes for the same expression may have no semantic value, because nothing different can be done with the duplicates in terms of selecting expressions and therefore clinical records?

I suggest that a postcoordinated expression repository should allow terms to be recorded against expressions and that those terms should be used to determine if a submitted expression is new or existing. If the terms do not match exactly and no human is present to make the choice, then the expression being submitted must be assumed to be new (not equivalent). If no terms are provided and there is an existing expression which also has no terms, then I would consider these to be equivalent.
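
The matching rule suggested above could look roughly like this. It is a sketch, not a proposal for the actual API: the function name, the `local-N` identifiers and the in-memory dictionary are hypothetical. One case is not covered by the rule as described, namely a submission without a term when the stored expression has one; this sketch treats that as new, which is an assumption.

```python
from typing import Dict, Optional, Tuple

# Key: (expression, term). The term is None when no term was recorded.
Repository = Dict[Tuple[str, Optional[str]], str]


def find_or_create(
    repo: Repository, expression: str, term: Optional[str] = None
) -> Tuple[str, bool]:
    """Return (identifier, created) using exact matching on expression and term."""
    key = (expression, term)
    if key in repo:
        # Same expression and exactly the same term (or both without a term).
        return repo[key], False
    # Terms differ and no human is available to decide: assume a new expression.
    identifier = f"local-{len(repo) + 1}"
    repo[key] = identifier
    return identifier, True


repo: Repository = {}
first = find_or_create(repo, "<<<12345678", "term A")
again = find_or_create(repo, "<<<12345678", "term A")
other = find_or_create(repo, "<<<12345678", "term B")
no_term = find_or_create(repo, "<<<12345678")
no_term_again = find_or_create(repo, "<<<12345678")
```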

Open to further thoughts and ideas.

kaicode commented 4 years ago

Yes, we are looking at the options for supporting close-to-user form and the necessary transformation to authoring form. We are actively seeking examples and existing expressions that we can analyse to help support this work. Anyone, please contact me if you can help in this area.

dmarkwell commented 4 years ago

Hi @lawley and @kaicode, I also see the point that two "subtype of" expressions cannot be assumed to be equivalent. However, I wonder how significant this is in most practical situations.

For example, if we take all records that assert a diagnosis using the precoordinated expression 75570004 | Viral pneumonia (disorder) |, it is certainly true that the diagnosis does not assert the specific nature of the causative virus. So the conditions are not necessarily equivalent. In a query for all cases of pneumonia or all cases of viral pneumonia, the missing detail is unimportant. The reason the specific virus is not included in the diagnosis could be that it is unknown, or was unknown at the time the diagnosis was recorded.

The practical problem occurs when trying to analyze the data in a way that takes account of the specific causative agent. When trying to address that use case, the issue is not that 75570004 | Viral pneumonia (disorder) | includes non-equivalent conditions ... but that it is impossible to know how many of these cases were, in reality, equivalent to a specific type of viral pneumonia (e.g. 882784691000119100 | Pneumonia caused by severe acute respiratory syndrome coronavirus 2 (disorder) |).

I think this example can be applied more widely. Almost any clinical concept represents a range of different meanings at a finer level of detail, and this is true whether (a) there are defined concepts representing those more specific meanings, (b) there are attribute values that could enable that representation, or (c) there is no current way to represent the more specific meaning in SNOMED CT.

I am probably just failing to spot a practical case where this factor creates a more serious type of practical problem with post-coordinated expressions. So would be interested in understanding your specific concerns with this.

sschaat commented 1 year ago

Hi @kaicode and others,

We are currently examining the possibilities for querying FHIR resources that are coded with postcoordinated expressions. Hence, I am curious about the current state and plans for the mentioned capabilities in Snowstorm.

Best, Samer

kaicode commented 1 year ago

I am currently focused on producing a beta in the Snowstorm X project with postcoordination functionality using the FHIR API. The current target for this is the end of January. Features will include validation, transformation, incremental classification, persistence, subsumption testing and ECL queries for postcoordinated expressions. I will post here when the beta and the documentation are ready.

Snowstorm will remain as a terminology server containing only terminology content, including any added expressions. Two options for querying FHIR resources that are coded with expressions will be: