IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Is-a relationship in ECL-searches #343

Open Tannjorn opened 2 years ago

Tannjorn commented 2 years ago

My use case is to find an ancestor of a concept, where the ancestor has specific attributes. More precisely, I want to find the top-level Medicinal Product Form (MPF) for a Clinical Drug (CD), regardless of whether the Clinical Drug has an is-a relationship to another Clinical Drug or whether the Medicinal Product Forms are multi-level.

The first step is simple: query all ancestors of the clinical drug (using an example that has hierarchical MPFs):

>788516004 |Product containing precisely onabotulinumtoxinA 100 unit/1 vial lyophilized powder for conventional release solution for injection (clinical drug)|

This gives 17 concepts. Then filter for MPFs.

The following ECL returns only MPFs under a closed-world view, as it constrains the defining attributes of an MPF, including the absence of BoSS ([0..0]), which only holds under the closed-world assumption:

<<763158003 |Medicinal product|: 1142139005 |Count of base of active ingredient| = *, 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = *

Combining these should look like this:

>788516004 |Product containing precisely onabotulinumtoxinA 100 unit/1 vial lyophilized powder for conventional release solution for injection (clinical drug)| AND (<<763158003 |Medicinal product|: 1142139005 |Count of base of active ingredient| = *, 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = * )

For this CD, the query returns two MPFs that are hierarchically related (one is a subtype of the other).
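To make the conjunction concrete, here is a minimal Python sketch of what it computes, run over a hypothetical toy hierarchy. The concept names (such as `MPF_ONLY_TOP`) and attribute keys (such as `count_of_base`) are invented stand-ins, not SNOMED CT identifiers, and relationship groups are ignored. `ancestors` plays the role of `>`, `is_mpf_only` mirrors the closed-world refinement including the `[0..0]` BoSS exclusion, and the result is their intersection.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    # Hypothetical simplified model: direct is-a parents, plus
    # attribute type -> target values (relationship groups ignored).
    parents: set[str] = field(default_factory=set)
    attrs: dict[str, list[str]] = field(default_factory=dict)


def ancestors(graph: dict[str, Concept], cid: str) -> set[str]:
    """Proper ancestors of cid (like ECL '>'), via the is-a closure."""
    seen: set[str] = set()
    stack = list(graph[cid].parents)
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph[p].parents)
    return seen


def is_mpf_only(graph: dict[str, Concept], cid: str) -> bool:
    """Closed-world MPF-only test: descendant-or-self of MP, has a count of
    base and a dose form, and has no BoSS at all (the [0..0] cardinality)."""
    a = graph[cid].attrs
    under_mp = cid == "MP" or "MP" in ancestors(graph, cid)
    return (under_mp and "count_of_base" in a
            and "dose_form" in a and "boss" not in a)


# Hypothetical toy hierarchy, not real SNOMED CT content.
graph = {
    "MP": Concept(),
    "MP_ONLY": Concept({"MP"}, {"count_of_base": ["1"]}),
    "MPF_ONLY_TOP": Concept({"MP_ONLY"},
                            {"count_of_base": ["1"], "dose_form": ["injection"]}),
    "MPF_ONLY_SUB": Concept({"MPF_ONLY_TOP"},
                            {"count_of_base": ["1"], "dose_form": ["lyophilized injection"]}),
    "CD": Concept({"MPF_ONLY_SUB"},
                  {"count_of_base": ["1"], "dose_form": ["lyophilized injection"],
                   "boss": ["toxin"]}),
}

# ">CD AND (MPF-only refinement)": intersect the ancestors with the MPF-only set.
mpf_ancestors = {c for c in ancestors(graph, "CD") if is_mpf_only(graph, c)}
print(sorted(mpf_ancestors))  # ['MPF_ONLY_SUB', 'MPF_ONLY_TOP']
```

As in the real query, both MPF-only concepts survive, because nothing in the refinement distinguishes the upper one from the lower one.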

Now I would like to add a constraint that returns only the MPF that has an is-a relationship to a Medicinal Product (MP). MPFs that are first-level descendants of an MP should be identifiable by requiring an is-a relationship to an MP. First, the defining characteristics of an MP:

<<763158003 |Medicinal product|: 1142139005 |Count of base of active ingredient| = *, [0..0] 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = *

Then I put this into an ECL expression to restrict the MPFs to those that have an is-a relationship to an MP:

<<763158003 |Medicinal product|: 1142139005 |Count of base of active ingredient| = *, 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = *, 116680003 |Is a| = ( <<763158003 |Medicinal product|: 1142139005 |Count of base of active ingredient| = *, [0..0] 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = * )

But this gives 0 results.
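The intended meaning of the `116680003 |Is a|` refinement can be stated as: keep an MPF-only ancestor only if at least one of its direct parents satisfies the MP-only constraint. Here is a minimal Python sketch of that reading on a hypothetical toy hierarchy (invented names and attribute keys, relationship groups ignored). It models the intent only and makes no claim about how Snowstorm evaluates is-a inside a refinement.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    # Hypothetical simplified model: direct is-a parents, plus
    # attribute type -> target values (relationship groups ignored).
    parents: set[str] = field(default_factory=set)
    attrs: dict[str, list[str]] = field(default_factory=dict)


def ancestors(graph: dict[str, Concept], cid: str) -> set[str]:
    """Proper ancestors of cid (like ECL '>'), via the is-a closure."""
    seen: set[str] = set()
    stack = list(graph[cid].parents)
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph[p].parents)
    return seen


def under_mp(graph: dict[str, Concept], cid: str) -> bool:
    """Descendant-or-self of MP (like ECL '<<MP')."""
    return cid == "MP" or "MP" in ancestors(graph, cid)


def is_mpf_only(graph: dict[str, Concept], cid: str) -> bool:
    # Closed world: count of base and dose form present, no BoSS.
    a = graph[cid].attrs
    return (under_mp(graph, cid) and "count_of_base" in a
            and "dose_form" in a and "boss" not in a)


def is_mp_only(graph: dict[str, Concept], cid: str) -> bool:
    # Closed world: count of base present, no dose form and no BoSS.
    a = graph[cid].attrs
    return (under_mp(graph, cid) and "count_of_base" in a
            and "dose_form" not in a and "boss" not in a)


# Hypothetical toy hierarchy, not real SNOMED CT content.
graph = {
    "MP": Concept(),
    "MP_ONLY": Concept({"MP"}, {"count_of_base": ["1"]}),
    "MPF_ONLY_TOP": Concept({"MP_ONLY"},
                            {"count_of_base": ["1"], "dose_form": ["injection"]}),
    "MPF_ONLY_SUB": Concept({"MPF_ONLY_TOP"},
                            {"count_of_base": ["1"], "dose_form": ["lyophilized injection"]}),
    "CD": Concept({"MPF_ONLY_SUB"},
                  {"count_of_base": ["1"], "dose_form": ["lyophilized injection"],
                   "boss": ["toxin"]}),
}

# Intended reading of "116680003 |Is a| = (MP-only constraint)":
# an MPF-only ancestor with at least one direct parent that is an MP-only.
top = {c for c in ancestors(graph, "CD")
       if is_mpf_only(graph, c)
       and any(is_mp_only(graph, p) for p in graph[c].parents)}
print(sorted(top))  # ['MPF_ONLY_TOP']
```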

Any ideas?

pgwilliams commented 2 years ago

Hi @Tannjorn, you said you're trying to find the ancestor of a specific concept, so I would find it easier to start with the specific CD you're interested in and work upwards, excluding attributes that are used at more specific levels of abstraction and including the ones that are needed for an MPF. So I came up with this:

>> 1163177000 |Product containing precisely onabotulinumtoxinA 100 unit/1 vial powder for conventional release solution for injection (clinical drug)| : 411116001 |Has manufactured dose form| = *, [0..0] 733725009 |Has concentration strength numerator unit (attribute)| = *, [0..0] 732945000 |Has presentation strength numerator unit (attribute)| = *, [0..0] 1142139005 |Count of base of active ingredient| = *, 127489000 |Has active ingredient (attribute)| = *

And in this I'd make the following notes:

Does that help?

pgwilliams commented 2 years ago

Oh, but I need to think about multi-ingredient products, because you're going to see an MPF with all the ingredients in combination as well as an MPF for each ingredient individually. You said you wanted the highest level, so that would be the single-ingredient MPFs. But I still want to avoid the "Only" forms, so I'll restrict the ingredient count using cardinality rather than 1142139005 |Count of base of active ingredient|.

>> 413586006 |Product containing precisely aspirin 250 milligram and caffeine 65 milligram and paracetamol 250 milligram/1 each conventional release oral tablet (clinical drug)| : 411116001 |Has manufactured dose form| = *, [0..0] 733725009 |Has concentration strength numerator unit (attribute)| = *, [0..0] 732945000 |Has presentation strength numerator unit (attribute)| = *, [0..0] 1142139005 |Count of base of active ingredient| = *, [1..1] 127489000 |Has active ingredient (attribute)| = *

And here I'll expect to receive one MPF concept for each ingredient.
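This refinement can be mirrored the same way. A minimal Python sketch, assuming a hypothetical toy version of a three-ingredient tablet hierarchy (names and attribute keys are invented; real SNOMED CT content has more levels and relationship groups): `ancestors_or_self` plays `>>`, and `matches_refinement` encodes the required dose form, the `[0..0]` exclusions and the `[1..1]` active-ingredient cardinality. On this toy data, the combination MPF and the "only" forms drop out, leaving one MPF per ingredient.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    # Hypothetical simplified model: direct is-a parents, plus
    # attribute type -> target values (relationship groups ignored).
    parents: set[str] = field(default_factory=set)
    attrs: dict[str, list[str]] = field(default_factory=dict)


def ancestors_or_self(graph: dict[str, Concept], cid: str) -> set[str]:
    """cid plus all of its ancestors (like ECL '>>')."""
    seen: set[str] = set()
    stack = [cid]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c].parents)
    return seen


def matches_refinement(c: Concept) -> bool:
    """Dose form required; no concentration or presentation strength unit and
    no count of base ([0..0]); exactly one active ingredient ([1..1])."""
    a = c.attrs
    return ("dose_form" in a
            and "conc_strength_num_unit" not in a
            and "pres_strength_num_unit" not in a
            and "count_of_base" not in a
            and len(a.get("active_ingredient", [])) == 1)


# Hypothetical toy hierarchy for a three-ingredient tablet, not real content.
ING = ["aspirin", "caffeine", "paracetamol"]
graph = {"MP": Concept()}
for i in ING:
    graph[f"MP_{i}"] = Concept({"MP"}, {"active_ingredient": [i]})
    graph[f"MPF_{i}"] = Concept({f"MP_{i}"},
                                {"active_ingredient": [i], "dose_form": ["tablet"]})
graph["MPF_combo"] = Concept({f"MPF_{i}" for i in ING},
                             {"active_ingredient": ING, "dose_form": ["tablet"]})
graph["MPF_only_combo"] = Concept({"MPF_combo"},
                                  {"active_ingredient": ING, "dose_form": ["tablet"],
                                   "count_of_base": ["3"]})
graph["CD"] = Concept({"MPF_only_combo"},
                      {"active_ingredient": ING, "dose_form": ["tablet"],
                       "count_of_base": ["3"], "pres_strength_num_unit": ["mg"]})

result = {c for c in ancestors_or_self(graph, "CD") if matches_refinement(graph[c])}
print(sorted(result))  # ['MPF_aspirin', 'MPF_caffeine', 'MPF_paracetamol']
```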

Tannjorn commented 2 years ago

Hi @pgwilliams, thanks! Creating this as a restriction of ancestors is of course a better approach than my conjunction.

But I see I was a bit unclear in my use-case description. I actually want the "only" form, and I want to make sure I get only the top-level MPF-only. In some cases in SNOMED CT, a CD has several MPF-only ancestors, as illustrated here:

[image: MPF-only ancestors of several CDs, with the top-level MPF-only shown in green]

I want to return the green MPF-only as the ancestor for all of these CDs.

Working with your input, I have created this:

>788516004 |Product containing precisely onabotulinumtoxinA 100 unit/1 vial lyophilized powder for conventional release solution for injection (clinical drug)|: 1142139005 |Count of base of active ingredient| = *, 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = *

(I could have excluded more attributes, but as BoSS is a necessary condition for all CDs, excluding it is probably sufficient to restrict the search.)

But this query still returns both MPF-onlys, and I only want the one with an is-a relationship to the MP-only. Introducing the is-a constraint in the ECL gives 0 results:

>788516004 |Product containing precisely onabotulinumtoxinA 100 unit/1 vial lyophilized powder for conventional release solution for injection (clinical drug)|: 1142139005 |Count of base of active ingredient| = *, 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = *, 116680003 |Is a| = ( <<763158003 |Medicinal product|: 1142139005 |Count of base of active ingredient| = *, [0..0] 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = * )

The target of the is-a is restricted to an MP-only, but even so, is-a does not seem to work as an attribute in an ECL refinement.

So, back to compound expressions. I tried several ways of restricting this, and finally found one that works: an exclusion (MINUS) that removes all concepts that have another MPF-only ancestor, thereby leaving the one whose is-a relationship is only to the MP-only. It looks like this, and is very slow on Snowstorm:

(>788516004 |Product containing precisely onabotulinumtoxinA 100 unit/1 vial lyophilized powder for conventional release solution for injection (clinical drug)|: 1142139005 |Count of base of active ingredient| = *, 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = *) MINUS

(< (<<763158003 |Medicinal product|: 1142139005 |Count of base of active ingredient| = *, 411116001 |Has manufactured dose form| = *, [0..0] 732943007 |Has BoSS| = *))

This one actually does what I want, but it is not very "optimal".
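The exclusion approach described above amounts to: take the MPF-only ancestors, then drop every one that itself has an MPF-only ancestor. Here is a minimal Python sketch of that logic on a hypothetical toy hierarchy (invented names and attribute keys, relationship groups ignored). One design difference worth noting: in ECL terms, removing concepts that have an MPF-only ancestor corresponds to subtracting `< (MPF-only constraint)`, which denotes every descendant of every MPF-only, whereas the sketch only inspects the ancestors of the few candidates. Doing this last step client-side, after fetching the candidate ancestors, might therefore be cheaper; that is an untested assumption, not a measured result.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    # Hypothetical simplified model: direct is-a parents, plus
    # attribute type -> target values (relationship groups ignored).
    parents: set[str] = field(default_factory=set)
    attrs: dict[str, list[str]] = field(default_factory=dict)


def ancestors(graph: dict[str, Concept], cid: str) -> set[str]:
    """Proper ancestors of cid (like ECL '>'), via the is-a closure."""
    seen: set[str] = set()
    stack = list(graph[cid].parents)
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph[p].parents)
    return seen


def is_mpf_only(graph: dict[str, Concept], cid: str) -> bool:
    """Closed-world MPF-only test: under MP, count of base and dose form
    present, no BoSS."""
    a = graph[cid].attrs
    under_mp = cid == "MP" or "MP" in ancestors(graph, cid)
    return (under_mp and "count_of_base" in a
            and "dose_form" in a and "boss" not in a)


# Hypothetical toy hierarchy, not real SNOMED CT content.
graph = {
    "MP": Concept(),
    "MP_ONLY": Concept({"MP"}, {"count_of_base": ["1"]}),
    "MPF_ONLY_TOP": Concept({"MP_ONLY"},
                            {"count_of_base": ["1"], "dose_form": ["injection"]}),
    "MPF_ONLY_SUB": Concept({"MPF_ONLY_TOP"},
                            {"count_of_base": ["1"], "dose_form": ["lyophilized injection"]}),
    "CD": Concept({"MPF_ONLY_SUB"},
                  {"count_of_base": ["1"], "dose_form": ["lyophilized injection"],
                   "boss": ["toxin"]}),
}

# Candidates: the MPF-only ancestors of the CD.
candidates = {c for c in ancestors(graph, "CD") if is_mpf_only(graph, c)}
# Drop every candidate that itself has an MPF-only ancestor, keeping the top level.
top_level = {c for c in candidates
             if not any(is_mpf_only(graph, a) for a in ancestors(graph, c))}
print(sorted(top_level))  # ['MPF_ONLY_TOP']
```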