daveklein / snofyre

Automatically exported from code.google.com/p/snofyre

Behaviour of Snofyre with inactive ConceptIDs #44

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I'm attempting to use Snofyre as a proof-of-principle demonstrator of an 
architecture to cope with concept inactivation, where inactive conceptIDs may 
be found in EITHER the EPR data being queried OR in the query specifications 
being run over that data. Demonstrating the solution requires swapping 
different versions of the SNOMED core tables, and particularly the TC table, 
around under Snofyre....

My current ‘test’ case runs as follows:

Query1 = Patients who have 38103000 Atopic rhinitis
Query2 = Patients who have 70076002 Rhinitis

..and these queries are to be executed (or used to generate data) under one of 
three different SNOMED configurations:

Config ‘OLD’:
Concept, relationship and description tables = October 2010 data
Transitive Closure Table = October 2010 data

Config ‘NEW’:
Concept, relationship and description tables = April 2011 data
Transitive Closure Table = April 2011 data

Config ‘AUG’:
Concept, relationship and description tables = April 2011 data
Transitive Closure Table = April 2011 data augmented with historical 
substitutions
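
To make the three configurations concrete, here is a minimal Python sketch of the subsumption test that Query2 relies on. It is not Snofyre code: the names `TC_TABLES`, `subsumed_by` and `run_query` are hypothetical, each TC table is reduced to the single row that matters for this test case, and the AUG result shown is what I would expect if only the TC table were consulted.

```python
# Hypothetical in-memory model of the three configurations (not Snofyre code).
ATOPIC_RHINITIS = 38103000
RHINITIS = 70076002

# Each TC table is modelled as a set of (subtype, supertype) pairs.
TC_TABLES = {
    "OLD": {(ATOPIC_RHINITIS, RHINITIS)},  # October 2010 data
    "NEW": set(),                          # April 2011 data: row absent
    "AUG": {(ATOPIC_RHINITIS, RHINITIS)},  # April 2011 plus historical substitutions
}


def subsumed_by(tc, concept, query_concept):
    """True if concept equals query_concept or is listed as its subtype in tc."""
    return concept == query_concept or (concept, query_concept) in tc


def run_query(tc, epr, query_concept):
    """Count EPR entries whose concept is subsumed by query_concept."""
    return sum(1 for concept in epr if subsumed_by(tc, concept, query_concept))


# 100 generated patients, each recorded with Atopic rhinitis (Query1).
epr = [ATOPIC_RHINITIS] * 100

for name, tc in TC_TABLES.items():
    print(name, run_query(tc, epr, RHINITIS))
```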

The sequence of events that demonstrates the problem is as follows:

1. Delete all records and purge the TC table and expression library

2. Using Config=OLD (in which concepts table says Atopic Rhinitis is current 
and TC table says Atopic rhinitis subtypeOf Rhinitis)
Generate 100 patients using Query1
Run Query1: returns 100 patients
Run Query2: returns 100 patients

3. Purge the TC table but not the expression library
Swap to Config:NEW (in which the concepts table says Atopic rhinitis is Duplicate 
and so the TC table says Atopic rhinitis is NOT subtypeOf Rhinitis)
Run Query2: returns zero patients (as expected)

4. Swap back to Config:OLD
Run Query2: returns 100 patients (interesting; Snofyre appears to have sucked 
in the TC table without the expression TC table being purged first…)

5. Swap back to Config:NEW
Run Query2: returns 100 patients (not the same result as in step 3; I assume the 
expression TC table has to be purged to flush old results?)
Purge the TC table but not the expression library
Re-Run Query2: returns zero patients (as expected)
Re-Run Query2: returns zero patients (as expected)

6. Swap back to Config:OLD
Run Query2: returns 100 patients (so expression TC table is extended when new 
TC results are encountered, but never trimmed if existing rows become invalid 
against external main TC table?)

So, at this point everything is going well because I can swap back and forth 
between OLD and NEW and, provided I purge the TC table at each swap, I always 
get the answer I expect for the particular TC table sitting under Snofyre: 
either zero or 100 patients.
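
The behaviour in steps 2 to 6 is consistent with an append-only cache sitting in front of the main TC table. The sketch below is only my guess at the mechanism, not Snofyre's actual implementation: `ExpressionTcCache`, `purge` and `is_subtype` are hypothetical names, and "purge" is assumed to mean clearing the cached expression TC rows.

```python
# Hypothetical model of an append-only expression TC cache (not Snofyre code).
ATOPIC_RHINITIS = 38103000
RHINITIS = 70076002


class ExpressionTcCache:
    def __init__(self):
        self.rows = set()  # cached (subtype, supertype) pairs

    def purge(self):
        self.rows.clear()

    def is_subtype(self, main_tc, concept, query_concept):
        pair = (concept, query_concept)
        if pair in self.rows:
            return True  # cached rows are never re-checked against main_tc
        if pair in main_tc:
            self.rows.add(pair)  # cache is extended on a new hit
            return True
        return False


def run_query(cache, main_tc, epr, query_concept):
    return sum(1 for c in epr if cache.is_subtype(main_tc, c, query_concept))


tc_old = {(ATOPIC_RHINITIS, RHINITIS)}
tc_new = set()
epr = [ATOPIC_RHINITIS] * 100
cache = ExpressionTcCache()

results = []
results.append(run_query(cache, tc_old, epr, RHINITIS))  # step 2
cache.purge()
results.append(run_query(cache, tc_new, epr, RHINITIS))  # step 3
results.append(run_query(cache, tc_old, epr, RHINITIS))  # step 4, no purge
results.append(run_query(cache, tc_new, epr, RHINITIS))  # step 5, stale hit
cache.purge()
results.append(run_query(cache, tc_new, epr, RHINITIS))  # step 5 after purge
results.append(run_query(cache, tc_old, epr, RHINITIS))  # step 6
print(results)
```

Under these assumptions the model reproduces the counts reported above for Query2, including the stale 100 in step 5 before the purge.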

Where things go strange is if I then do:

7. Purge the TC table but not the expression library
Swap to Config:AUG  (in which Atopic rhinitis is an inactive duplicate, but 
despite this the TC table says it IS a subtype of Rhinitis)
Re-Run Query2: returns zero patients (NOT as expected)

It looks to me as though the sct_concept table is still being consulted first to 
test each instance of 38103000 in the EPR and, finding that it’s an 
inactive concept (which it is, according to the sct_concept table in situ in 
either NEW or AUG), Snofyre then tries to substitute it. But if it did do that, it 
*should* end up with 38103000 being turned into 61582004, which is still a 
subtype of the Rhinitis query concept, so I don’t understand why I still get 
zero patients!
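
To spell out why zero is surprising, here is a small sketch of the substitute-then-test pipeline as I imagine it. Everything in it is an assumption for illustration: the `substitute_inactive` flag, the `resolve` helper, and the two AUG rows shown are hypothetical and not taken from Snofyre itself.

```python
# Hypothetical model of inactive-concept substitution under Config:AUG.
ATOPIC_RHINITIS = 38103000   # inactive duplicate in the April 2011 concepts table
SUBSTITUTE = 61582004        # the active concept it should be replaced by
RHINITIS = 70076002

inactive = {ATOPIC_RHINITIS}
substitutions = {ATOPIC_RHINITIS: SUBSTITUTE}

# Assumed AUG rows: both the inactive concept and its substitute sit under Rhinitis.
tc_aug = {(ATOPIC_RHINITIS, RHINITIS), (SUBSTITUTE, RHINITIS)}


def resolve(concept, substitute_inactive):
    """Optionally replace an inactive concept with its active substitute."""
    if substitute_inactive and concept in inactive:
        return substitutions.get(concept, concept)
    return concept


def run_query(epr, query_concept, substitute_inactive):
    count = 0
    for concept in epr:
        concept = resolve(concept, substitute_inactive)
        if concept == query_concept or (concept, query_concept) in tc_aug:
            count += 1
    return count


epr = [ATOPIC_RHINITIS] * 100
with_substitution = run_query(epr, RHINITIS, substitute_inactive=True)
without_substitution = run_query(epr, RHINITIS, substitute_inactive=False)
print(with_substitution, without_substitution)
```

In this model Query2 finds all 100 patients whether or not the substitution step runs, so the observed zero suggests something else is filtering out the inactive ConceptID.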

Either way, it looks as though some part of some Snofyre pipeline includes a 
step to logically replace any inactive ConceptIDs encountered with an active 
one.

Could this be optionally switched off? 

Original issue reported on code.google.com by jeremy.r...@googlemail.com on 24 Aug 2011 at 9:39