IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Missing concept parents in branch created for extension #43

Closed dkincaid closed 5 years ago

dkincaid commented 5 years ago

I've loaded the veterinary extensions and while searching for concept parents for our species value set I have run into a concept that does not return any parents when I use the branch created for the extension (MAIN/SNOMED-VET) with the findConceptParents endpoint.

The concept is 81260002. When I call the findConceptParents endpoint using the branch MAIN/SNOMED-VET no parents are returned (the list returned is empty). If I specify MAIN as the branch then the parents are returned.

I tried the findBrowserConcept endpoint with this concept id and the MAIN/SNOMED-VET branch, and the returned value does include the parents in the relationships array.

kaicode commented 5 years ago

Hi @dkincaid. This is a content issue. The veterinary extension has made the only stated relationship of concept 81260002 inactive:

$ grep $'\t81260002\t' SnomedCT_Release_VTS1000009/Snapshot/Terminology/sct2_Relationship_Snapshot_VTS_20190401.txt
739111000009126 20160130T000000Z    0   332351000009108 81260002    321351000009104 0   116680003   900000000000010007  900000000000451002

You should still get a parent in the inferred form. It sounds like you have the International Edition on the MAIN branch. The International Edition has been properly quality tested so every concept has at least one stated and inferred parent apart from the root concept which has none.

dkincaid commented 5 years ago

Interesting and frustrating. Thanks again for the check.

Can you help me understand what a branch is in snowstorm? I didn't see an explanation of it in the docs and I just followed the instructions for installing the International version and for installing an extension. For this particular example, the International version has the parent of that concept active (I can see it in the IHTSDO SNOMED browser as active), but what you found is that the Veterinary extension has the relationship inactive. So is the inactive relationship from the extension overwriting the active relationship from the main SNOMED version?

dkincaid commented 5 years ago

I looked in the release files for the SNOMED International version:

6412388027  20160131    1   900000000000207008  81260002    321351000009104 0   116680003   900000000000011006  900000000000451002

it has this same relationship, but it is active and the effective time is 20160131. The effective time in the Veterinary Extension for the inactive relationship is 20160130. So shouldn't the relationship with the later effective time take precedence?
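A field-by-field comparison of the two rows quoted in this thread can help when checking whether two releases really carry the same component. This is only a sketch: the column names are the standard RF2 relationship file header, and the helper names `parse_row` and `differing_fields` are made up for illustration.

```python
# Column order of an RF2 relationship file.
RF2_RELATIONSHIP_COLUMNS = [
    "id", "effectiveTime", "active", "moduleId", "sourceId", "destinationId",
    "relationshipGroup", "typeId", "characteristicTypeId", "modifierId",
]


def parse_row(line):
    """Turn one relationship row into a dict keyed by column name.

    Real RF2 files are tab-separated; split() also copes with rows that
    were pasted with spaces, as in this thread.
    """
    values = line.split()
    if len(values) != len(RF2_RELATIONSHIP_COLUMNS):
        raise ValueError(f"expected {len(RF2_RELATIONSHIP_COLUMNS)} columns, got {len(values)}")
    return dict(zip(RF2_RELATIONSHIP_COLUMNS, values))


def differing_fields(row_a, row_b):
    """List the columns whose values differ between two parsed rows."""
    return [c for c in RF2_RELATIONSHIP_COLUMNS if row_a[c] != row_b[c]]


# The two rows quoted above: veterinary extension, then International.
vet_row = parse_row(
    "739111000009126 20160130T000000Z 0 332351000009108 81260002 "
    "321351000009104 0 116680003 900000000000010007 900000000000451002"
)
int_row = parse_row(
    "6412388027 20160131 1 900000000000207008 81260002 "
    "321351000009104 0 116680003 900000000000011006 900000000000451002"
)

print(differing_fields(vet_row, int_row))
# ['id', 'effectiveTime', 'active', 'moduleId', 'characteristicTypeId']
```

For these two rows the source, destination and type match, while the identifier, module and characteristic type do not.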

kaicode commented 5 years ago

Branches allow multiple editions and versions of SNOMED CT to be stored in a single terminology server. When things are set up correctly, the code system API endpoints can be used to list these. Each SNOMED CT Edition or Extension which has been imported is represented as a code system. For each code system there may be one or many versions, depending on how many releases of that edition or extension have been imported. Each code system version exists on a different branch to allow clean separation of the content. Typically the International Edition lives on MAIN, and an extension can then be imported onto a child branch. The child branch will be able to see all the content on MAIN plus whatever content has been imported into that branch, for example an extension. So using the MAIN branch will let you query the International Edition, and using your MAIN/SNOMED-VET branch will let you query the vet extension content.
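The visibility rule described above can be pictured with a toy model. This is not how Snowstorm stores content; the function name `visible_content` and the dictionary layout are assumptions made purely to illustrate that a child branch sees its ancestors' content plus its own.

```python
def visible_content(branch_path, content_by_branch):
    """Collect what is visible from a branch: every ancestor's content, then its own.

    Toy model only. It ignores versions and the way a child branch can
    supersede a component that also exists on a parent branch.
    """
    parts = branch_path.split("/")
    visible = []
    for depth in range(1, len(parts) + 1):
        ancestor = "/".join(parts[:depth])  # "MAIN", then "MAIN/SNOMED-VET", ...
        visible.extend(content_by_branch.get(ancestor, []))
    return visible


# Hypothetical layout matching the setup in this thread.
content = {
    "MAIN": ["International Edition"],
    "MAIN/SNOMED-VET": ["Veterinary extension"],
}

print(visible_content("MAIN", content))
print(visible_content("MAIN/SNOMED-VET", content))
```

Querying MAIN in this model yields only the International Edition, while MAIN/SNOMED-VET yields both.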

dkincaid commented 5 years ago

Excellent explanation. I get it now.

I still have a question about what should happen in this example where the main version and the extension have the same relationship with different effective times. In this case the veterinary extension has a relationship as inactive with effective time of 20160130 and the International version has the same relationship active with effective time of 20160131. It appears that the branch for the extension reflects the inactive status even though the active effective time from the International version is later.

kaicode commented 5 years ago

When importing RF2, Snowstorm checks that the components being imported do not already exist in the current branch (or parent branches) at the same or greater effectiveTime. If you import into an extension branch a relationship which already exists on MAIN, you can expect log output during the import, something like:

INFO - 3 Relationship components in the RF2 import with effectiveTime 20160131 will not be imported because components already exist with the same identifier at the same or later effectiveTime.

For this to work the International Edition would have to be imported first and standard format effectiveTimes must be used. If this is not working please raise a separate issue :)
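The check described above might be sketched roughly as follows. This is an illustration of the stated rule, not Snowstorm's actual code; `should_import` and the `existing_by_id` mapping are hypothetical names, and the comparison assumes standard YYYYMMDD effectiveTimes, which sort correctly as plain strings.

```python
def should_import(incoming, existing_by_id):
    """Return True if the incoming component should be imported.

    existing_by_id maps a component identifier to the effectiveTime already
    present on the current branch or one of its parents. A component is
    skipped when the same identifier exists at the same or a later
    effectiveTime.
    """
    existing_time = existing_by_id.get(incoming["id"])
    if existing_time is None:
        return True  # identifier not seen before
    return incoming["effectiveTime"] > existing_time


# Hypothetical state: one relationship already present via MAIN.
existing = {"6412388027": "20160131"}

print(should_import({"id": "6412388027", "effectiveTime": "20160131"}, existing))  # False
print(should_import({"id": "6412388027", "effectiveTime": "20190401"}, existing))  # True
print(should_import({"id": "739111000009126", "effectiveTime": "20160130"}, existing))  # True
```

Note that the rule as stated is keyed on the component identifier, so a row with a different id is not affected by it.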

dkincaid commented 5 years ago

Ok, I will run through the import again and look for that. Is there anything else I can look at to help troubleshoot? Right now, it appears that something is amiss since I have two codes with this issue which do not return any parents from the findConceptParents endpoint. The relationships do exist in the relationships array returned from findBrowserConcept however. This is when using the MAIN/SNOMED-VET branch.

I am importing International version first and I have modified all the effective times in the veterinary extension RF2 files to be YYYYMMDD format (stripping the times).
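Stripping the time portion could be done with something like the sketch below. It assumes the non-standard values always look like the one quoted earlier in this thread (`20160130T000000Z`); the function name is illustrative.

```python
import re

# Eight digits, optionally followed by a THHMMSSZ suffix.
_EFFECTIVE_TIME = re.compile(r"(\d{8})(T\d{6}Z)?")


def normalize_effective_time(value):
    """Reduce an effectiveTime to YYYYMMDD, rejecting anything unexpected."""
    match = _EFFECTIVE_TIME.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised effectiveTime: {value!r}")
    return match.group(1)


print(normalize_effective_time("20160130T000000Z"))  # 20160130
print(normalize_effective_time("20160131"))          # 20160131
```

Raising on unrecognised values, rather than passing them through, makes it easier to spot rows that would otherwise reach the import in a non-standard format.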

dkincaid commented 5 years ago

I am getting quite a few of these warnings in the log during imports of both International version and Veterinary extension. Is this anything to be concerned about?

2019-04-30 10:48:01.217  WARN 28206 --- [/O dispatcher 2] org.elasticsearch.client.RestClient      : request [GET http://localhost:9200/es-query/query-concept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: 
[299 Elasticsearch-6.4.2-04711c2 "Deprecated: the number of terms [393673] used in the Terms Query 
request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the 
[index.max_terms_count] index level setting." "Tue, 30 Apr 2019 15:47:57 GMT"]

kaicode commented 5 years ago

The default form for the findConceptParents endpoint is inferred. If you are searching for inferred parents there will have to be active is-a type INFERRED relationships in the concept returned by findBrowserConcept.
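One way to check this on a saved findBrowserConcept response is sketched below. The JSON field names used here (`relationships`, `active`, `characteristicType`, `type.conceptId`, `target.conceptId`) and the value `INFERRED_RELATIONSHIP` are assumptions about the response shape and should be verified against an actual response; 116680003 is the is-a type id seen in the rows earlier in this thread.

```python
IS_A = "116680003"


def active_inferred_parents(browser_concept):
    """Return target concept ids of active, inferred is-a relationships.

    browser_concept is the parsed JSON of a browser concept. Field names
    are assumed, not confirmed by this thread.
    """
    parents = []
    for rel in browser_concept.get("relationships", []):
        if (
            rel.get("active")
            and rel.get("characteristicType") == "INFERRED_RELATIONSHIP"
            and rel.get("type", {}).get("conceptId") == IS_A
        ):
            parents.append(rel.get("target", {}).get("conceptId"))
    return parents


# Hypothetical response fragment: the only is-a relationship is stated and inactive.
sample = {
    "conceptId": "81260002",
    "relationships": [
        {
            "active": False,
            "characteristicType": "STATED_RELATIONSHIP",
            "type": {"conceptId": IS_A},
            "target": {"conceptId": "321351000009104"},
        }
    ],
}

print(active_inferred_parents(sample))  # []
```

An empty list here would be consistent with an empty result for the inferred form, even though the relationships array itself is not empty.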

kaicode commented 5 years ago

No need to worry about those warnings. That issue is being tracked under #12 and will be fixed before we upgrade to Elasticsearch 7.x - probably in the next year.

dkincaid commented 5 years ago

Thanks again. I'll create a new issue since the problem still exists after clearing and reimporting everything. Also, during the import I did not see any log message like "INFO - 3 Relationship components in the RF2 import with effectiveTime 20160131 will not be imported because components already exist with the same identifier at the same or later effectiveTime."

kaicode commented 5 years ago

That would be helpful. Thanks.