Status: Closed (dkincaid, closed 5 years ago)
The Snowstorm "semantic index" is an index of all concept parents, ancestors, attributes and attribute groups. It is used to answer the findConceptParents call and other hierarchy and ECL queries.
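To make the idea concrete, here is a minimal in-memory sketch of the parent/ancestor part of such an index. Given each concept's active is-a parents, it precomputes every concept's ancestors (the transitive closure). It is an illustration under simplifying assumptions (no attributes, groups or branches; made-up concept IDs), not Snowstorm's Elasticsearch-backed implementation.

```python
from collections import deque


def ancestors(parents: dict[str, set[str]], concept: str) -> set[str]:
    """Walk the is-a parents breadth-first and collect every ancestor."""
    seen: set[str] = set()
    queue = deque(parents.get(concept, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # guard against revisiting shared ancestors
            seen.add(parent)
            queue.extend(parents.get(parent, ()))
    return seen


def transitive_closure(parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Precompute the ancestor set for every concept in the parent map."""
    return {concept: ancestors(parents, concept) for concept in parents}


# Hypothetical toy hierarchy: "C" is-a "B", "B" is-a "A", and "A" is the root.
toy_parents = {"C": {"B"}, "B": {"A"}, "A": set()}
closure = transitive_closure(toy_parents)
print(sorted(closure["C"]))  # ['A', 'B']
```

Precomputing the closure trades update cost for query speed: an ancestor or descendant check becomes a lookup instead of a graph walk, but every relationship change has to be reflected in the stored sets.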
Initial thoughts: relationship 739111000009126 will have been imported because a relationship with that identifier does not exist in the International Edition; the checks here are based on the component ID. There is a workaround for this scenario. Could you try rebuilding the semantic index, please? This can be done using the rebuildBranchTransitiveClosure function under the Concepts area of Swagger. (This will move to the new Admin area of Swagger in v3.x.)
There are a few duplicate triples like this in the International Snapshot. When building the semantic index we sort by effectiveTime and active so that relationships are processed in the right order, but it looks like the logic that avoids duplicate triples with different relationship IDs is not working when a delta is imported. I would be interested to hear whether rebuilding the semantic index solves this.
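The ordering problem can be reproduced with a deliberately simplified model (an assumption, not Snowstorm's code) in which each relationship row's active flag is applied directly to its (source, type, destination) triple. The Rel class, the apply_rows function and the placeholder IDs are hypothetical; only the two effective times come from this issue.

```python
from dataclasses import dataclass

IS_A = "116680003"  # SNOMED CT "Is a" attribute concept


@dataclass(frozen=True)
class Rel:
    rel_id: str
    effective_time: str  # RF2 date string, YYYYMMDD
    active: bool
    source: str
    type_id: str
    destination: str

    @property
    def triple(self) -> tuple[str, str, str]:
        return (self.source, self.type_id, self.destination)


def apply_rows(index: set, rows: list[Rel]) -> set:
    """Naive update: each row's active flag alone decides whether its triple is kept."""
    # Oldest first; at equal times inactive (False) sorts before active (True).
    for row in sorted(rows, key=lambda r: (r.effective_time, r.active)):
        if row.active:
            index.add(row.triple)
        else:
            index.discard(row.triple)
    return index


# Placeholder IDs; only the two effective times are taken from this issue.
intl = Rel("intl-rel", "20160131", True, "child", IS_A, "parent")
vet = Rel("vet-rel", "20160130", False, "child", IS_A, "parent")

# Full rebuild: both rows are sorted together, so the newer active row is applied last.
rebuilt = apply_rows(set(), [intl, vet])
# Delta import: the International index already exists, then only the vet row is applied.
incremental = apply_rows(apply_rows(set(), [intl]), [vet])

print(rebuilt)      # {('child', '116680003', 'parent')}
print(incremental)  # set()
```

In the full rebuild the newer active International row is applied last, so the parent survives; in the delta import the older inactive Vet row is applied on top of the existing index and removes the triple. That is consistent with the parent being missing after the import and coming back after a rebuild.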
I just ran the rebuild. Now I do get that parent back when I query the MAIN/SNOMED-VET branch, but it is very slow to return (about 6-7 seconds); before, it was almost instantaneous. I also see this log message when I call that endpoint now:
2019-04-30 12:04:32.967 WARN 2794 --- [/O dispatcher 1] org.elasticsearch.client.RestClient : request [GET http://localhost:9200/es-query/query-concept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&search_type=dfs_query_then_fetch&batched_reduce_size=512] returned 1 warnings:
[299 Elasticsearch-6.4.2-04711c2 "Deprecated: the number of terms [699096] used in the Terms Query
request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the
[index.max_terms_count] index level setting." "Tue, 30 Apr 2019 17:04:25 GMT"]
That seems surprising for a single concept-parents query.
Thanks for trying that. I'm glad you are getting the desired parents back now. The semantic index rebuild on this branch had quite a performance impact, didn't it!
It's slower now because there are two full semantic indexes sitting on top of each other: one on MAIN and the other on the MAIN/SNOMED-VET branch. The large query and the slower response time come from the query clause excluding all the concepts in the MAIN semantic index, because they were all replaced when the index was rebuilt on SNOMED-VET. Just about the only weakness of Snowstorm is that if you replace tens of thousands of components on branches other than MAIN, things start to slow down.
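The following sketch models that mechanism in a deliberately simplified way; it is not Snowstorm's actual query. It assumes a child branch sees its own documents plus any parent-branch documents it has not replaced, expressed with Elasticsearch bool/term/terms clauses. Once a rebuild on the child has replaced nearly everything, the exclusion list grows as large as the index. The field names "path" and "conceptId" and both helper functions are hypothetical.

```python
MAX_TERMS_COUNT = 65536  # Elasticsearch default for index.max_terms_count


def visible_on_child(child_path: str, parent_path: str, replaced_ids: list[int]) -> dict:
    """Simplified branch-visibility query: documents written on the child branch,
    plus parent-branch documents whose IDs the child has not replaced.
    The field names "path" and "conceptId" are hypothetical."""
    return {
        "bool": {
            "should": [
                {"term": {"path": child_path}},
                {
                    "bool": {
                        "must": [{"term": {"path": parent_path}}],
                        # Every replaced ID becomes one value in this terms clause.
                        "must_not": [{"terms": {"conceptId": replaced_ids}}],
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }


def count_terms(node) -> int:
    """Recursively count the values inside every `terms` clause of a query dict."""
    if isinstance(node, list):
        return sum(count_terms(item) for item in node)
    if not isinstance(node, dict):
        return 0
    total = 0
    for key, value in node.items():
        if key == "terms":
            total += sum(len(values) for values in value.values())
        else:
            total += count_terms(value)
    return total


# 699096 is the term count reported in the warning above; the IDs here are dummies.
query = visible_on_child("MAIN/SNOMED-VET", "MAIN", list(range(699_096)))
n_terms = count_terms(query)
print(n_terms, n_terms > MAX_TERMS_COUNT)  # 699096 True
```

With only a handful of replaced components the exclusion clause stays small, which is why incremental work on a child branch is cheap and a full rebuild there is not.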
I've marked this down as a bug. It's going to take some thought to solve this without impacting the performance of the incremental semantic index update. Thanks for reporting the issue.
@dkincaid If you would like this working now, another workaround you could try is to import the Vet Extension into MAIN, rebuild the semantic index on MAIN and simply not use the SNOMED-VET branch. That should give you fast, consistent results until this bug is fixed.
Hi @dkincaid,
In version 4.1.0 of Snowstorm we have updated the semantic index update function to use all active triples (source, type and destination concept) when processing each relationship change. This was necessary because in the US Edition there are over one hundred cases of a triple being made inactive in the US module straight after the same triple is made active in the International module. The inactivation in the US module uses a different relationship ID, but until this fix Snowstorm made the triple inactive.
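A minimal sketch of the triple-based rule described above, assuming a simplified in-memory model rather than Snowstorm's real code: take the latest version of each relationship ID, then treat a (source, type, destination) triple as present if any of those relationships is active. The Rel type, active_triples and the example IDs and dates are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Rel(NamedTuple):
    rel_id: str
    effective_time: str  # RF2 date string, YYYYMMDD
    active: bool
    source: str
    type_id: str
    destination: str


def active_triples(rows: list[Rel]) -> set[tuple[str, str, str]]:
    """A triple is present if ANY relationship carrying it is active in its latest version."""
    latest: dict[str, Rel] = {}
    for row in rows:
        current = latest.get(row.rel_id)
        # Keep only the newest version of each relationship ID.
        if current is None or row.effective_time >= current.effective_time:
            latest[row.rel_id] = row
    flags = defaultdict(list)
    for row in latest.values():
        flags[(row.source, row.type_id, row.destination)].append(row.active)
    return {triple for triple, states in flags.items() if any(states)}


# Hypothetical IDs and dates. A activates the triple in the International module;
# B carries the same triple but is inactive in the US module. C is a separate
# relationship whose own latest version is inactive, so its triple drops out.
rows = [
    Rel("A", "20190131", True, "child", "116680003", "parent"),
    Rel("B", "20190301", False, "child", "116680003", "parent"),
    Rel("C", "20180131", True, "child", "116680003", "old-parent"),
    Rel("C", "20190131", False, "child", "116680003", "old-parent"),
]
print(active_triples(rows))  # {('child', '116680003', 'parent')}
```

Because the decision is made per triple rather than per relationship row, an inactive row with a different ID can no longer remove a triple that another relationship still keeps active, whichever order the rows arrive in.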
This should also fix the issue you were seeing where relationships went missing, because I believe it was happening for the very same reason. This fix should give you accurate child/parent/ECL results straight after the RF2 import. The workaround we tried before gives me confidence that v4.1.0 (or later) will work for you without wrecking your performance.
I just thought I should let you know in case you have time to try it again. I recommend deleting all your Snowstorm Elasticsearch indexes and starting afresh, because some of the index mappings have changed to support better non-English search and other features. We still require just the date in the effectiveTime field, so remember to simplify those values if you do import the Vet Extension.
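If the Vet Extension files do carry more than a date in effectiveTime, a small preprocessing step can strip the values back to the RF2 YYYYMMDD form before import. This is a minimal sketch under stated assumptions: the extra content is separators or a time part around a year-month-day date, and the script and file names are hypothetical. Check the actual values in the files before relying on it.

```python
import re
import sys


def simplify_effective_time(value: str) -> str:
    """Reduce an effectiveTime to its YYYYMMDD date.

    Assumption: the value starts with the date in year-month-day order, possibly with
    separators and a time part (e.g. "2016-01-30T00:00:00" or "20160130120000").
    """
    digits = re.sub(r"\D", "", value)
    if len(digits) < 8:
        raise ValueError(f"cannot extract a YYYYMMDD date from {value!r}")
    return digits[:8]


def simplify_rf2_line(line: str) -> str:
    """effectiveTime is the second tab-separated column in RF2 files."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line terminator
    cols = body.split("\t")
    if len(cols) > 1 and cols[1] != "effectiveTime":  # leave the header row alone
        cols[1] = simplify_effective_time(cols[1])
    return "\t".join(cols) + ending


def simplify_file(src: str, dst: str) -> None:
    """Copy an RF2 file, rewriting the effectiveTime column of every data row."""
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(simplify_rf2_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python simplify_dates.py in.txt out.txt
    simplify_file(sys.argv[1], sys.argv[2])
```

Opening both files with newline="" keeps the RF2 line endings byte-for-byte, so only the effectiveTime column changes.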
I hope you are tempted to try! 😄
Kind regards, Kai
Closing this ticket because I believe it's fixed in 4.1.0. Please add comments or reopen the ticket as required.
High-level summary: after importing the SNOMED International version followed by the SNOMED Veterinary Extension, some relationships are missing from the child branch created for the extension when the extension contains inactive relationships with earlier effective times than the International edition.
Here are the steps I followed:
1. Start Snowstorm: java -Xms2g -Xmx2g -jar target/snowstorm*.jar
2. Import the SNOMED International edition into the MAIN branch.
3. Import the SNOMED Veterinary Extension into the MAIN/SNOMED-VET branch.
After finishing this process, the issue can be seen by calling the findConceptParents endpoint with the following parameters:
The response code is 200 and the response body is an empty array ([]).
If I make the same endpoint call but change the branch to MAIN, I get one parent returned: conceptId 321351000009104.
In the SNOMED International edition Relationship file this relationship is present and active, with effectiveTime = 20160131:
In the Veterinary Extension Relationship file the relationship also exists but is inactive, with effectiveTime = 20160130:
I am also attaching the log output from the import of the extension file.
vetext-snowstorm-import-log.txt
Please let me know if there is any other information I can provide or troubleshooting I can help with.