IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Wrong number of results returned by GetConcepts #137

Closed by d-ribas 4 years ago

d-ribas commented 4 years ago

Hi, I've recently installed Snowstorm with the US release of SNOMED CT on our server, and we are facing an issue where the number of results returned by the getConcepts method does not match its limit and total counters. I've made exactly the same API calls on the ihtsdotools browser and the issue does not happen there, so perhaps something is missing in our installation. We also added the CORE Problem List reference set from the National Library of Medicine, though we believe this should not be causing the problem.

The calls to get concepts, each with an attached trace file of the results, are as follows:

  1. A search term "shou fra" is provided and the service returns only 18 results, with a limit of 21 and a total of 22 (trace: WithTermGetConcepts.trace.txt)
  2. No search term is provided and only 1 result is returned, with a limit of 20 and a total of more than 80,000 (trace: NoTermGetConcepts.trace.txt)
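
The mismatch described in the two cases above can be checked mechanically. The following is a minimal sketch, assuming the parsed JSON response is a page object with `items`, `total`, `limit` and `offset` keys (the key names are an assumption based on the counters mentioned in this report; `check_page` and `expected_item_count` are hypothetical helper names, not part of Snowstorm).

```python
from typing import Tuple


def expected_item_count(total: int, limit: int, offset: int = 0) -> int:
    """Number of items a consistent page should contain."""
    return max(0, min(limit, total - offset))


def check_page(page: dict) -> Tuple[bool, int, int]:
    """Return (is_consistent, actual_count, expected_count) for one page.

    `page` is the parsed JSON body of a concept search response; the key
    names used here are assumptions, not a confirmed response schema.
    """
    actual = len(page["items"])
    expected = expected_item_count(
        page["total"], page["limit"], page.get("offset", 0)
    )
    return actual == expected, actual, expected


# Figures from case 1 of the report: 18 items, limit 21, total 22.
case_1 = {"items": [{}] * 18, "limit": 21, "total": 22, "offset": 0}
print(check_page(case_1))
```

Running the same check against the trace from a known-good server would show whether the inconsistency is specific to one installation.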

Can you help us solve this problem? Is there perhaps a configuration on our side that is missing?

Thank you

kaicode commented 4 years ago

Hi @d-ribas, I'm sorry to hear you are experiencing problems when browsing the US release in Snowstorm. Thanks for reaching out.

Could you outline the steps you took to set up Snowstorm in terms of the order of imports please? Also did you perform any branch merging? From there we may be able to speculate further.

Kind regards, Kai Kewley

d-ribas commented 4 years ago

Hi @kaicode, thank you for the quick answer.

The first import was the latest US release RF2 file of SNOMED CT. After that, the CORE Problem List reference set was added by transforming it to RF2 format with the following steps:

  1. Created a new concept for the reference set in the concept snapshot file
  2. Added a description for it in the description snapshot file
  3. Set the new concept as a descendant of 446609009 | Simple type reference set | in the relationship file
  4. Added the entries to the simple refset snapshot file, pointing to the new concept
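
Step 4 above can be sketched as follows. This is an illustration only, not the script used for this package: it builds tab-separated rows in the column order of an RF2 simple reference set file (`id`, `effectiveTime`, `active`, `moduleId`, `refsetId`, `referencedComponentId`). The function name and every identifier passed to it are hypothetical placeholders.

```python
import uuid

# Column order of an RF2 simple reference set file.
SIMPLE_REFSET_HEADER = [
    "id", "effectiveTime", "active", "moduleId",
    "refsetId", "referencedComponentId",
]


def simple_refset_rows(refset_id, module_id, effective_time, member_concept_ids):
    """Build header + one active member row per referenced concept."""
    rows = ["\t".join(SIMPLE_REFSET_HEADER)]
    for concept_id in member_concept_ids:
        member_id = str(uuid.uuid4())  # refset member ids are UUIDs
        rows.append("\t".join(
            [member_id, effective_time, "1", module_id, refset_id, concept_id]
        ))
    return rows


# Placeholder values only; real SCTIDs must come from your own namespace/module.
for line in simple_refset_rows(
    refset_id="REFSET_CONCEPT_ID",
    module_id="MODULE_ID",
    effective_time="20200301",
    member_concept_ids=["MEMBER_CONCEPT_1", "MEMBER_CONCEPT_2"],
):
    print(line)
```

Each member row points at the new reference set concept through `refsetId`, so that concept must exist in the concept file of the same import.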

We did not perform any branch merging so far.

Best Regards

kaicode commented 4 years ago

Your procedure sounds fine. I wonder if some of the same components appear multiple times in the RF2 because of the way the US Edition is packaged. That may be causing the strange search result totals.

In our instance we imported the International Edition into MAIN, then set up a second code system on MAIN/SNOMEDCT-US and imported the US Edition there. We know this works: with this method Snowstorm actively finds and removes any duplicate components.

I'll take a look at the package and attempt to reproduce the behaviour you are seeing here.

kaicode commented 4 years ago

The RF2 looks fine. I've requested a temp server here to try a data load using the US Edition as the root code system. I should have an update early next week.

d-ribas commented 4 years ago

Thank you for looking into this.

rorydavidson commented 4 years ago

@d-ribas I have had an attempt to replicate the issue, loading up the US Edition snapshot from March 2020 as the root code system and got the results expected, so I couldn't replicate your issue.

Could you please expand and detail how you did the initial import? And could you please post the additional RF2 you created for the core problem list so we can try to have the same environment?

Could you also please say which versions of ElasticSearch and Snowstorm you are running?

Thanks.

d-ribas commented 4 years ago

Hi @rorydavidson

The initial import was with the March 2020 US Edition and then we imported the additional RF2 that can be seen here: (link removed because data protected by copyright). To import the RF2, in both cases, we used the following command: java -Xms2g -Xmx4g -jar target/snowstorm*.jar --delete-indices --import=[Absolute-path-of-SNOMED-CT-RF2-zip]

We are using Snowstorm v4.8.0 and ElasticSearch v6.8.0

Thank you

rorydavidson commented 4 years ago

Hi @d-ribas, could you please confirm whether you used the --delete-indices flag when you imported the second RF2? If so, that would be the issue. It should be imported on top of the existing indices using the import endpoint.

Thanks, Rory
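
For readers following along, an import on top of existing content goes through the REST API rather than the command line. The sketch below only builds the request that creates an import job and sends nothing. The `/imports` path and the `branchPath`, `createCodeSystemVersion` and `type` fields are assumptions recalled from the Snowstorm loading documentation and should be checked against the Swagger UI of your own version; `build_import_job_request` is a hypothetical helper name, and `http://localhost:8080` is an assumed local address.

```python
import json
import urllib.request


def build_import_job_request(base_url, branch_path, rf2_type="SNAPSHOT"):
    """Build (but do not send) the POST that creates an import job.

    Endpoint path and payload field names are assumptions; verify them
    against the API documentation of the running Snowstorm instance.
    """
    payload = {
        "branchPath": branch_path,
        "createCodeSystemVersion": False,
        "type": rf2_type,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/imports",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_import_job_request("http://localhost:8080", "MAIN")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The RF2 archive is then uploaded to the created job in a second request, which is omitted here. The key difference from the command-line import in this thread is that nothing passes --delete-indices, so existing content is kept.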

d-ribas commented 4 years ago

I confirm that --delete-indices was used when importing the second RF2. Since the second RF2 is basically the whole US release plus the additional reference set, I thought there would be no issue. Perhaps the first import of the US release was not even needed, since the second RF2 already has all the data.

rorydavidson commented 4 years ago

@d-ribas Thanks, I'll download the RF2 and have a look now. To your point, there was no need for the first RF2 import, as you essentially deleted it anyway.

rorydavidson commented 4 years ago

@d-ribas I was able to replicate your issue when using your customized package. The error is therefore likely to be in the package, but I couldn't find the additions that you had made.

It may be a better idea either to add the concept via the API, or to put it in a separate RF2 containing only that content and import it on top of the US Edition.

Could you please also hide the Google shared file above, as the content should not be shared publicly? Thanks very much.