IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

API Results don't match Snomed Browser #304

Closed DFCatita closed 3 years ago

DFCatita commented 3 years ago

Hello there, I'm new to Snowstorm and SNOMED in general, but I noticed some discrepancies between the SNOMED browser results and my local Snowstorm service. Both use the 31 Jul 2021 release files.

If I search for concept 5880005 in the browser I see 14 stated children, but if I do: http://localhost:8080/browser/MAIN/concepts/5880005/children?form=stated

I only get 8. For example, in the browser 363003006 is a child of 5880005, but I can't get that information anywhere using snowstorm endpoints.

I believe I'm doing something wrong. I have already loaded the release files twice, with the same result.

Any ideas?

kaicode commented 3 years ago

Hi there! Yes, it sounds like not all the content is loaded. Could you count how many active concepts are loaded? For example, compare

https://browser.ihtsdotools.org/snowstorm/snomed-ct/MAIN/2021-07-31/concepts?activeFilter=true&limit=1

with

http://localhost:8080/MAIN/2021-07-31/concepts?activeFilter=true&limit=1

We are expecting "total": 350936 for the July International release.
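A minimal Python sketch of the count check described above. The helper names are made up for illustration, and the base URL and branch are whatever your local setup uses; only the `activeFilter=true&limit=1` query and the expected total of 350936 come from this thread.

```python
import json
import urllib.request

# Expected active concept count for the July 2021 International release,
# as quoted in this thread.
EXPECTED_ACTIVE_CONCEPTS = 350936


def concept_count_url(base_url: str, branch: str) -> str:
    """Build a URL that requests a single active concept; `total` is what matters."""
    return f"{base_url.rstrip('/')}/{branch}/concepts?activeFilter=true&limit=1"


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (requires a running server)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def active_concepts_match(page: dict, expected: int = EXPECTED_ACTIVE_CONCEPTS) -> bool:
    """Return True when the response's `total` equals the expected count."""
    return int(page["total"]) == expected
```

Against a local server this could be called as `active_concepts_match(fetch_json(concept_count_url("http://localhost:8080", "MAIN")))`.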

DFCatita commented 3 years ago

Hi @kaicode thanks for the fast response.

{
  "items" : [ {
    "conceptId" : "9999005",
    "active" : true,
    "definitionStatus" : "PRIMITIVE",
    "moduleId" : "900000000000207008",
    "effectiveTime" : "20020131",
    "fsn" : {
      "term" : "Duodenal ampulla structure (body structure)",
      "lang" : "en"
    },
    "pt" : {
      "term" : "Duodenal ampulla structure",
      "lang" : "en"
    },
    "id" : "9999005",
    "idAndFsnTerm" : "9999005 | Duodenal ampulla structure (body structure) |"
  } ],
  "total" : 350936,
  "limit" : 1,
  "offset" : 0,
  "searchAfter" : "WyI5OTk5MDA1Il0=",
  "searchAfterArray" : [ "9999005" ]
}

This is the result. It seems like the total is fine...?

kaicode commented 3 years ago

Yes, total active concepts looks fine.

Can I ask why you decided to import multiple times? Did the initial imports end up with a status of COMPLETED?

You could also check the number of relationships imported:

You could also search for the inferred relationship between 363003006 and 5880005

If the relationships are all there but there are still not enough children listed, I wonder whether the import commit completed successfully. You could check the status of the MAIN branch at http://localhost:8080/branches/MAIN, where we are expecting "locked": false.

How is your Elasticsearch looking? Check that the status is green using http://localhost:9200/_cat/health?v
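The branch and cluster checks above can be sketched in Python as follows. This is an illustrative sketch, not part of Snowstorm: the function names are hypothetical, and it reads Elasticsearch's JSON cluster health document (`/_cluster/health`, which has a `status` field) instead of the `_cat/health?v` text table mentioned above, because JSON is easier to test programmatically.

```python
from typing import List


def branch_is_unlocked(branch: dict) -> bool:
    """True only when the branch document explicitly reports "locked": false."""
    return branch.get("locked") is False


def cluster_is_green(health: dict) -> bool:
    """True when the Elasticsearch cluster health status is "green"."""
    return health.get("status") == "green"


def diagnose(branch: dict, health: dict) -> List[str]:
    """Collect human-readable problems found in the two status documents."""
    problems = []
    if not branch_is_unlocked(branch):
        # A locked branch can mean a commit is still running or did not finish.
        problems.append(f"branch {branch.get('path', '?')} is locked or its lock state is unknown")
    if not cluster_is_green(health):
        problems.append(f"Elasticsearch status is {health.get('status', 'unknown')}, expected green")
    return problems
```

The `branch` argument would be the JSON returned by http://localhost:8080/branches/MAIN and `health` the JSON returned by http://localhost:9200/_cluster/health. An empty list means both checks passed.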

DFCatita commented 3 years ago

I imported a second time just to start everything over from scratch with the new RF. I didn't even check the initial reports from that first time.

Number of relationships imported: "total" : 4195546

As for the inferred relationship between 363003006 and 5880005, it is there. Same response in both endpoints you shared.

MAIN status:

{
  "path" : "MAIN",
  "state" : "UP_TO_DATE",
  "containsContent" : true,
  "locked" : false,
  "creation" : "2021-07-16T09:54:34.736Z",
  "base" : "2021-07-16T09:54:34.736Z",
  "head" : "2021-08-09T09:58:45.160Z",
  "creationTimestamp" : 1626429274736,
  "baseTimestamp" : 1626429274736,
  "headTimestamp" : 1628503125160,
  "userRoles" : [ ],
  "metadata" : {
    "internal" : {
      "classified" : "false"
    }
  },
  "versionsReplacedCounts" : {
    "Concept" : 0,
    "Description" : 0,
    "QueryConcept" : 0,
    "ReferenceSetMember" : 0,
    "Relationship" : 0
  },
  "globalUserRoles" : [ ]
}

Elasticsearch status is green, active shards percent 100%.

I don't really have a MAIN/2021-07-31 branch, I'm afraid I imported everything into MAIN, hehe.

Any more ideas?

DFCatita commented 3 years ago

I'm also using Snowstorm's master branch, I hope that's correct.

kaicode commented 3 years ago

I'm not quite sure what has happened with your data. I would recommend stopping Snowstorm, removing all the Elasticsearch data and starting again. If you are using a local Elasticsearch, you can stop it and delete the data directory, which is recreated when Elasticsearch starts.

Then import a SNAPSHOT into MAIN. With createCodeSystemVersion set to true, a version branch will be created for you. It's important to wait for the import to complete successfully before doing anything else. In the log you should see something like:

Completed RF2 SNAPSHOT import on branch MAIN in xx seconds.

The import will go quiet for about 10 minutes near the end. It may look like the import is complete, but the last step is to compute the semantic index, which supports ECL and hierarchy browsing. Wait for this to complete, indicated by the "Completed" log message above or by a COMPLETED import job status if using REST.
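For the REST route, waiting for the job can be sketched as a simple polling loop in Python. Everything here is an assumption for illustration except the COMPLETED status named above: the function names are hypothetical, FAILED is assumed to be the other terminal status, and the status is assumed to be readable from a `status` field in the JSON returned by the import job's URL.

```python
import json
import time
import urllib.request
from typing import Callable


def http_status_getter(import_job_url: str) -> Callable[[], str]:
    """Return a function that reads `status` from the import job URL.

    Assumes the job URL returns JSON containing a "status" field.
    """
    def get_status() -> str:
        with urllib.request.urlopen(import_job_url) as response:
            return json.load(response)["status"]
    return get_status


def wait_for_import(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job reports a terminal status, then return that status."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):  # FAILED is an assumed status name
            return status
        if clock() >= deadline:
            raise TimeoutError(f"import still reports {status!r} after {timeout_seconds} seconds")
        # The job can look idle while the semantic index is built, so keep waiting.
        sleep(poll_seconds)
```

Taking `get_status` as a parameter keeps the loop independent of how the status is fetched, so it can be exercised without a running server.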

DFCatita commented 3 years ago

Ok @kaicode ! Will do just that and report back. Thanks.

DFCatita commented 3 years ago

@kaicode happy to report your solution worked. Maybe I didn't wait long enough the other time. I also used the REST endpoint for loading this time, instead of curl.

Thanks a lot!

kaicode commented 3 years ago

Great news! Thanks for reporting back and glad it's working for you.