IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

How to best setup snowstorm for terminology in multiple countries #306

Open - IQHT-DGH opened this issue 3 years ago

IQHT-DGH commented 3 years ago

I've set up Snowstorm locally on my system and loaded the UK and International version snapshots into the MAIN branch. I also need to load SNOMED CT-AU/AMT and wondered what the recommended approach would be - for example, should I load this as a separate branch?

If the recommendation is a separate branch for the Australian terminology, could I be pointed to an example of how a branch is created? I haven't been able to find one.

Thanks in advance for any help!

kaicode commented 3 years ago

We recommend creating a separate code system for each country extension. See the public snowstorm instance for examples: https://browser.ihtsdotools.org/snowstorm/snomed-ct/codesystems
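
The same listing is available from any Snowstorm instance's REST API. A minimal sketch, assuming a local instance on the default port:

    # List the code systems loaded into a Snowstorm instance
    curl -s http://localhost:8080/codesystems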

A word of warning - we did a brief experiment and noticed that Snowstorm does not perform as well as expected with the whole UK extension loaded in this way. I did not have a chance to verify the configuration of the setup. Loading the UK extension straight into MAIN may perform better but this would prevent other extensions being loaded on child branches because the UK content would be inherited.

Good luck with your setup. Please do let us know how you get on and any lessons learnt.

IQHT-DGH commented 3 years ago

I had started down the route of loading the International edition into MAIN, then creating two branches, CT-UK and CT-AU - would this, in your opinion, perform any better?

kaicode commented 3 years ago

Quite honestly I had never considered that, but I think leaving MAIN empty would be a good idea, yes. I would create a SNOMEDCT-UK codesystem - that will create the branch MAIN/SNOMEDCT-UK. If you import the snapshot with the create-version-branch flag set to true you will also get a codesystem version entry and a version branch; that will also enable the FHIR interface to work naturally, in case you wanted to use that.

I would recommend the same for a SNOMEDCT-AU codesystem.
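
A minimal sketch of those steps against the REST API (the host/port and the archive filename are assumptions; the import payload fields match the status object quoted later in this thread):

    # 1. Create the codesystem - this also creates the branch MAIN/SNOMEDCT-UK
    curl -X POST http://localhost:8080/codesystems \
      -H 'Content-Type: application/json' \
      -d '{"shortName": "SNOMEDCT-UK", "branchPath": "MAIN/SNOMEDCT-UK"}'

    # 2. Create an import job with the create-version flag set;
    #    the Location response header contains the new import job id
    curl -i -X POST http://localhost:8080/imports \
      -H 'Content-Type: application/json' \
      -d '{"type": "SNAPSHOT", "branchPath": "MAIN/SNOMEDCT-UK", "createCodeSystemVersion": true}'

    # 3. Upload the RF2 snapshot archive to start the import (id from step 2;
    #    the archive name here is illustrative)
    curl -X POST http://localhost:8080/imports/<import-id>/archive \
      -F file=@uk_edition_snapshot.zip

The same three steps apply for the SNOMEDCT-AU codesystem.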

Both of these editions have unusual packaging (compared to the other 13 extensions and editions loaded into the browser). From memory, the UK edition archive contains multiple concept, description and relationship files - I think the International and UK files are there separately. Snowstorm doesn't expect this format, so you may need to combine them or load them separately, one after the other. Again from memory, I think the AU edition has no FSN descriptions and no axioms. Snowstorm doesn't expect this either, but I am not personally aware of the workaround needed to get it to load.

IQHT-DGH commented 3 years ago

Quick update: leaving MAIN empty, I was able to import the UK edition into MAIN/SNOMEDCT-UK without any issues - unfortunately the same cannot be said for AU, which failed after running for a while. I deleted the AU codesystem, then recreated it and obtained an earlier version of the AU edition; however, now when I attempt an import it fails immediately, rather than running for a while as previously. Am I missing a vital step here? Is there a way to view some kind of logging to get an idea of why it failed?

I'm at the research stage with this at present, but the end game is to have a production terminology server that can be used with both UK and AU content. If AU cannot be imported into Snowstorm it's going to be an issue for me that I'm not sure how to get around - my usual role is writing code to interrogate SNOMED rather than setting up the infrastructure, so any pointers would be more than appreciated.

kaicode commented 3 years ago

@rorydavidson any pointers on importing AU? Does it just require adding an empty axiom refset to the archive?

rorydavidson commented 3 years ago

I think that would work. This is what exists in the UK edition (an empty axiom refset) so that should help.
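
For reference, an OWL expression refset file has the six standard RF2 refset columns plus an owlExpression column (the same layout as the rows quoted later in this thread), so an "empty" axiom refset is just a tab-separated header line like this (a sketch - check the exact filename conventions in your own release package):

    id	effectiveTime	active	moduleId	refsetId	referencedComponentId	owlExpression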

IQHT-DGH commented 3 years ago

Thanks for the info. I have a UK snapshot but I don't see a refset file labelled 'axiom' - would it have a different file name, and would it be as simple as copying it from the UK to the AU snapshot folder?

rorydavidson commented 3 years ago

@IQHT-DGH ah, interesting. What error do you get when you try to import the AU snapshot?

IQHT-DGH commented 3 years ago

@rorydavidson When I try to import the AU snapshot I'm seeing:

{ "status": "FAILED", "branchPath": "MAIN/SNOMEDCT-AU", "createCodeSystemVersion": true, "moduleIds": [], "type": "SNAPSHOT" }

{ "cache-control": "no-cache, no-store, max-age=0, must-revalidate", "connection": "keep-alive", "content-type": "application/json", "date": "Fri, 13 Aug 2021 11:35:03 GMT", "expires": "0", "keep-alive": "timeout=60", "pragma": "no-cache", "transfer-encoding": "chunked", "x-content-type-options": "nosniff", "x-frame-options": "DENY", "x-xss-protection": "1; mode=block" }

Can you point me to any more logs inside the docker container that might help pinpoint the issue?

rorydavidson commented 3 years ago

You should see the snowstorm log in the output of the snowstorm container, which would show you the exception when the import fails.
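
For a docker-compose setup, a couple of ways to tail that log (the container and service names here are assumptions - use whatever your compose file defines):

    docker logs -f snowstorm           # by container name
    docker-compose logs -f snowstorm   # by compose service name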

IQHT-DGH commented 3 years ago

@rorydavidson I started from scratch with new containers and got to the import failure, here are the logs....

(Most lines moved to a gist by @kaicode) https://gist.github.com/kaicode/2320b3ae2b048fd21637059b96a12e11

Interesting parts here:

java.lang.IllegalStateException: Failed to update semantic index. Failed to convert axiom EquivalentClasses(:840444002 ObjectSomeValuesFrom(:609096000 ObjectSomeValuesFrom(:42752001 :231896005)))
...
Caused by: org.snomed.otf.owltoolkit.conversion.ConversionException: Expecting ObjectIntersectionOf at first level of expression, got ObjectSomeValuesFrom in expression ObjectSomeValuesFrom(<http://snomed.info/id/609096000> ObjectSomeValuesFrom(<http://snomed.info/id/42752001> <http://snomed.info/id/231896005>)).
...
2021-08-13 13:01:08.215  INFO 1 --- [pool-2-thread-1] io.kaicode.elasticvc.api.BranchService   : Rolling back commit on MAIN/SNOMEDCT-AU started at 1628858459314

kaicode commented 3 years ago

Hi @IQHT-DGH, the axiom for concept 840444002 |Dacryoadenitis due to Acanthamoeba keratitis (disorder)| is not valid. There is no parent concept defined, so Snowstorm is not able to parse it. I would recommend removing this row (or all rows but the header / first line) from the OWL axiom reference set and trying the import again. The row in question has an owlExpression starting "EquivalentClasses(:840444002 ".

My instructions are quite brief - just let me know if you need more guidance to do this.

IQHT-DGH commented 3 years ago

Hi @kaicode, thanks for the help. In the file, that axiom exists in two places:

d81cdd3c-2079-4c6b-8308-43cf2169eb34 20210331 0 32506021000036107 733073007 840444002 EquivalentClasses(:840444002 ObjectSomeValuesFrom(:609096000 ObjectSomeValuesFrom(:42752001 :231896005)))

e72c39cf-4c38-4ce2-a5f8-e39e0abc9582 20210131 1 900000000000207008 733073007 840444002 EquivalentClasses(:840444002 ObjectIntersectionOf(:64572001 ObjectSomeValuesFrom(:609096000 ObjectIntersectionOf(ObjectSomeValuesFrom(:116676008 :409774005) ObjectSomeValuesFrom(:363698007 :13561001) ObjectSomeValuesFrom(:370135005 :441862004))) ObjectSomeValuesFrom(:609096000 ObjectSomeValuesFrom(:42752001 :231896005))))

Do I just need to remove these two whole lines?

kaicode commented 3 years ago

Yes, worth a try. If this doesn't work I would be tempted to remove all the rows from that file and just leave the header / first line. I hope this works! This is a very unusual issue - probably unique to AU.
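
A sketch of both options as shell one-liners, assuming the OWL expression refset file has been extracted from the AU archive (the file names here are illustrative; RF2 files are plain tab-separated text):

    # Option 1: drop only the rows for the offending axiom
    grep -v 'EquivalentClasses(:840444002 ' owl_refset_snapshot.txt > owl_refset_filtered.txt

    # Option 2: keep just the header / first line
    head -n 1 owl_refset_snapshot.txt > owl_refset_headeronly.txt

Either way, the edited file needs to replace the original inside the release archive before re-running the import.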

IQHT-DGH commented 3 years ago

@kaicode I needed to delete all lines except the header. That got further, but hit another issue - in the Swagger UI I don't see anywhere to increase the index limit mentioned in the error. Logs are:

2021-08-13 14:50:21.398 INFO 1 --- [pool-2-thread-1] o.snomed.snowstorm.core.util.TimerUtil : Timer TC index stated: total took 3.856 seconds

2021-08-13 14:51:04.481 INFO 1 --- [pool-2-thread-1] o.snomed.snowstorm.core.util.TimerUtil : Timer TC index inferred: Collect changed is-a relationships. took 43.079 seconds

2021-08-13 14:51:04.862 ERROR 1 --- [pool-2-thread-1] o.s.s.core.rf2.rf2import.ImportService : Failed RF2 SNAPSHOT import on branch MAIN/SNOMEDCT-AU. ID 9174f06d-71cd-4ee7-92eb-e49c7374908c

org.springframework.data.elasticsearch.UncategorizedElasticsearchException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]; nested exception is ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]]
at org.springframework.data.elasticsearch.core.ElasticsearchExceptionTranslator.translateExceptionIfPossible(ElasticsearchExceptionTranslator.java:67)
at org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate.translateException(ElasticsearchRestTemplate.java:398)
at org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate.execute(ElasticsearchRestTemplate.java:381)
at org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate.searchScrollStart(ElasticsearchRestTemplate.java:304)
at org.springframework.data.elasticsearch.core.AbstractElasticsearchTemplate.searchForStream(AbstractElasticsearchTemplate.java:266)
at org.springframework.data.elasticsearch.core.AbstractElasticsearchTemplate.searchForStream(AbstractElasticsearchTemplate.java:253)
at org.snomed.snowstorm.core.data.services.SemanticIndexUpdateService.buildRelevantPartsOfExistingGraph(SemanticIndexUpdateService.java:535)
at org.snomed.snowstorm.core.data.services.SemanticIndexUpdateService.updateSemanticIndex(SemanticIndexUpdateService.java:202)
at org.snomed.snowstorm.core.data.services.SemanticIndexUpdateService.updateStatedAndInferredSemanticIndex(SemanticIndexUpdateService.java:127)

at org.snomed.snowstorm.core.data.services.SemanticIndexUpdateService.preCommitCompletion(SemanticIndexUpdateService.java:91)
at io.kaicode.elasticvc.api.BranchService.completeCommit(BranchService.java:404)
at io.kaicode.elasticvc.domain.Commit.close(Commit.java:61)
at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.completeImportCommit(ImportComponentFactoryImpl.java:217)
at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.loadingComponentsCompleted(ImportComponentFactoryImpl.java:206)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.doLoadReleaseFiles(ReleaseImporter.java:228)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.doLoadReleaseFiles(ReleaseImporter.java:180)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.access$100(ReleaseImporter.java:164)
at org.ihtsdo.otf.snomedboot.ReleaseImporter.loadSnapshotReleaseFiles(ReleaseImporter.java:44)
at org.ihtsdo.otf.snomedboot.ReleaseImporter.loadSnapshotReleaseFiles(ReleaseImporter.java:68)
at org.snomed.snowstorm.core.rf2.rf2import.ImportService.snapshotImport(ImportService.java:198)
at org.snomed.snowstorm.core.rf2.rf2import.ImportService.importFiles(ImportService.java:157)
at org.snomed.snowstorm.core.rf2.rf2import.ImportService.importArchive(ImportService.java:108)
at org.snomed.snowstorm.core.rf2.rf2import.ImportService.lambda$importArchiveAsync$1(ImportService.java:243)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1888)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1865)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1622)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1579)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1549)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1065)
at org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate.lambda$searchScrollStart$12(ElasticsearchRestTemplate.java:304)
at org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate.execute(ElasticsearchRestTemplate.java:379)
... 25 common frames omitted
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://es:9200], URI [/relationship/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&scroll=60000ms&search_type=dfs_query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"failed to create query: The number of terms [581691] used in the Terms Query request has exceeded the allowed maximum of [500000]. This maximum can be set by changing the [index.max_terms_count] index level setting.","index_uuid":"8Pws0HXBT1qPSKSugw8gqg","index":"relationship"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"relationship","node":"XZh6ZlLHTR2aFR1feLTjJQ","reason":{"type":"query_shard_exception","reason":"failed to create query: The number of terms [581691] used in the Terms Query request has exceeded the allowed maximum of [500000]. This maximum can be set by changing the [index.max_terms_count] index level setting.","index_uuid":"8Pws0HXBT1qPSKSugw8gqg","index":"relationship","caused_by":{"type":"illegal_argument_exception","reason":"The number of terms [581691] used in the Terms Query request has exceeded the allowed maximum of [500000]. This maximum can be set by changing the [index.max_terms_count] index level setting."}}}]},"status":400}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:261)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1609)
... 30 common frames omitted

IQHT-DGH commented 3 years ago

@kaicode Are you able to help with the error I'm seeing about the allowed maximum being exceeded and setting max_terms_count? Hopefully that's the last error.

kaicode commented 3 years ago

Snowstorm sets index.max_terms_count on the Elasticsearch indices during startup. The default for Snowstorm is 500000, but it looks like AU needs more. Please set the Snowstorm configuration option elasticsearch.index.max.terms.count=600000, either in an application.properties file or using a startup argument - see the configuration guide.
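
Both forms set the same Spring Boot property; a sketch of each (the JVM memory options simply mirror the docker-compose entrypoint quoted later in the thread):

    # Option 1: a line in application.properties, in Snowstorm's working directory
    elasticsearch.index.max.terms.count=600000

    # Option 2: a startup argument
    java -Xms2g -Xmx4g -jar snowstorm.jar --elasticsearch.index.max.terms.count=600000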

kaicode commented 3 years ago

Did that resolve the issue @IQHT-DGH ?

jayped007 commented 1 year ago

I had a similar issue, where Elasticsearch was failing and indicating that max_terms_count needed to be increased. I am using a dockerized instance.

The following fixed it: I modified docker-compose.yml to add the following to the snowstorm service:

environment:
  - elasticsearch.index.max.terms.count=600000

With that in place, the RF2 load that previously failed began to work. Here is what I saw in the logs:

elasticsearch2 | {"type": "server", "timestamp": "2022-12-02T20:33:32,882Z", "level": "INFO", "component": "o.e.c.s.IndexScopedSettings", "cluster.name": "snowstorm-cluster", "node.name": "snowstorm", "message": " [member] updating [index.max_terms_count] from [65536] to [600000]", "cluster.uuid": "lEEC0PBBTxeAxz36jCvw7w", "node.id": "JzyeJDw9SKe5W_4TA89ImA" }

jayped007 commented 1 year ago

NOTE: the following failed when I tried to add it to the elasticsearch service's environment variables:

index.max_terms_count=600000

That's close to, if not exactly, what I added. The elasticsearch service would not start with that in place.

So it was important to set the property on the snowstorm service rather than the elasticsearch service in this case.

Here is what the snowstorm definition looks like in my docker-compose.yml

  snowstorm:
    image: snomedinternational/snowstorm:latest
    container_name: snowstorm2
    restart: unless-stopped
    environment:
      - elasticsearch.index.max.terms.count=600000
    depends_on:
      elasticsearch:
        condition: service_healthy
    entrypoint: java -Xms2g -Xmx4g -jar snowstorm.jar --elasticsearch.urls=http://es:9200
    networks:
      elastic2:
        aliases:
         - snowstorm
    ports:
      - 8080:8080