IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Extension Import Advice #66

Closed StephanMeijer closed 4 years ago

StephanMeijer commented 5 years ago
./upload.sh "https://xyz.nl/imports/30351b71-27a7-41b7-b435-e19be74a7d17" "/home/abc/xyz/SnomedCT_InternationalRF2_PRODUCTION_20190731T120000Z.zip"

FULL import on branch Main/PFER with one code system.

StephanMeijer commented 5 years ago
./upload.sh "https://xyz.nl/imports/c060c312-99f5-4892-9773-0a6d99a94589" "/home/abc/xyz/SnomedCT_Netherlands_ExtensionRelease_PRODUCTION_20190331T120000Z.zip"

DELTA import

kaicode commented 5 years ago

Hi @StephanMeijer,

Usually the first import is run onto the MAIN branch. I would recommend trying that first. If you still have problems, could you put some of the stack trace into a gist, please?
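For readers following along: a first import into MAIN is a two-step REST exchange with Snowstorm. You create an import job, then upload the RF2 zip to that job. The sketch below only builds the first request with the Python standard library and extracts the job id from the response's `Location` header. The body fields (`type`, `branchPath`, `createCodeSystemVersion`) and the follow-up upload to `POST /imports/{importId}/archive` (multipart field `file`) reflect the Snowstorm API as we understand it, so check them against your server's Swagger page. The base URL is a placeholder. We are also assuming that `upload.sh` in the commands above performs the archive upload step.

```python
import json
import urllib.request


def create_import_request(base_url: str, branch_path: str = "MAIN",
                          rf2_type: str = "SNAPSHOT",
                          create_version: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST /imports request that creates an import job.

    Field names are assumptions based on Snowstorm's import API; verify in Swagger.
    """
    body = {
        "type": rf2_type,                           # SNAPSHOT, DELTA or FULL
        "branchPath": branch_path,                  # the first import normally targets MAIN
        "createCodeSystemVersion": create_version,  # also create a version branch, e.g. MAIN/2019-07-31
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/imports",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def import_id_from_location(location: str) -> str:
    """The new job's id is the last path segment of the Location response header."""
    return location.rstrip("/").rsplit("/", 1)[-1]


# Sending it would be: urllib.request.urlopen(create_import_request("http://localhost:8080"))
```

The job id returned this way is the one that appears in the `upload.sh` URLs above (`.../imports/<id>`).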

Let us know how it goes.

Kind regards, Kai

StephanMeijer commented 5 years ago

We have the following releases:

-rw-r--r--  1 adminuser adminuser 508533660 Aug 12 14:04 SnomedCT_InternationalRF2_PRODUCTION_20190731T120000Z.zip
-rw-r--r--  1 adminuser adminuser 489745793 Aug 12 14:04 SnomedCT_Netherlands_EditionRelease_PRODUCTION_20190331T120000Z.zip
-rw-r--r--  1 adminuser adminuser  41730816 Aug 12 14:03 SnomedCT_Netherlands_ExtensionRelease_PRODUCTION_20190331T120000Z.zip
-rw-r--r--  1 adminuser adminuser    194355 Aug 12 14:03 SnomedCT_Netherlands_PatientFriendlyExtensionRelease_PRODUCTION_20190331T120000Z.zi

What would be a good way to import these into the branching model described in the documentation?

kaicode commented 5 years ago

Hi @StephanMeijer,

It looks like the SnomedCT_Netherlands_PatientFriendlyExtensionRelease is an extension of an extension. This type of import has not been thoroughly tested on Snowstorm, but I see no reason why it wouldn't work.

Assuming that the PatientFriendlyExtension does contain the content of the main Netherlands extension I recommend trying the following steps:

This segmented approach would let you import the January 2020 International Edition into MAIN when it is released without affecting the content on the other branches, so you could preview that content if needed. When the new version of the Netherlands extension becomes available, you would need to rebase the Netherlands branch using the merge operation with source MAIN and target MAIN/SNOMEDCT-NL. Once that is complete (check the logs), you can import the new Netherlands extension onto the MAIN/SNOMEDCT-NL branch.
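The upgrade cycle described above can be sketched as an ordered list of REST calls: import the new International Edition onto MAIN, rebase the extension branch by merging from MAIN, then import the new extension release. This is an illustration, not Snowstorm's documented procedure. `POST /merges` with `source`/`target` and the `/imports` body fields match the Snowstorm API as we understand it, and the use of SNAPSHOT imports is an assumption. Each `/imports` call must still be followed by uploading the RF2 zip to the job it creates.

```python
import json
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body or None); bodies are illustrative assumptions.
Step = Tuple[str, str, Optional[dict]]


def upgrade_steps(extension_branch: str = "MAIN/SNOMEDCT-NL") -> List[Step]:
    """Order of operations for picking up a new International Edition on an extension branch."""
    return [
        # 1. The new International Edition goes onto MAIN only; the extension branch is untouched.
        ("POST", "/imports", {"type": "SNAPSHOT", "branchPath": "MAIN",
                              "createCodeSystemVersion": True}),
        # 2. Rebase: merge from MAIN (source) into the extension branch (target).
        ("POST", "/merges", {"source": "MAIN", "target": extension_branch}),
        # 3. Only once the merge has finished (check the logs), import the new extension release.
        ("POST", "/imports", {"type": "SNAPSHOT", "branchPath": extension_branch,
                              "createCodeSystemVersion": True}),
    ]


for method, path, body in upgrade_steps():
    print(method, path, json.dumps(body))
```

Keeping the merge as its own step reflects Kai's point that the extension import should only start after the rebase has completed.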

When accessing content, it's a good idea for clients to first use the /codesystems endpoint to find the code system they are looking for and get the branch used by its latest version. This decoupling lets you change the branch path later without changing client configuration. It also ensures that clients use branches containing a complete code system version, such as MAIN/SNOMEDCT-NL/2019-03-31, rather than a working branch such as MAIN/SNOMEDCT-NL, which may be in the middle of an import.
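A client-side sketch of that lookup follows: given the parsed JSON from `GET /codesystems`, return the branch of a code system's latest published version. The field names (`items`, `shortName`, `latestVersion.branchPath`) are assumptions based on the Snowstorm responses we have seen, so verify them against your server. The function deliberately does not fall back to the working branch, for the reason given above.

```python
from typing import Any, Dict


def latest_version_branch(codesystems_response: Dict[str, Any], short_name: str) -> str:
    """Return the branch of the latest published version of a code system.

    Expects the parsed JSON of GET /codesystems; field names are assumptions to verify.
    """
    for cs in codesystems_response.get("items", []):
        if cs.get("shortName") == short_name:
            latest = cs.get("latestVersion")
            if not latest or "branchPath" not in latest:
                # No fallback to the working branch, which may be mid-import.
                raise LookupError(f"{short_name} has no published version yet")
            return latest["branchPath"]
    raise LookupError(f"code system {short_name} not found")


sample = {"items": [{"shortName": "SNOMEDCT-NL", "branchPath": "MAIN/SNOMEDCT-NL",
                     "latestVersion": {"branchPath": "MAIN/SNOMEDCT-NL/2019-03-31"}}]}
print(latest_version_branch(sample, "SNOMEDCT-NL"))  # MAIN/SNOMEDCT-NL/2019-03-31
```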

I hope this helps, let us know how you get on!

Kind regards, Kai

mertenssander commented 5 years ago

Hi all,

I have this exact configuration running on Snowstorm without any problems. Is there a reason you are trying to import the Netherlands Extension instead of the Netherlands Edition? That might avoid the problem entirely, as the Edition contains the International Edition already merged with the Netherlands Extension. By the way, the Patient Friendly extension does not contain the Netherlands extension.

Best regards, Sander

kaicode commented 5 years ago

For anyone else reading this: when we talk about SNOMED RF2 release packages, we use the term "Extension" to mean that the contents of the package are in addition to the International or some other Edition; they cannot be used standalone. We use the term "Edition" to mean that the package can be used on its own: it contains everything, including the International Core module. The International release is an Edition, but any country extension can be packaged as an Edition by including, in the same RF2 files, all the rows of International content that are still effective in that release.

Hi @mertenssander, thanks for confirming about the Netherlands Patient Friendly extension.

Yes, there are pros and cons to using Extensions or Editions.

Importing the International Edition and the Extension separately keeps your options open. It lets you view the International content on its own and import other Extensions that depend on the International Edition later, if and when needed. It also lets you import a new International Edition release when it comes out, in case you would like to preview any changes in the content.

If, however, you know you will never need to use the International Edition on its own or import another extension on top of it, then importing an Edition package straight onto the MAIN branch takes fewer steps.
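To make the two options concrete, here is a hypothetical import plan for each, as (branch, package) pairs in import order. The package names are placeholders, not real release file names. The extension branch name MAIN/SNOMEDCT-NL follows the earlier comments in this thread.

```python
from typing import List, Tuple

# Hypothetical placeholder package names.
INTERNATIONAL_EDITION = "international-edition.zip"
NL_EXTENSION = "netherlands-extension.zip"
NL_EDITION = "netherlands-edition.zip"


def import_plan(use_edition: bool) -> List[Tuple[str, str]]:
    """(branch, package) pairs, in import order, for the two options discussed above."""
    if use_edition:
        # An Edition already contains the International content: one import onto MAIN.
        return [("MAIN", NL_EDITION)]
    # Separate packages: International first on MAIN, then the extension on a child branch,
    # which leaves MAIN free for previewing a future International release.
    return [("MAIN", INTERNATIONAL_EDITION), ("MAIN/SNOMEDCT-NL", NL_EXTENSION)]


for branch, package in import_plan(use_edition=False):
    print(f"{package} -> {branch}")
```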

This is an implementation choice. I suppose you could go for the simplest option; you could always start again if circumstances change.

StephanMeijer commented 5 years ago

@kaicode Your help really made my week. I am currently trying it. Thank you for your extensive explanation!

StephanMeijer commented 5 years ago

How long should an import stay in the RUNNING status before it completes?
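The thread does not give a typical duration. It depends on hardware and heap size, and the log below shows one indexing step alone taking over five minutes. Rather than guessing, a client can poll the job until it leaves RUNNING. This is a minimal sketch. It assumes `GET /imports/{importId}` returns JSON with a `status` field (values such as RUNNING, COMPLETED and FAILED, as we understand Snowstorm's import job states). The fetch function is injected, so the polling logic can be exercised without a server.

```python
import time
from typing import Callable, Dict


def wait_for_import(fetch_job: Callable[[], Dict], interval_s: float = 30.0,
                    timeout_s: float = 4 * 3600) -> str:
    """Poll an import job until it is no longer RUNNING; return the final status.

    fetch_job should return the parsed JSON of GET /imports/{importId} (assumed shape).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_job().get("status")
        if status != "RUNNING":
            return status  # e.g. COMPLETED or FAILED
        if time.monotonic() >= deadline:
            raise TimeoutError(f"import still {status} after {timeout_s} s")
        time.sleep(interval_s)
```

Against a real server, `fetch_job` could be something like `lambda: json.load(urllib.request.urlopen(f"{base}/imports/{import_id}"))`, where `base` and `import_id` are your own values.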

StephanMeijer commented 5 years ago
2019-08-21T17:55:46.309471844Z app[web.1]: 2019-08-21 17:55:46.309  INFO 1 --- [pool-2-thread-1] i.k.elasticvc.api.ComponentService       : Saving batch of 10000 QueryConcepts
2019-08-21T17:55:49.355653026Z app[web.1]: 2019-08-21 17:55:49.355  INFO 1 --- [pool-2-thread-1] i.k.elasticvc.api.ComponentService       : Saving batch of 10000 QueryConcepts
2019-08-21T17:55:52.279199837Z app[web.1]: 2019-08-21 17:55:52.278  INFO 1 --- [pool-2-thread-1] i.k.elasticvc.api.ComponentService       : Saving batch of 10000 QueryConcepts
2019-08-21T17:55:55.084911257Z app[web.1]: 2019-08-21 17:55:55.084  INFO 1 --- [pool-2-thread-1] i.k.elasticvc.api.ComponentService       : Saving batch of 4012 QueryConcepts
2019-08-21T17:55:57.299316469Z app[web.1]: 2019-08-21 17:55:57.299  INFO 1 --- [pool-2-thread-1] o.snomed.snowstorm.core.util.TimerUtil   : Timer TC index stated: Save updated QueryConcepts took 139.15 seconds
2019-08-21T17:55:57.299631526Z app[web.1]: 2019-08-21 17:55:57.299  INFO 1 --- [pool-2-thread-1] o.snomed.snowstorm.core.util.TimerUtil   : Timer TC index stated: total took 222.01 seconds
2019-08-21T17:56:22.331206022Z app[web.1]: 2019-08-21 17:56:22.330  INFO 1 --- [pool-2-thread-1] o.snomed.snowstorm.core.util.TimerUtil   : Timer TC index inferred: Collect changed is-a relationships. took 25.029 seconds
2019-08-21T17:56:26.139314029Z app[web.1]: 2019-08-21 17:56:26.139  WARN 1 --- [/O dispatcher 1] org.elasticsearch.client.RestClient      : request [GET http://snowstorm.ohh.digibri.nl:9200/relationship/relationship/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.8.2-b506955 "Deprecated: the number of terms [419712] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting."]
2019-08-21T17:56:26.141730222Z app[web.1]: 2019-08-21 17:56:26.139  INFO 1 --- [pool-2-thread-1] o.s.s.c.d.s.SemanticIndexUpdateService   : Performing incremental update of inferred semantic index
2019-08-21T17:56:26.475755986Z app[web.1]: 2019-08-21 17:56:26.475  WARN 1 --- [/O dispatcher 1] org.elasticsearch.client.RestClient      : request [GET http://snowstorm.ohh.digibri.nl:9200/semantic/queryconcept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.8.2-b506955 "Deprecated: the number of terms [419713] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting."]
2019-08-21T17:56:26.475861683Z app[web.1]: 2019-08-21 17:56:26.475  INFO 1 --- [pool-2-thread-1] o.snomed.snowstorm.core.util.TimerUtil   : Timer TC index inferred: Collect existingAncestors from QueryConcept. took 4.145 seconds
2019-08-21T17:56:26.781562343Z app[web.1]: 2019-08-21 17:56:26.773  WARN 1 --- [/O dispatcher 1] org.elasticsearch.client.RestClient      : request [GET http://snowstorm.ohh.digibri.nl:9200/semantic/queryconcept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.8.2-b506955 "Deprecated: the number of terms [419712] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting."]
2019-08-21T17:56:27.123492559Z app[web.1]: 2019-08-21 17:56:27.123  WARN 1 --- [/O dispatcher 1] org.elasticsearch.client.RestClient      : request [GET http://snowstorm.ohh.digibri.nl:9200/semantic/queryconcept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.8.2-b506955 "Deprecated: the number of terms [419713] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting."]
2019-08-21T17:56:27.123704270Z app[web.1]: 2019-08-21 17:56:27.123  INFO 1 --- [pool-2-thread-1] o.s.s.c.d.s.SemanticIndexUpdateService   : 0 existing nodes loaded.
2019-08-21T17:56:31.242620039Z app[web.1]: 2019-08-21 17:56:31.242  WARN 1 --- [/O dispatcher 1] org.elasticsearch.client.RestClient      : request [GET http://snowstorm.ohh.digibri.nl:9200/relationship/relationship/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.8.2-b506955 "Deprecated: the number of terms [419712] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting."]
2019-08-21T18:01:57.881662912Z app[web.1]: 2019-08-21 18:01:57.881  INFO 1 --- [pool-2-thread-1] o.snomed.snowstorm.core.util.TimerUtil   : Timer TC index inferred: Update graph using relationships of concepts with changed modelling. took 330.758 seconds
2019-08-21T18:01:59.186046372Z app[web.1]: 2019-08-21 18:01:59.185  WARN 1 --- [/O dispatcher 1] org.elasticsearch.client.RestClient      : request [GET http://snowstorm.ohh.digibri.nl:9200/concept/concept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&search_type=dfs_query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.8.2-b506955 "Deprecated: the number of terms [350830] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting."]
2019-08-21T18:01:59.665059205Z app[web.1]: 2019-08-21 18:01:59.664  WARN 1 --- [/O dispatcher 1] org.elasticsearch.client.RestClient      : request [GET http://snowstorm.ohh.digibri.nl:9200/semantic/queryconcept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.8.2-b506955 "Deprecated: the number of terms [419713] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting."]
2019-08-21T18:01:59.665457979Z app[web.1]: 2019-08-21 18:01:59.665  INFO 1 --- [pool-2-thread-1] o.snomed.snowstorm.core.util.TimerUtil   : Timer TC index inferred: Collect existingDescendants from QueryConcept. took 1.784 seconds
2019-08-21T18:10:29.072678272Z app[web.1]: 2019-08-21 18:07:03.520  WARN 1 --- [heManagerDaemon] o.s.s.c.d.s.i.IdentifierCacheManager     : Identifier cache top ups took longer than polling interval: 10292ms
2019-08-21T18:12:12.736340860Z app[web.1]: 2019-08-21 18:12:08.517  WARN 1 --- [heManagerDaemon] o.s.s.c.d.s.i.IdentifierCacheManager     : Identifier cache top ups took longer than polling interval: 25032ms
2019-08-21T18:13:50.802621631Z app[web.1]: 2019-08-21 18:13:46.728  WARN 1 --- [heManagerDaemon] o.s.s.c.d.s.i.IdentifierCacheManager     : Identifier cache top ups took longer than polling interval: 39667ms
2019-08-21T18:15:27.725084748Z app[web.1]: 2019-08-21 18:15:23.670  WARN 1 --- [heManagerDaemon] o.s.s.c.d.s.i.IdentifierCacheManager     : Identifier cache top ups took longer than polling interval: 12909ms
2019-08-21T18:18:04.326518130Z app[web.1]: Exception in thread "classification-status-polling" java.lang.OutOfMemoryError: Java heap space
2019-08-21T18:18:04.326553157Z app[web.1]: Exception in thread "IdentifierCacheManagerDaemon" java.lang.OutOfMemoryError: Java heap space
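The repeated `index.max_terms_count` warnings in this log are deprecation notices from Elasticsearch 6.8. They are separate from the `OutOfMemoryError` that ended the run. If you want to act on them, the warning names the index-level setting to change. The sketch below only builds the standard Elasticsearch `PUT /{index}/_settings` requests. The index names are taken from the log lines above. The value 500000 is an arbitrary choice above the largest term count logged (419713), not a recommendation from this thread, and the Elasticsearch URL is a placeholder.

```python
import json
import urllib.request


def max_terms_settings_request(es_url: str, index: str, max_terms: int) -> urllib.request.Request:
    """Build a PUT /{index}/_settings request that raises index.max_terms_count."""
    body = {"index": {"max_terms_count": max_terms}}
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Index names as they appear in the log; 500000 is an arbitrary example value.
requests_to_send = [max_terms_settings_request("http://localhost:9200", idx, 500_000)
                    for idx in ("concept", "relationship", "semantic")]
```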
StephanMeijer commented 5 years ago

The first import succeeded after tuning -Xmx (the maximum Java heap size) and adding a swap file.
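For reference, the JVM heap is set with `-Xms` (initial) and `-Xmx` (maximum) when starting the Snowstorm jar. The thread does not say which values worked, so the 2g/8g defaults and the jar path below are hypothetical placeholders. The function only builds the argument list and does not launch anything.

```python
from typing import List


def snowstorm_command(jar_path: str, min_heap: str = "2g", max_heap: str = "8g") -> List[str]:
    """argv for starting Snowstorm with an explicit heap; sizes are placeholders, not advice."""
    return ["java", f"-Xms{min_heap}", f"-Xmx{max_heap}", "-jar", jar_path]


# subprocess.run(snowstorm_command("snowstorm.jar")) would start it with these flags.
print(" ".join(snowstorm_command("snowstorm.jar")))
```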

StephanMeijer commented 5 years ago

The imports seem to have succeeded. Is there a way to prioritize preferred-term descriptions from the Patient Friendly refset over the general Dutch refset?

kaicode commented 5 years ago

Selecting Preferred Terms or Fully Specified Names for a chosen language is implemented, but selecting by a specific language reference set within a language is not yet. Feel free to raise an issue for this; I would like to gauge interest from others in this feature.
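Until then, one possible client-side workaround is to fetch a concept's descriptions and pick the synonym that is PREFERRED in the highest-priority language refset. This is a sketch under assumptions. It assumes each description carries `active`, `type`, `term` and an `acceptabilityMap` of refset id to PREFERRED/ACCEPTABLE, as in the Snowstorm concept JSON we have seen. The refset ids in the example are hypothetical placeholders, not the real Netherlands refset ids.

```python
from typing import Dict, List, Optional, Sequence


def preferred_term(descriptions: List[Dict], refset_priority: Sequence[str]) -> Optional[str]:
    """Return the synonym PREFERRED in the first refset of refset_priority that has one.

    Description field names are assumptions based on Snowstorm's concept JSON.
    """
    for refset_id in refset_priority:
        for d in descriptions:
            if (d.get("active") and d.get("type") == "SYNONYM"
                    and d.get("acceptabilityMap", {}).get(refset_id) == "PREFERRED"):
                return d["term"]
    return None  # no preferred synonym in any of the listed refsets


# Hypothetical refset ids: patient-friendly first, general Dutch as the fallback.
descs = [
    {"active": True, "type": "SYNONYM", "term": "general Dutch term",
     "acceptabilityMap": {"DUTCH_REFSET": "PREFERRED"}},
    {"active": True, "type": "SYNONYM", "term": "patient-friendly term",
     "acceptabilityMap": {"PATIENT_FRIENDLY_REFSET": "PREFERRED", "DUTCH_REFSET": "ACCEPTABLE"}},
]
print(preferred_term(descs, ["PATIENT_FRIENDLY_REFSET", "DUTCH_REFSET"]))  # patient-friendly term
```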

StephanMeijer commented 5 years ago

Thanks for your support. For now we can arrange a workaround, but we will raise a new issue for this!

StephanMeijer commented 5 years ago

I suggest including this in the documentation. This is great.

StephanMeijer commented 4 years ago

@kaicode Thanks for your help. How would you fill in the defaultLanguageReferenceSets?


kaicode commented 4 years ago

The defaultLanguageReferenceSets do not do anything within Snowstorm.

They are used by the SNOMED International browser to display a list of description tables in the concept details tab.

You could leave those blank and set them later if needed.
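As an illustration, a code system creation body with that field left empty might look like the following. The field names other than `defaultLanguageReferenceSets` (`shortName`, `branchPath`, `countryCode`, `defaultLanguageCode`) are assumptions based on Snowstorm's code system JSON, so check them against the `POST /codesystems` schema in your server's Swagger page. The values follow the branch naming used earlier in this thread.

```python
import json


def codesystem_body(short_name: str, branch_path: str, country_code: str,
                    language_code: str) -> dict:
    """JSON body for POST /codesystems; field names are assumptions to verify in Swagger."""
    return {
        "shortName": short_name,
        "branchPath": branch_path,
        "countryCode": country_code,
        "defaultLanguageCode": language_code,
        # Only used by the SNOMED International browser; can be left empty and set later.
        "defaultLanguageReferenceSets": [],
    }


print(json.dumps(codesystem_body("SNOMEDCT-NL", "MAIN/SNOMEDCT-NL", "nl", "nl"), indent=2))
```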

StephanMeijer commented 4 years ago

I'm now getting this error. Interestingly, no code system has been found for SNOMECT-NL.


kaicode commented 4 years ago

The create code system function expects to be able to create a new branch for the content of that code system. The branch is created with a base timepoint matching the requested version of the International Edition it depends on, so that the branch contains the expected content from the International branch, MAIN. If you have already created the branch for the extension, you should probably delete it using the admin branch hard-delete function, then create the code system and import the content again. Make sure you search for and delete any child branches before deleting the MAIN/SNOMEDCT-NL branch. A workaround would be to create the code system using a different branch and then manually update the code system document in Elasticsearch to point to the correct branch; however, the first option is much easier and probably quicker.
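Because child branches must go before their parent, it can help to compute the deletion order up front. The sketch below takes a list of branch paths, for example the `path` fields from `GET /branches`, which we assume is available on your server. It returns the extension branch and all its descendants, deepest first. It does not call the hard-delete function itself, and the example paths are hypothetical.

```python
from typing import Iterable, List


def hard_delete_order(all_branch_paths: Iterable[str], root: str) -> List[str]:
    """Return root and its descendants, deepest first, so children are deleted before parents."""
    prefix = root.rstrip("/") + "/"
    subtree = [p for p in all_branch_paths if p == root or p.startswith(prefix)]
    # More path segments means deeper in the tree; Python's sort stays stable with reverse=True.
    return sorted(subtree, key=lambda p: p.count("/"), reverse=True)


paths = ["MAIN", "MAIN/2019-07-31", "MAIN/SNOMEDCT-NL", "MAIN/SNOMEDCT-NL/2019-03-31",
         "MAIN/SNOMEDCT-NL/PF", "MAIN/SNOMEDCT-NL/PF/task", "MAIN/SNOMEDCT-NLX"]
for path in hard_delete_order(paths, "MAIN/SNOMEDCT-NL"):
    print(path)
```

The prefix check includes the trailing slash, so a sibling such as MAIN/SNOMEDCT-NLX is not swept up by accident.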