IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Failed to import UK Drug extension (SOLVED) #517

Closed ciprianaradulescu closed 1 year ago

ciprianaradulescu commented 1 year ago

Hello,

I've loaded the SNOMED International Edition SnomedCT_InternationalRF2_PRODUCTION_20230131T120000Z following the process documented here https://github.com/IHTSDO/snowstorm/blob/master/docs/loading-snomed.md. This resulted in two branches, MAIN and MAIN/2023-01-31, and both look OK.

After the International import, I'm trying to import the UK Drug extension SnomedCT_UKDrugRF2_PRODUCTION_20230510T000001Z following the process documented here https://github.com/IHTSDO/snowstorm/blob/master/docs/updating-snomed-and-extensions.md. I've tried various combinations of parameters, either specifying or omitting the dependent version, and setting createCodeSystemVersion to both true and false, but every time I get this error:

snowstorm | 2023-05-30 13:08:15.320 ERROR 1 --- [ool-11-thread-9] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
snowstorm |
snowstorm | java.lang.NullPointerException: null
snowstorm |     at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.lambda$processEntities$1(ImportComponentFactoryImpl.java:131)
snowstorm |     at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
snowstorm |     at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.processEntities(ImportComponentFactoryImpl.java:130)
snowstorm |     at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl$4.persistCollection(ImportComponentFactoryImpl.java:113)
snowstorm |     at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl$PersistBuffer.flush(ImportComponentFactoryImpl.java:332)
snowstorm |     at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl$PersistBuffer.save(ImportComponentFactoryImpl.java:327)
snowstorm |     at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.newReferenceSetMemberState(ImportComponentFactoryImpl.java:293)
snowstorm |     at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.lambda$loadRefsets$4(ReleaseImporter.java:496)
snowstorm |     at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.readLines(ReleaseImporter.java:603)
snowstorm |     at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.lambda$readLinesCallable$5(ReleaseImporter.java:514)
snowstorm |     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
snowstorm |     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
snowstorm |     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
snowstorm |     at java.base/java.lang.Thread.run(Thread.java:829)

Could anyone please provide some information as to what is causing the error?

FYI: I'm running the docker compose environment from here https://github.com/IHTSDO/snowstorm/blob/master/docs/using-docker.md, but with the ES memory increased to 24g because with the default setting it ran out of memory.

Thank you, Ciprian

kaicode commented 1 year ago

Hi Ciprian, I'm sorry to hear that this is not working as expected. Could you confirm what version of Snowstorm and Elasticsearch you are running please?

ciprianaradulescu commented 1 year ago

Sure, and sorry, I forgot to mention them 🤦 ES: 7.10.2, Snowstorm: latest (not sure if this means 8.1.0 or something else)

kaicode commented 1 year ago

That's perfect thanks. I will try to reproduce.

kaicode commented 1 year ago

Although I'm from the UK, I am not familiar with the UK SNOMED packages. Could you help me please? I've downloaded the file uk_sct2dr_36.1.0_20230510000001Z.zip from TRUD. It contains snapshot files under SnomedCT_UKEditionRF2_PRODUCTION_20230510T000001Z and also snapshot files under SnomedCT_UKDrugRF2_PRODUCTION_20230510T000001Z.

To reproduce the issue, should I upload the whole zip or create a new zip from one of those subdirectories?

ciprianaradulescu commented 1 year ago

So far, I've only tried importing SnomedCT_UKDrugRF2_PRODUCTION_20230510T000001Z directly over the International Edition. Honestly, I'm not sure whether the UK Edition itself needs to be imported before the Drug Extension.

kaicode commented 1 year ago

No worries, I will just try the same.

kaicode commented 1 year ago

I think I have reproduced the issue. I received several warnings in the log like this:

2023-05-31 10:47:39.417  WARN 73749 --- [nPool-worker-11] o.s.s.c.d.s.ReferenceSetMemberService    : Refset member refers to description which does not exist, this will not be persisted 002356c4-6aef-525d-ae05-a62e9606aadd -> 46691701000001119

This hints at another package needing to be imported first. I think that is the cause of the null pointer. The null pointer itself is a bug, related to multithreading when processing refset members.
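To see how many descriptions are missing, the warnings can be collected from the log. The following is a minimal sketch that assumes the warning text matches the line quoted above exactly; the function name `missing_descriptions` is made up for illustration and is not part of Snowstorm.

```python
import re

# Message text copied from the WARN line quoted above.
# Captures the refset member UUID and the description ID it points at.
MISSING_DESCRIPTION = re.compile(
    r"Refset member refers to description which does not exist, "
    r"this will not be persisted (?P<member>\S+) -> (?P<description>\d+)"
)


def missing_descriptions(log_lines):
    """Return the set of description IDs that refset members refer to but that do not exist."""
    missing = set()
    for line in log_lines:
        match = MISSING_DESCRIPTION.search(line)
        if match:
            missing.add(match.group("description"))
    return missing


sample = [
    "2023-05-31 10:47:39.417  WARN 73749 --- [nPool-worker-11] "
    "o.s.s.c.d.s.ReferenceSetMemberService    : "
    "Refset member refers to description which does not exist, "
    "this will not be persisted 002356c4-6aef-525d-ae05-a62e9606aadd -> 46691701000001119",
]
print(sorted(missing_descriptions(sample)))
```

The resulting IDs can then be looked up in a term browser to find out which module, and so which package, they belong to.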

After the null pointer, Snowstorm attempts to roll back the commit, but for me this failed with a timeout error:

org.springframework.dao.DataAccessResourceFailureException: 30,000 milliseconds timeout on connection http-outgoing-1 [ACTIVE]; nested exception is java.lang.RuntimeException: 30,000 milliseconds timeout on connection http-outgoing-1 [ACTIVE]

If you also got this error, you must run the admin function to complete the commit rollback on whichever branch the import ran on. In Swagger it is POST /admin/{branch}/actions/rollback-partial-commit.
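Outside Swagger, the same call could be scripted. This is only a sketch: the base URL `http://localhost:8080` and the helper name are assumptions, and percent-encoding the branch path is also an assumption that only matters for nested branches (for MAIN it changes nothing). The request is built but not sent.

```python
import urllib.parse
import urllib.request


def rollback_partial_commit_request(branch, base_url="http://localhost:8080"):
    """Build (but do not send) the POST request for the rollback admin function.

    base_url is an assumed local Snowstorm address; adjust it for your deployment.
    """
    # Assumption: a nested branch path is percent-encoded in the URL.
    encoded_branch = urllib.parse.quote(branch, safe="")
    url = f"{base_url}/admin/{encoded_branch}/actions/rollback-partial-commit"
    return urllib.request.Request(url, data=b"", method="POST")


request = rollback_partial_commit_request("MAIN")
print(request.get_method(), request.full_url)
```

To actually run the rollback, pass the request to `urllib.request.urlopen(request)` once you are sure of the branch.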

After that, we should be able to import the UK edition SnomedCT_UKEditionRF2_PRODUCTION_20230510T000001Z and then the drug extension on top. You could create a nested code system structure for this like:

When the time comes to upgrade, this would allow you to upgrade the International edition, UK edition and drug extension as separate code systems. We recommend this approach when using these extension-style packages.

This is my plan. I will test this.

kaicode commented 1 year ago

I am getting further warnings about refset members referring to descriptions that do not exist when importing SnomedCT_UKEditionRF2_PRODUCTION_20230510T000001Z. I've found the missing descriptions in the UK term browser. They are part of the 999000011000000103 |SNOMED CT United Kingdom clinical extension module|, and descriptions in that module do not appear to be part of this package.

I think we need some advice from the NHS about what order their packages should be imported in. I have heard about a monolith package which may be a workaround for these dependency problems.

kaicode commented 1 year ago

I'm downloading the SNOMED CT UK Monolith Edition, RF2: Snapshot to try that. It includes the International Edition and various UK extensions. It can be imported into a blank Snowstorm onto the MAIN branch; no separate code systems are necessary. Trying...

kaicode commented 1 year ago

I have just been warned by a colleague that the UK Monolith snapshot could take many hours to import. I will set it to run at the end of the day so I can keep developing on this machine during the day. I will post an update in the morning.

ciprianaradulescu commented 1 year ago

Thanks a lot for your help. Truth be told, I'm really confused about what needs to be imported, and in what order, in order to get the UK drug database up and running. I didn't know about the monolith package. That sounds like it might solve all of our issues. Looking forward to hearing about how the import went.

kaicode commented 1 year ago

I was able to import the SNOMED CT UK Edition Monolith package directly onto the MAIN branch by starting with a blank Elasticsearch and starting Snowstorm with the following options:

java -Xms8g -Xmx8g -jar snowstorm-8.1.0.jar --elasticsearch.index.max.terms.count=1000000 --import=../../release/uk_sct2mo_36.1.0_20230510000001Z.zip

The max.terms.count setting is required with the UK edition to prevent exceptions when creating the ECL index. The import took 90 minutes on a MacBook Pro M1.
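If the import needs to be launched from a script, the command above can be assembled as an argument list. This is just a sketch: the helper name and its default values are illustrative, while the flags themselves are the ones from the command above.

```python
def snowstorm_import_command(jar, release_zip, heap="8g", max_terms_count=1000000):
    """Assemble the Snowstorm start-up command as an argument list (nothing is executed here)."""
    return [
        "java",
        f"-Xms{heap}",
        f"-Xmx{heap}",
        "-jar",
        jar,
        # Raised terms limit, needed for the UK edition as described above.
        f"--elasticsearch.index.max.terms.count={max_terms_count}",
        # Release package to import on start-up.
        f"--import={release_zip}",
    ]


command = snowstorm_import_command(
    "snowstorm-8.1.0.jar",
    "../../release/uk_sct2mo_36.1.0_20230510000001Z.zip",
)
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True)`; passing a list rather than a single string avoids shell quoting problems with the path.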

ciprianaradulescu commented 1 year ago

Awesome! I'll give that a try and close the issue if it imports successfully. Thank you very much for your help!

ciprianaradulescu commented 1 year ago

The import finished successfully after about 1 hour or so. However, the MAIN branch is now locked. Is this expected?

[
  {
    "path": "MAIN",
    "containsContent": true,
    "locked": true,
    "creation": "2023-06-02T08:26:16.486Z",
    "base": "2023-06-02T08:26:16.486Z",
    "head": "2023-06-02T08:26:16.767Z",
    "creationTimestamp": 1685694376486,
    "baseTimestamp": 1685694376486,
    "headTimestamp": 1685694376767,
    "versionsReplacedCounts": {
      "ReferenceSetType": 0
    },
    "deleted": false
  }
]

Also, I'm not getting any search results when querying for basic drug concepts such as nurofen and paracetamol.
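For anyone checking this from a script rather than by eye, here is a small sketch for reading a branch listing like the one above. It assumes the JSON is a list of objects with `path` and `locked` fields, as shown; the function name is made up. It only reports which branches are locked and says nothing about why.

```python
import json


def locked_branches(branches_json):
    """Return the paths of all branches whose "locked" flag is true."""
    branches = json.loads(branches_json)
    return [branch["path"] for branch in branches if branch.get("locked")]


# Trimmed copy of the listing above; only the fields this sketch reads are kept.
listing = '[ { "path": "MAIN", "containsContent": true, "locked": true } ]'
print(locked_branches(listing))
```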

kaicode commented 1 year ago

After importing all the refset members, the import will go very quiet for a long time. During this period it is calculating the index for ECL queries. You should then see lots of QueryConcepts being saved. This happens twice because Snowstorm allows ECL on the stated form (axioms for authoring) and the inferred form (for EHRs and implementers). The import may not have completed yet. Check for a line in the log file like this:

2023-05-31 19:37:48.845  INFO 3292 --- [           main] o.s.s.core.rf2.rf2import.ImportService   : Completed RF2 SNAPSHOT import on branch MAIN in 5470 seconds. ID 301f6919-db2f-4f37-8fce-ec55b269bf8a
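
A rough way to check for that line automatically, assuming the message format is exactly as in the log line above (the function name is illustrative and not part of Snowstorm):

```python
import re

# Message format copied from the "Completed RF2 ... import" log line above.
COMPLETED = re.compile(
    r"Completed RF2 (?P<type>\w+) import on branch (?P<branch>\S+) "
    r"in (?P<seconds>\d+) seconds\. ID (?P<id>\S+)"
)


def find_completed_import(log_lines):
    """Return details of the first completed import found in the log, or None if there is none."""
    for line in log_lines:
        match = COMPLETED.search(line)
        if match:
            return {
                "type": match.group("type"),
                "branch": match.group("branch"),
                "seconds": int(match.group("seconds")),
                "import_id": match.group("id"),
            }
    return None


log = [
    "2023-05-31 19:37:48.845  INFO 3292 --- [           main] "
    "o.s.s.core.rf2.rf2import.ImportService   : "
    "Completed RF2 SNAPSHOT import on branch MAIN in 5470 seconds. "
    "ID 301f6919-db2f-4f37-8fce-ec55b269bf8a",
]
print(find_completed_import(log))
```

If this returns None, the import has either not finished yet or failed before logging completion.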

ciprianaradulescu commented 1 year ago

Oh, OK. The import-completed log line is missing, so I'll wait for it to either say it's done or crash :D Thanks!

ciprianaradulescu commented 1 year ago

It worked! And the search seems to return everything I need. Closing the ticket, and thank you very much for your support.

kaicode commented 1 year ago

Great news! Thanks for working through this with me @ciprianaradulescu. I will be referring people to this ticket for UK import answers.

kaicode commented 1 year ago

See this thread @abelardy 😄 Let us know how it goes!

abelardy commented 1 year ago

Thanks Kai... that --elasticsearch.index.max.terms.count=1000000 option was the final piece of the jigsaw!

The attached HTML document (hidden in a zip) is a detailed walkthrough that, fingers crossed, works for standing up an Ubuntu VirtualBox VM, installing Elasticsearch and Snowstorm 8.1.0 on it, populating the Snowstorm server with the August 2023 release of the UK Monolith, and then testing it by firing ECL queries at it from other machines on the local network.

Snowstorm Install and Test 202308.zip

kaicode commented 1 year ago

Brilliant, thanks for sharing @abelardy !