IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Snomed CT UK Clinical Editions format change #269

Open steve1973 opened 3 years ago

steve1973 commented 3 years ago

Hi there

The format of the SNOMED CT UK Clinical Edition releases has now changed. The changes are detailed in this document: https://hscic.kahootz.com/gf2.ti/f/762498/92240133.1/PDF/-/doc_UKSnomedCTTechnicalOverview_CurrentenGB_GB1000000_20210204.pdf. The release is now spread over multiple folders, as described there. Will the current version of Snowstorm support this and, if not, will it be supported in a future release? A legacy version of the release will be available until the end of July 2021.

Many thanks

kaicode commented 3 years ago

Hi @steve1973, thanks for raising this.

I've had a read through and can't see anything there that should break the RF2 loading functionality. I have not tried loading a package with the new format but that would be the best way to test it!

steve1973 commented 3 years ago

Thanks Kai. So would we zip up the 3 UK folders into 1 zip file and import that, or import the 3 zip files separately? As you say, I can give it a test.

kaicode commented 3 years ago

Either way should work. If importing zip files separately, be sure to import them in the correct order with respect to the module dependencies: the most upstream content should be imported first.

The import will not fail if content is imported in the wrong order, but the semantic index supporting search and ECL may not be correct if some content is missing because of importing downstream content first.

Many thanks for testing! Let us know how it goes.
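
The ordering advice above can be sketched as a topological sort over package dependencies. This is only an illustration: the package names are hypothetical labels rather than real archive names, and the dependency map is an assumption about a typical UK setup, not something read from the release files.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def import_order(depends_on):
    """Return package names ordered so the most upstream content comes first.

    `depends_on` maps each package to the set of packages it depends on.
    """
    # TopologicalSorter expects {node: predecessors}; static_order() yields
    # every predecessor before the nodes that depend on it.
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical labels for the separately imported archives.
deps = {
    "international": set(),
    "uk-edition": {"international"},
    "uk-clinical": {"uk-edition"},
    "uk-drug": {"uk-edition"},
}

for position, package in enumerate(import_order(deps), start=1):
    print(position, package)
```

Packages that do not depend on each other (here the two hypothetical UK extensions) can come out in either order; only the upstream-before-downstream constraint is guaranteed.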

steve1973 commented 3 years ago

Thanks Kai, I will let you know.

evergreen-lee-campbell commented 3 years ago

For the first import of a UK SNOMED release in the latest format, should SNAPSHOTs or FULL extracts of each of the three folders be imported to a MAIN/SNOMEDCT-UK branch? Presumably, thereafter, a DELTA of each of the three folders can be imported for each release, can it?

kaicode commented 3 years ago

Please use SNAPSHOT import type when importing extensions. At this time Snowstorm only supports importing FULL (all historic releases) for an Edition on the MAIN branch (usually the International Edition).
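
As a rough sketch of that rule, the helper below builds the JSON body for an import job and rejects a FULL import anywhere other than MAIN. The helper name is hypothetical, and the field names `branchPath` and `type` are an assumption based on the Snowstorm loading documentation; check them against the API of the version you are running.

```python
import json

VALID_TYPES = {"SNAPSHOT", "DELTA", "FULL"}


def import_job_body(branch_path, import_type="SNAPSHOT"):
    """Build a request body for an import job (hypothetical helper).

    Field names are assumed, not confirmed by this thread.
    """
    if import_type not in VALID_TYPES:
        raise ValueError(f"unknown import type: {import_type}")
    # Per the comment above: FULL is only supported for an Edition on MAIN,
    # so extensions on child branches should use SNAPSHOT.
    if import_type == "FULL" and branch_path != "MAIN":
        raise ValueError("FULL import is only supported on the MAIN branch")
    return {"branchPath": branch_path, "type": import_type}


print(json.dumps(import_job_body("MAIN/SNOMEDCT-UK"), indent=2))
```

Validating before sending means a mistaken FULL import of an extension fails immediately rather than part-way through a long load.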

evergreen-lee-campbell commented 3 years ago

Thanks. Should the import, then, be in the following order?

Given the unzipped structure, thus: [image]

rorydavidson commented 3 years ago

At this stage, I'd recommend just loading the International Snapshot as well. I'm not sure having the FULL will give you much benefit at the moment.

abelardy commented 1 year ago

For clarity (and in case anybody else comes here later looking to wrangle the UK Extensions into submission): if your goal is to host both the UK Clinical and UK Drug extension data within one Snowstorm instance, then I would strongly recommend not bothering with these as individual releases. Instead, load the corresponding "UK Monolith" snapshot from TRUD, which is effectively an Edition-style release: it's a single set of 22 RF2 files in which the relevant International Edition and all the UK content are merged together.

However, if you only want EITHER the UK Clinical OR the UK Drug extension release content but not both, then you must load the International Edition to MAIN as per the Snowstorm documentation, and then create a MAIN/SNOMEDCT-UK CodeSystem before loading the zip of the relevant UK Extension release onto that. Note that if it's the clinical release you are loading, you must first manually delete the extra copy of the International Edition subfolder from within the standard UK Clinical Extension distro zip.

To answer the other question above (and speaking to Kai's suggestion of 3-tiered CodeSystems): the UKEdition subfolder that ships within both the UK Clinical and UK Drug extension release distros sits conceptually between the International Edition and either or both of the UK Clinical and UK Drug extensions. Therefore, if loading the various subfolders serially rather than as a bolus within a single zip, you should load exactly one copy of the UK Edition subfolder first, before attempting any of the other subfolders.

Either way (Monolith or individual extensions), make sure the java invocation includes both the -Xmx8g option and the --elasticsearch.index.max.terms.count=1000000 switch, and that the Java process really does have at least 8 GB of working memory to play with.
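
To make the placement of those two settings concrete, here is a minimal sketch that assembles the command line. The jar file name `snowstorm.jar` is an assumption; substitute the name of the jar you actually downloaded. The one thing worth noting is that -Xmx is a JVM option and has to come before -jar, while the Elasticsearch setting is an application argument and goes after the jar.

```python
import shlex


def snowstorm_command(jar_path, heap="8g", max_terms=1000000):
    """Assemble the java argument list for starting Snowstorm (sketch only)."""
    return [
        "java",
        f"-Xmx{heap}",  # JVM heap limit: must appear before -jar
        "-jar",
        jar_path,  # assumed jar name, adjust to your download
        # Application argument: must appear after the jar path
        f"--elasticsearch.index.max.terms.count={max_terms}",
    ]


command = snowstorm_command("snowstorm.jar")
print(shlex.join(command))
```

The list form can be passed straight to `subprocess.run(command)` if you want to launch the server from a script rather than copy the printed line into a shell.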

In case it helps anybody, the attached HTML document (inside a ZIP) describes in detail how to stand up a Snowstorm 8.1.0 instance inside an Ubuntu VirtualBox VM and then populate it with the August 2023 UK Monolith 36.4.0.

Snowstorm Install and Test 202308.zip