Java heap space error when I try to convert RF2 to JSON for FULL SNOMED

mohammadfarooqi commented 5 years ago

Hi

I am trying to use the rf2-to-json 1.3 jar file to do the conversion. However, when I try to use the FULL version of the release files, I get the following error:

..............Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.String.split(String.java:2377)
    at java.lang.String.split(String.java:2422)
    at java.util.UUID.fromString(UUID.java:192)
    at org.ihtsdo.json.TransformerDiskBased.loadLanguageRefsetFile(TransformerDiskBased.java:869)
    at org.ihtsdo.json.TransformerDiskBased.processFiles(TransformerDiskBased.java:265)
    at org.ihtsdo.json.TransformerDiskBased.convert(TransformerDiskBased.java:185)
    at org.ihtsdo.json.runners.ConfigRunner.execute(ConfigRunner.java:131)
    at org.ihtsdo.json.runners.ConfigRunner.main(ConfigRunner.java:31)

This is what is in my SNOMED zip file:

The Snapshot folder works; however, the Full folder gives the Java heap space error.

This is my config.xml file:

`<?xml version="1.0" encoding="UTF-8"?> true en 900000000000003001 900000000000509007 true false International Edition en-edition 20180731 20190731 /c/code/simpatico/rf2-to-json-jar/output /c/Users/moham/Downloads/SnomedCT_InternationalRF2_PRODUCTION_20180731T120000Z/SnomedCT_InternationalRF2_PRODUCTION_20180731T120000Z/Snapshot`

Please advise.

kaicode commented 5 years ago

Hi @mohammadfarooqi, how much memory are you giving the JVM to run this process? We give Java 8 GB of heap to run the conversion of a Snapshot, and I would expect the Full release to need many times that amount. You could try running the Snapshot first to get things working and to give you a benchmark of what to expect in terms of memory and time. Full files are many times larger than Snapshots; although converting them is theoretically possible, it is not something we typically run through the JSON conversion process.
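A quick way to confirm how much heap the JVM actually received is to ask the runtime. The sketch below is not part of the converter; `HeapCheck` is a hypothetical helper that prints `Runtime.getRuntime().maxMemory()`. `-Xmx` is the standard HotSpot option for the maximum heap size, so compiling with `javac HeapCheck.java` and running `java -Xmx8g HeapCheck` should report a value close to, and often slightly below, 8 GiB.

```java
// Hypothetical helper (not part of rf2-to-json): reports the maximum heap
// the current JVM will try to use, so you can check that an -Xmx flag took effect.
public class HeapCheck {

    // Runtime.maxMemory() returns the maximum amount of memory, in bytes,
    // that the JVM will attempt to use (Long.MAX_VALUE if there is no limit).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Converts a byte count to gibibytes (1 GiB = 1024^3 bytes).
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf("Max heap: %.2f GiB%n", toGiB(maxHeapBytes()));
    }
}
```

The same flag would go on the converter's own command line, for example `java -Xmx8g -jar rf2-to-json.jar config.xml`; the jar file name and the config argument are assumptions here, since the thread does not show the exact invocation.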

Another option, if you would like access to every version of SNOMED CT via a REST interface, is to use Snowstorm.

mohammadfarooqi commented 5 years ago

Hi @kaicode, thanks for getting back to me. I think the Snapshot version is enough for me. I was wondering if you could provide some direction. The reason I loaded these codes is to perform the following task:

https://www.hl7.org/fhir/valueset-participant-role.html

That link has a list of codes that are part of:

`

Include codes from http://snomed.info/sct where concept is-a 125676002 (Person)
Include codes from http://snomed.info/sct where concept is-a 223366009 (Healthcare professional)
Include codes from http://snomed.info/sct where concept is-a 394730007 (Healthcare related organisation) `

What I need to do is separate the list of codes on that page by their proper parent. There are three parents in this case (Person, Healthcare professional and Healthcare related organisation).

I was hoping I could use the API in some way to get the proper parents and divide the list into the three categories.

Is there a way I could use the API now to perform that task? Is there a route that already allows that? Any guidance is appreciated.

Thanks, Mohammad

kaicode commented 5 years ago

Yes, you can load all the descendants from the SNOMED CT hierarchy under the concept 125676002 (Person) using the ECL expression <125676002 with the following API call: https://browser.ihtsdotools.org/ecl/MAIN/2019-01-31/concepts?ecl=<125676002&page=0&size=100

This URL includes the current International release date (2019-01-31) in the branch path, and the concept ID appears in the ECL expression in the query parameters. Use pagination (the page and size parameters) to fetch the whole result set.
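Fetching the whole result set means requesting page 0, 1, 2 and so on until the results run out. The Java sketch below (the class name `EclPager` is hypothetical) only builds page URLs in the shape quoted above, percent-encoding the ECL so that `<` becomes `%3C`, and runs a generic paging loop. The response's JSON layout isn't described in this thread, so the HTTP request and parsing are left to a caller-supplied `fetchPage` function, and stopping at the first short page is an assumption about how the API signals the last page. It needs Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical pagination helper for the ECL endpoint quoted in the thread.
public class EclPager {

    // Builds a page URL such as .../concepts?ecl=%3C125676002&page=0&size=100.
    // The ECL expression is percent-encoded so "<" is sent as "%3C".
    static URI pageUri(String base, String ecl, int page, int size) {
        String encoded = URLEncoder.encode(ecl, StandardCharsets.UTF_8);
        return URI.create(base + "?ecl=" + encoded + "&page=" + page + "&size=" + size);
    }

    // Calls fetchPage(0), fetchPage(1), ... and collects every item.
    // ASSUMPTION: a page with fewer than `size` items is the last one.
    // fetchPage is supplied by the caller (HTTP call + JSON parsing not shown).
    static <T> List<T> fetchAll(IntFunction<List<T>> fetchPage, int size) {
        List<T> all = new ArrayList<>();
        for (int page = 0; ; page++) {
            List<T> items = fetchPage.apply(page);
            all.addAll(items);
            if (items.size() < size) {
                return all;
            }
        }
    }

    public static void main(String[] args) {
        String base = "https://browser.ihtsdotools.org/ecl/MAIN/2019-01-31/concepts";
        // Prints the first page URL for the descendants of 125676002 (Person).
        System.out.println(pageUri(base, "<125676002", 0, 100));
    }
}
```

In practice `fetchPage` might send the request with `java.net.http.HttpClient` and extract concept IDs with a JSON library; both are left out because the response field names would be guesses.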

You can repeat this for <223366009 and <394730007.
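With the three descendant sets in hand, splitting the value set's codes is a membership check. One detail worth noting: FHIR's `is-a` filter includes the named concept itself, while ECL `<` returns descendants only (`<<` is descendants-or-self), so the sketch below also matches a code equal to the parent ID. `GroupByParent` and the placeholder IDs in `main` are hypothetical; real sets would come from the three queries. A code can land under more than one parent, since SNOMED CT allows multiple parents, and anything matching none goes under a `(none)` key. It needs Java 9+ for `List.of`/`Set.of`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: groups value-set codes by which parent hierarchy they fall under.
public class GroupByParent {

    static final String NONE = "(none)";

    // descendantsByParent: parent concept ID -> IDs returned by the "<parent" ECL query.
    // The parent itself is matched explicitly, mirroring FHIR "is-a" (ECL "<<").
    static Map<String, List<String>> group(List<String> codes,
                                           Map<String, Set<String>> descendantsByParent) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        descendantsByParent.keySet().forEach(p -> groups.put(p, new ArrayList<>()));
        groups.put(NONE, new ArrayList<>());
        for (String code : codes) {
            boolean matched = false;
            for (Map.Entry<String, Set<String>> e : descendantsByParent.entrySet()) {
                String parent = e.getKey();
                if (code.equals(parent) || e.getValue().contains(code)) {
                    groups.get(parent).add(code); // may match several parents
                    matched = true;
                }
            }
            if (!matched) {
                groups.get(NONE).add(code);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Placeholder descendant IDs ("111", "222", "333") for illustration only.
        Map<String, Set<String>> descendants = new LinkedHashMap<>();
        descendants.put("125676002", Set.of("111")); // Person
        descendants.put("223366009", Set.of("222")); // Healthcare professional
        descendants.put("394730007", Set.of("333")); // Healthcare related organisation
        // prints {125676002=[111], 223366009=[223366009], 394730007=[333], (none)=[999]}
        System.out.println(group(List.of("111", "223366009", "333", "999"), descendants));
    }
}
```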

I hope this helps. Cheers, Kai

mohammadfarooqi commented 5 years ago

Hi @kaicode

This was exactly what I was looking for, thank you very much for the assist!

Best, Mohammad

IHTSDO / sct-snapshot-rest-api, issue #41