IHTSDO / snomed-subontology-extraction

error importing subontology into snowstorm #2

Open liquid36 opened 7 months ago

liquid36 commented 7 months ago

Hi! I want to use this utility to create a lightweight RF2 ZIP file in order to test my application workflow.

java --add-opens java.base/java.lang=ALL-UNNAMED -Xms4g -jar snomed-subontology-extraction-*-executable.jar  -source-ontology ontology-2024-03-26_11-57-07.owl  -input-subset concepts.list   -output-rf2  -rf2-snapshot-archive SnomedCT_Argentina-EditionRelease_PRODUCTION_20230531T120000Z.zip

concepts.list

89901005
387713003
71388002
138875005

After running this command, I imported the zip file into Snowstorm, but I got the following error:

2024-04-08T13:24:19.229Z  INFO 1 --- [nio-8080-exec-1] o.s.s.rest.config.RestControllerAdvice   : bad request Duplicate concept document found with id 900000000000441003, A:MAIN:1712582554941:Mon Apr 08 13:22:34 UTC 2024 B:MAIN:1712582554941:Mon Apr 08 13:22:34 UTC 2024.

There are no duplicated concepts in the zip file.

kaicode commented 7 months ago

This is a very unusual error. I recommend deleting the Elasticsearch indices and trying the import again. The easy way to delete all Elasticsearch indices is using a delete REST request:

curl -XDELETE 'http://localhost:9200/*'

Then restarting Snowstorm will automatically recreate the indices that are needed, ready for the import.

liquid36 commented 7 months ago

I did that several times, deleting everything and importing again.

What caught my attention is that the importer only recognizes 30 concepts, but the concepts file contains more:

sct2_Concept_Snapshot_INT_20240326.txt

id  effectiveTime   active  moduleId    definitionStatusId
106237007   20110131    1   900000000000012004  900000000000074008
116680003   20110131    1   900000000000012004  900000000000074008
123037004   20020131    1   900000000000207008  900000000000074008
129284003   20020131    1   900000000000207008  900000000000074008
138875005   20020131    1   900000000000207008  900000000000074008
246061005   20110131    1   900000000000012004  900000000000074008
260686004   20110131    1   900000000000012004  900000000000074008
260787004   20020131    1   900000000000207008  900000000000074008
362981000   20020131    1   900000000000207008  900000000000074008
363704007   20110131    1   900000000000012004  900000000000074008
387713003   20220930    1   900000000000207008  900000000000074008
405815000   20110131    1   900000000000012004  900000000000074008
410662002   20110131    1   900000000000012004  900000000000074008
424226004   20110131    1   900000000000012004  900000000000074008
609096000   20130731    1   900000000000012004  900000000000074008
69536005    20020131    1   900000000000207008  900000000000074008
71388002    20020131    1   900000000000207008  900000000000074008
733073007   20170731    1   900000000000012004  900000000000074008
762676003   20180131    1   900000000000012004  900000000000074008
762705008   20180131    1   900000000000012004  900000000000074008
762706009   20180131    1   900000000000012004  900000000000074008
86174004    20020131    1   900000000000207008  900000000000074008
89901005    20020131    1   900000000000207008  900000000000073002
900000000000003001  20020131    1   900000000000012004  900000000000074008
900000000000006009  20020131    1   900000000000012004  900000000000074008
900000000000010007  20020131    1   900000000000012004  900000000000074008
900000000000011006  20020131    1   900000000000012004  900000000000074008
900000000000013009  20020131    1   900000000000012004  900000000000074008
900000000000017005  20020131    1   900000000000012004  900000000000074008
900000000000020002  20020131    1   900000000000012004  900000000000074008
900000000000073002  20020131    1   900000000000012004  900000000000074008
900000000000074008  20020131    1   900000000000012004  900000000000074008
900000000000225001  20020131    1   900000000000012004  900000000000074008
900000000000227009  20020131    1   900000000000012004  900000000000074008
900000000000441003  20020131    1   900000000000012004  900000000000074008
900000000000444006  20020131    1   900000000000012004  900000000000074008
900000000000446008  20020131    1   900000000000012004  900000000000074008
900000000000447004  20020131    1   900000000000012004  900000000000074008
900000000000448009  20020131    1   900000000000012004  900000000000074008
900000000000449001  20020131    1   900000000000012004  900000000000074008
900000000000450001  20020131    1   900000000000012004  900000000000074008
900000000000451002  20020131    1   900000000000012004  900000000000074008
900000000000452009  20020131    1   900000000000012004  900000000000074008
900000000000454005  20020131    1   900000000000012004  900000000000074008
900000000000455006  20020131    1   900000000000012004  900000000000074008
900000000000506000  20020131    1   900000000000012004  900000000000074008
900000000000507009  20020131    1   900000000000012004  900000000000074008
900000000000508004  20020131    1   900000000000012004  900000000000074008
900000000000509007  20020131    1   900000000000012004  900000000000074008
900000000000511003  20020131    1   900000000000012004  900000000000074008
900000000000548007  20020131    1   900000000000012004  900000000000074008
900000000000549004  20020131    1   900000000000012004  900000000000074008
900000000000550004  20020131    1   900000000000012004  900000000000074008

Do you see something wrong here?

kaicode commented 7 months ago

That concepts file looks fine. If you send me the zip I can debug the import.

liquid36 commented 7 months ago

This is the zip file

SnomedCt_test.zip

kaicode commented 7 months ago

There are three duplicate entries in the sct2_Concept_Snapshot_INT_20240326.txt file.

$ cut -f1 sct2_Concept_Snapshot_INT_20240326.txt | sort | uniq -d
410662002
762705008
900000000000441003

Perhaps these concepts appear in the concept snapshot files within the SnomedCT_Argentina-EditionRelease_PRODUCTION_20230531T120000Z.zip archive more than once?
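
For anyone without `cut`, `sort` and `uniq` to hand (for example on Windows), the same duplicate check can be sketched in Python and run directly against the zip. This is only an illustrative sketch: the function names and the `name_fragment` parameter are made up for this example, and it assumes the concept file is UTF-8, tab-separated, has a header row, and has "sct2_Concept_Snapshot" in its name.

```python
import csv
import io
import zipfile
from collections import Counter


def duplicate_ids(lines):
    """Return the ids (first column) that occur more than once, header row skipped."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    counts = Counter(row[0] for row in reader if row)
    return sorted(concept_id for concept_id, n in counts.items() if n > 1)


def duplicate_concepts_in_zip(zip_path, name_fragment="sct2_Concept_Snapshot"):
    """Map each matching .txt entry in the zip to its list of duplicated ids."""
    result = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name_fragment in name and name.endswith(".txt"):
                with zf.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    result[name] = duplicate_ids(text)
    return result
```

Calling `duplicate_concepts_in_zip("SnomedCt_test.zip")` returns a dictionary keyed by entry name. An empty list for an entry means no id was repeated in that file.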

kaicode commented 7 months ago

The concept file within the uploaded "SnomedCt_test.zip" archive is very different from the concept file contents that were posted above. The contents above look okay, but the file in the zip contains duplicates.

liquid36 commented 7 months ago

So weird. The snomed-subontology-extraction tool outputs an RF2 folder and a zip file. I thought they were the same, but they are not. I posted the contents of the RF2 folder.

What is the difference? Do you know?
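
One way to see how the RF2 folder and the zip differ is to compare the concept ids in each. The sketch below is an assumption-laden illustration, not part of the tool: the function names are hypothetical, and it assumes both outputs hold UTF-8, tab-separated concept snapshot files with a header row and "sct2_Concept_Snapshot" in the file name.

```python
import zipfile
from pathlib import Path


def ids_from_text(text):
    """Collect first-column ids from RF2 tab-separated text, skipping the header row."""
    lines = text.splitlines()[1:]
    return [line.split("\t", 1)[0] for line in lines if line.strip()]


def compare_folder_with_zip(folder, zip_path, name_fragment="sct2_Concept_Snapshot"):
    """Compare concept ids found under an RF2 folder with those inside a zip archive."""
    folder_ids = []
    for path in Path(folder).rglob("*.txt"):
        if name_fragment in path.name:
            folder_ids += ids_from_text(path.read_text(encoding="utf-8"))

    zip_ids = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name_fragment in name and name.endswith(".txt"):
                zip_ids += ids_from_text(zf.read(name).decode("utf-8"))

    return {
        "folder_rows": len(folder_ids),  # row counts include any repeated ids
        "zip_rows": len(zip_ids),
        "only_in_folder": sorted(set(folder_ids) - set(zip_ids)),
        "only_in_zip": sorted(set(zip_ids) - set(folder_ids)),
    }
```

If the row counts differ while both "only_in" lists are empty, the extra rows in one output are repeats of ids that the other output lists once.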

liquid36 commented 7 months ago

Well, I ran the importer again with the contents of the RF2 folder and it worked perfectly. Thank you very much.

kaicode commented 7 months ago

Great news about the import! That's strange about the RF2 folder.

Semohsbi commented 1 month ago

Hello,

I'm having trouble running a Java command, and I’m not very familiar with Java. Here’s what I tried:

& "C:\Program Files\Java\jdk-17\bin\java" -Xms4g -jar .\snomed-subontology-extraction-2.0.0-executable.jar -source-ontology .\ontology.xml -input-subset .\door.txt -verify-subontology

I got the error shown in the attached screenshot:

[screenshot: screen1]

and when I ran this command:

& "C:\Program Files\Java\jdk-17\bin\java" -Xms4g --add-opens java.base/java.lang=ALL-UNNAMED -jar .\snomed-subontology-extraction-2.0.0-executable.jar -source-ontology .\ontology.xml -input-subset .\door.txt -verify-subontology

[screenshot: image]

I also tried using TTL and RDF/XML formats for the ontology, but no luck. Any advice on fixing this would be really helpful!

Thanks!

kaicode commented 1 month ago

The first command you tried failed because of a security issue. The second command has the extra parameter needed to overcome the security issue, but it does not seem to have selected anything.

The ontology input format should use functional syntax. This can be generated using the SNOMED OWL Toolkit, see SNOMED to OWL Conversion. You can grab the jar file for that from the snomed-owl-toolkit releases page. For example "snomed-owl-toolkit-5.3.0-executable.jar".

This will produce an owl file with a filename like ontology-2024-10-30_19-45-07.owl, that should be used with the subontology -source-ontology param.
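
Before running the extraction, it can help to check which serialization an ontology file actually uses. The sketch below is a rough heuristic, not part of either toolkit: the function name is hypothetical, and it only inspects the first meaningful line. It relies on OWL functional syntax documents starting with `Prefix(` or `Ontology(`, XML-based serializations starting with an XML declaration or root element, and Turtle usually starting with prefix or base directives.

```python
def guess_owl_serialization(path, max_lines=50):
    """Guess the serialization of an ontology file from its first meaningful line.

    Returns "functional", "xml", "turtle" or "unknown". Heuristic only.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        for _ in range(max_lines):
            line = handle.readline()
            if not line:
                break  # end of file
            stripped = line.lstrip("\ufeff").strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comment lines
            if stripped.startswith(("Prefix(", "Ontology(")):
                return "functional"
            if stripped.startswith(("<?xml", "<rdf:RDF", "<Ontology")):
                return "xml"
            if stripped.lower().startswith(("@prefix", "@base", "prefix ", "base ")):
                return "turtle"
            return "unknown"
    return "unknown"
```

A result other than "functional" for the file passed to -source-ontology suggests it was not produced in the functional syntax described above.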

Semohsbi commented 4 weeks ago

I’m working on a project to create a subontology focused on “Door” using the IFC4 ontology from buildingSMART, available in JSON, TTL, NL, and XML formats (https://standards.buildingsmart.org/IFC/DEV/IFC4/ADD2_TC1/OWL/index.html).

The snomed-subontology-extraction tool requires OWL functional syntax. Is it possible to use any of these formats with this tool? Would you recommend using the TTL or RDF/XML format from the IFC4 options, or is there another format that would work better?

Additionally, the SNOMED OWL Toolkit needs an RF2 structure, and I’m unsure if IFC4 includes this format. Would I need to convert the ontology into RF2 to use this tool?

Thank you for any guidance!

kaicode commented 4 weeks ago

The SNOMED Subontology Extraction tool has been created to work with the SNOMED CT Ontology and SNOMED CT RF2 release files only. It is not intended for use with other ontologies. The algorithm expects specific SNOMED CT concepts and axioms to be present and also uses the SNOMED CT attribute hierarchy.

Unfortunately I do not think this tool is suitable for use with the IFC4 ontology.