IHTSDO / snomed-subontology-extraction


error importing subontology into snowstorm #2

Open liquid36 opened 5 months ago

liquid36 commented 5 months ago

Hi! I want to use this utility to create a lightweight RF2 ZIP file so I can test my application workflow.

java --add-opens java.base/java.lang=ALL-UNNAMED -Xms4g -jar snomed-subontology-extraction-*-executable.jar  -source-ontology ontology-2024-03-26_11-57-07.owl  -input-subset concepts.list   -output-rf2  -rf2-snapshot-archive SnomedCT_Argentina-EditionRelease_PRODUCTION_20230531T120000Z.zip

concepts.list

89901005
387713003
71388002
138875005

After running this command, I imported the zip file into Snowstorm, but I got the following error:

2024-04-08T13:24:19.229Z  INFO 1 --- [nio-8080-exec-1] o.s.s.rest.config.RestControllerAdvice   : bad request Duplicate concept document found with id 900000000000441003, A:MAIN:1712582554941:Mon Apr 08 13:22:34 UTC 2024 B:MAIN:1712582554941:Mon Apr 08 13:22:34 UTC 2024.

There are no duplicated concepts in the zip file.

kaicode commented 5 months ago

This is a very unusual error. I recommend deleting the Elasticsearch indices and trying the import again. The easy way to delete all Elasticsearch indices is using a delete REST request:

curl -XDELETE 'http://localhost:9200/*'

Then restarting Snowstorm will automatically recreate the indices that are needed, ready for the import.

liquid36 commented 5 months ago

I did that several times: deleting everything and importing again.

What caught my attention is that the importer only recognizes 30 concepts, but the concepts file contains more:

sct2_Concept_Snapshot_INT_20240326.txt

id  effectiveTime   active  moduleId    definitionStatusId
106237007   20110131    1   900000000000012004  900000000000074008
116680003   20110131    1   900000000000012004  900000000000074008
123037004   20020131    1   900000000000207008  900000000000074008
129284003   20020131    1   900000000000207008  900000000000074008
138875005   20020131    1   900000000000207008  900000000000074008
246061005   20110131    1   900000000000012004  900000000000074008
260686004   20110131    1   900000000000012004  900000000000074008
260787004   20020131    1   900000000000207008  900000000000074008
362981000   20020131    1   900000000000207008  900000000000074008
363704007   20110131    1   900000000000012004  900000000000074008
387713003   20220930    1   900000000000207008  900000000000074008
405815000   20110131    1   900000000000012004  900000000000074008
410662002   20110131    1   900000000000012004  900000000000074008
424226004   20110131    1   900000000000012004  900000000000074008
609096000   20130731    1   900000000000012004  900000000000074008
69536005    20020131    1   900000000000207008  900000000000074008
71388002    20020131    1   900000000000207008  900000000000074008
733073007   20170731    1   900000000000012004  900000000000074008
762676003   20180131    1   900000000000012004  900000000000074008
762705008   20180131    1   900000000000012004  900000000000074008
762706009   20180131    1   900000000000012004  900000000000074008
86174004    20020131    1   900000000000207008  900000000000074008
89901005    20020131    1   900000000000207008  900000000000073002
900000000000003001  20020131    1   900000000000012004  900000000000074008
900000000000006009  20020131    1   900000000000012004  900000000000074008
900000000000010007  20020131    1   900000000000012004  900000000000074008
900000000000011006  20020131    1   900000000000012004  900000000000074008
900000000000013009  20020131    1   900000000000012004  900000000000074008
900000000000017005  20020131    1   900000000000012004  900000000000074008
900000000000020002  20020131    1   900000000000012004  900000000000074008
900000000000073002  20020131    1   900000000000012004  900000000000074008
900000000000074008  20020131    1   900000000000012004  900000000000074008
900000000000225001  20020131    1   900000000000012004  900000000000074008
900000000000227009  20020131    1   900000000000012004  900000000000074008
900000000000441003  20020131    1   900000000000012004  900000000000074008
900000000000444006  20020131    1   900000000000012004  900000000000074008
900000000000446008  20020131    1   900000000000012004  900000000000074008
900000000000447004  20020131    1   900000000000012004  900000000000074008
900000000000448009  20020131    1   900000000000012004  900000000000074008
900000000000449001  20020131    1   900000000000012004  900000000000074008
900000000000450001  20020131    1   900000000000012004  900000000000074008
900000000000451002  20020131    1   900000000000012004  900000000000074008
900000000000452009  20020131    1   900000000000012004  900000000000074008
900000000000454005  20020131    1   900000000000012004  900000000000074008
900000000000455006  20020131    1   900000000000012004  900000000000074008
900000000000506000  20020131    1   900000000000012004  900000000000074008
900000000000507009  20020131    1   900000000000012004  900000000000074008
900000000000508004  20020131    1   900000000000012004  900000000000074008
900000000000509007  20020131    1   900000000000012004  900000000000074008
900000000000511003  20020131    1   900000000000012004  900000000000074008
900000000000548007  20020131    1   900000000000012004  900000000000074008
900000000000549004  20020131    1   900000000000012004  900000000000074008
900000000000550004  20020131    1   900000000000012004  900000000000074008

Do you see anything wrong here?

kaicode commented 5 months ago

That concepts file looks fine. If you send me the zip I can debug the import.

liquid36 commented 5 months ago

This is the zip file

SnomedCt_test.zip

kaicode commented 5 months ago

There are three duplicate entries in the sct2_Concept_Snapshot_INT_20240326.txt file.

$ cut -f1 sct2_Concept_Snapshot_INT_20240326.txt | sort | uniq -d
410662002
762705008
900000000000441003

Perhaps these concepts appear in the concept snapshot files within the SnomedCT_Argentina-EditionRelease_PRODUCTION_20230531T120000Z.zip archive more than once?
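The same `cut | sort | uniq -d` check can be run against the concept snapshot files inside an archive without unpacking it. The following is a minimal Python sketch, not part of the extraction tool or Snowstorm. The function name `duplicate_concept_ids` and the `name_fragment` filter are assumptions made for illustration, and it assumes the files are UTF-8, tab-separated, and start with a header row.

```python
import io
import zipfile
from collections import Counter


def duplicate_concept_ids(archive_path, name_fragment="sct2_Concept_Snapshot"):
    """Map each matching archive member to the ids it lists more than once.

    `name_fragment` is a hypothetical filter: any .txt member whose path
    contains it is treated as a concept snapshot file.
    """
    duplicates = {}
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            if name_fragment not in member or not member.endswith(".txt"):
                continue
            with archive.open(member) as raw:
                lines = io.TextIOWrapper(raw, encoding="utf-8")
                next(lines, None)  # skip the header row (id, effectiveTime, ...)
                # Count the first tab-separated column, which holds the concept id.
                counts = Counter(
                    line.split("\t", 1)[0].strip() for line in lines if line.strip()
                )
            repeated = sorted(cid for cid, n in counts.items() if n > 1)
            if repeated:
                duplicates[member] = repeated
    return duplicates
```

Calling `duplicate_concept_ids("SnomedCt_test.zip")` or passing the source snapshot archive would return a dictionary keyed by member path. An empty dictionary means no id is repeated within any single concept snapshot file.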

kaicode commented 5 months ago

The concept file inside the uploaded "SnomedCt_test.zip" is very different from the concept file contents that were posted above. The contents above look okay, but the file in the zip contains duplicates.

liquid36 commented 5 months ago

So weird. The snomed-subontology-extraction tool outputs an RF2 folder and a zip file. I thought they were the same, but they aren't. What I posted above was the contents of the RF2 folder.

What is the difference? Do you know?
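The thread does not establish why the two outputs differ, but the difference itself can be measured. The following is a rough Python sketch that compares the ids in a concept file from the RF2 folder with the ids in the matching file inside the zip. The names `read_ids` and `compare_folder_to_zip` are hypothetical, as are the paths passed to them, and the sketch assumes UTF-8, tab-separated files with a header row.

```python
import zipfile
from pathlib import Path


def read_ids(lines):
    """Return the first-column ids of an RF2 table, with the header row dropped."""
    rows = [line.split("\t", 1)[0].strip() for line in lines if line.strip()]
    return rows[1:]


def compare_folder_to_zip(folder_file, archive_path, member):
    """Summarise how one folder file differs from one zip member by concept id."""
    folder_ids = read_ids(Path(folder_file).read_text(encoding="utf-8").splitlines())
    with zipfile.ZipFile(archive_path) as archive:
        zip_ids = read_ids(archive.read(member).decode("utf-8").splitlines())
    return {
        # Row counts include repeats, so a duplicate shows up as a count mismatch.
        "folder_rows": len(folder_ids),
        "zip_rows": len(zip_ids),
        "only_in_folder": sorted(set(folder_ids) - set(zip_ids)),
        "only_in_zip": sorted(set(zip_ids) - set(folder_ids)),
    }
```

If both row counts match and both "only in" lists are empty, the two files carry the same ids. Any other result pinpoints which ids the zip adds, drops, or repeats.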

liquid36 commented 5 months ago

Well, I ran the importer again with the contents of the RF2 folder and it worked perfectly. Thank you very much.

kaicode commented 5 months ago

Great news about the import! That's strange about the RF2 folder.