Closed: ivelina-yordanova closed this issue 4 years ago.
Hi @ivelina-yordanova,
Could you please try with the latest version using the new RF2 import endpoint (/:path/import)?
You should be able to import just a Snapshot RF2; it is not required to have FULL RF2 content imported on MAIN.
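As a rough illustration, a Snapshot import call against that endpoint could be composed as below. This is a sketch only: the base URL, port, and parameter names (type, createVersions) are assumptions taken from this thread, so verify them against your server's API documentation before use.

```python
# Sketch: compose an RF2 Snapshot import request for a Snow Owl-style
# /:path/import endpoint. The base URL and field names are assumptions
# for illustration -- check your deployment's API docs.
from urllib.parse import urlencode

BASE = "http://localhost:8080/snomed-ct/v3"

def import_request(branch_path, release_type="SNAPSHOT", create_versions=False):
    """Return (url, params) for an RF2 import on the given branch."""
    params = {
        "type": release_type,  # SNAPSHOT avoids loading the FULL history
        "createVersions": str(create_versions).lower(),
    }
    return f"{BASE}/{branch_path}/import", params

url, params = import_request("MAIN")
print(url)
print(urlencode(params))
```

The actual import would then POST the RF2 archive to that URL with those query parameters.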
See the following work-in-progress Extension documentation for any branch/Extension modeling question you have, and feel free to ask more if you have further questions.
Also, what kind of performance issue are you trying to solve? Less content means a smaller Elasticsearch index and fewer documents compared to a FULL Edition import, which will certainly result in a performance improvement. If you do not require the FULL history, you should always use the latest Snapshot.
Regards and stay safe, Mark
Hi @cmark,
That sounds amazing thank you. I'll update and experiment with it.
In terms of performance: yes, after some consideration we came to the conclusion that we don't need the full history and it's not worth keeping it. The hope is (and I think it will work out) that this might give us a noticeable reduction in response times.
Thanks, Ivelina
Hi @ivelina-yordanova,
Yes, using just the Snapshot RF2 could indeed yield major performance improvements.
I'd like to mention a few performance numbers from our deployments. These are what we consider baseline performance numbers, and we configure the environment (ES node count, master, replicas, etc.) based on them and the requested non-functional requirements (SLAs, response times, etc.):

- Index size: 25-30Gb (but there is one environment where the index size is around 75Gb).
- Fetching a concept with pt() and fsn() expanded: around 5-15ms.
- Queries: 15-300ms on average, depending on the query itself; certain really complex queries can take seconds to complete, though.
- Other requests: 150-500ms to complete.

I hope this helps with performance optimization at your end. Feel free to ask any further questions.
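To compare a deployment against baseline numbers like these, response times can be sampled with a small helper such as the sketch below. The endpoint path in the commented example is an assumption; any GET endpoint on your server will do.

```python
# Sketch: measure the average response time of a GET endpoint, to compare
# against baseline latency numbers. Standard library only.
import time
from urllib.request import urlopen

def time_request(url, n=5):
    """Average wall-clock time in milliseconds over n GET requests."""
    total = 0.0
    for _ in range(n):
        start = time.perf_counter()
        with urlopen(url) as resp:
            resp.read()
        total += time.perf_counter() - start
    return total / n * 1000

# Example against a running server (endpoint path is an assumption):
# print(time_request("http://localhost:8080/snomed-ct/v3/MAIN/concepts?limit=1"))
```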
Regards and stay safe, Mark
Hi @cmark ,
I did a few quick tests, and it seems like it's now fine to import a snapshot into a MAIN sub-branch without having it on MAIN, but only for the International edition. If I try to import the UK snapshot under that branch, I still get an error saying that the ontology needs to be on MAIN, i.e. intl in MAIN/
On the other hand, trying to import intl in MAIN but create a version branch automatically with the param "createVersions" also fails, understandably.
So is there a way to create this structure:

MAIN - nothing
    intl version snapshot
        UK version snapshot
        any other branch...

Or, alternatively, importing into MAIN but also creating the version branch?
Thanks and regards, Ivelina
Hi @ivelina-yordanova,
So is there a way to create this structure
It should be possible to import a Snapshot to any branch, even if MAIN is empty. If this is not the case, then this might be a bug or a missing feature.
Could you please also send me the request/response pair that shows the RF2 import attempts and failures?
Also, may I ask why you would like to keep the MAIN branch empty?
or importing in MAIN but also creating the version branch?
For Snapshots, the createVersions argument is currently not supported, and setting it to true will not create any effective times. This is because in an RF2 Snapshot a single effectiveTime might not have the necessary cross-references to create a valid version tag. The only viable option there is to register the greatest effectiveTime from the RF2 archive and create a version with that value, if requested.
I'll review this and file a new improvement ticket for Snapshot import createVersions support.
In the meantime, you can create versions manually using the Code System Version API at
POST /codesystems/:shortName/versions
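A manual version creation call could be composed along these lines. The URL path follows the endpoint quoted above, but the body field names (version, description, effectiveDate) are assumptions for illustration, so check the Code System Version API documentation for the exact schema.

```python
# Sketch: compose the URL and JSON body for a Code System version
# creation call. Body field names are assumptions -- verify against
# the Code System Version API docs.
import json

def create_version_payload(short_name, version, effective_date):
    """Compose the URL and JSON body for a version creation call."""
    url = f"http://localhost:8080/snomed-ct/v3/codesystems/{short_name}/versions"
    body = {
        "version": version,               # the version/tag name to create
        "description": f"{short_name} version {version}",
        "effectiveDate": effective_date,  # e.g. greatest effectiveTime in the RF2
    }
    return url, json.dumps(body)

url, body = create_version_payload("SNOMEDCT", "2018-07-31", "20180731")
print(url)
print(body)
```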
Cheers, Mark
Hi @cmark,
Thanks for replying so quickly, that makes a bit more sense now.
In regards to my failed attempt at importing the snapshot, what I did exactly was:
1) after seeing that Snow Owl does not create a version branch automatically on snapshot import, I manually created a branch (using /snomed-ct/v3/branches, not the versions endpoint you quoted above, which might make a difference), i.e. MAIN/2018-07-31
2) imported the international version there
3) created another subbranch MAIN/2018-07-31/UK
4) tried to import the UK extension there but it failed with : "Importing a release of SNOMED CT from an archive to other than MAIN branch is prohibited when SNOMED CT ontology is not available on the terminology server. Please perform a full import to MAIN branch first."
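The four steps above can be sketched as a request plan (printed rather than sent; the branch-creation body fields "parent" and "name" are assumptions based on the /branches endpoint mentioned in this thread):

```python
# Sketch of the four-step sequence described above, printed as a plan.
# Paths follow the thread; body field names are assumptions.
steps = [
    ("POST", "/snomed-ct/v3/branches", {"parent": "MAIN", "name": "2018-07-31"}),
    ("POST", "/snomed-ct/v3/MAIN/2018-07-31/import", {"type": "SNAPSHOT"}),      # INT snapshot
    ("POST", "/snomed-ct/v3/branches", {"parent": "MAIN/2018-07-31", "name": "UK"}),
    ("POST", "/snomed-ct/v3/MAIN/2018-07-31/UK/import", {"type": "SNAPSHOT"}),   # step that fails
]

for method, path, params in steps:
    print(method, path, params)
```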
Is this scenario supposed to work, or should the branch be created as a version branch, and if so, at which point - after or before the import?
Regards, Ivelina
Hi @ivelina-yordanova,
That error message is from the 6.x version and I think it should be removed. I'll raise an issue for this in our JIRA and will try to get it resolved in the upcoming 7.5.1 release (this Friday, April 17th).
Creating just a branch and not a version should work as well. Versions are basically branches with extra information attached to them, similar to a git tag + branch with the same name.
Is this scenario supposed to work or should the branch be created as a version one and if so at which point - after or before the import?
It should work as you have described. And it will after removing the error message. In the meantime, I suggest the following approach:
I hope this helps. Cheers, Mark
Hi @cmark,
Ok that's weird then, because I did pull the latest from your 7.x branch before doing this, so I assume that should include the latest features you are talking about.
I think the misunderstanding comes from the fact that I was trying to import the snapshot into the version branch, not MAIN. I was trying to leave MAIN empty and then import an extension under the version branch, but it seems like the extension import still refers to MAIN and throws the error.
Is this scenario plausible - having nothing imported directly in MAIN?
Regards,
Hi @ivelina-yordanova,
The code that prevents RF2 imports on child branch under an empty MAIN is still present in the latest 7.x version. See here: https://github.com/b2ihealthcare/snow-owl/blob/7.x/snomed/com.b2international.snowowl.snomed.datastore/src/com/b2international/snowowl/snomed/datastore/request/rf2/SnomedRf2ImportRequest.java#L137 Feel free to disable those checks in your fork and rebuild to see how that goes, but after removing them the scenario you proposed should work.
Those checks will be reviewed (and probably removed) for the upcoming 7.5.1 release.
Cheers, Mark
Hi @cmark,
Thanks, I had found the place before, but I still don't know much about the actual datastore, so I was not sure exactly what implications this would have.
Depending on other priorities, I might test it or wait for the release and update our fork again.
Thanks this was very helpful.
Regards, Ivelina
Hi @cmark,
I just pulled the latest changes included in the 7.5.1 release.
I see that the validations in SnomedRf2ImportRequest are still there, so I am wondering: is this because it is a valid requirement due to the way the datastore is organised, or is it safe to remove them and they just slipped through in this release?
Regards, Ivelina
Hi @ivelina-yordanova,
Yes, those warnings are still there; we didn't have the time to properly look into them. Feel free to remove them in your fork and report back if you find any errors when doing so. I'll get back to you when we have more insights.
Cheers, Mark
Hi @cmark,
We finally had the chance to test this. I commented out the check as suggested, then ran a SNAPSHOT import of International in a version branch, having all other data wiped out before that (so MAIN was empty).
The import job ran and finished with status COMPLETED, but there were a lot of issues returned as well. Also, the cluster was left in a weird state: there are 2 unassigned shards, and looking at the cluster in Cerebro it appears red.
Can you help with interpreting the error message below? Did we do something wrong, or is it just that this way of importing the data is still unstable and an incomplete feature?
Thanks
{
"id": "0c0740167566abab9465ac31b0ab4975b62f28c9",
"key": "snomed-import-MAIN/2018-07-31",
"description": "Importing SNOMED CT RF2 file 'SnomedCT_InternationalRF2_PRODUCTION_20190131T120000Z.zip'",
"user": "snowowl-dev",
"scheduleDate": "2020-06-08T16:07:37.195+0000",
"startDate": "2020-06-08T16:07:37.239+0000",
"finishDate": "2020-06-08T16:57:18.665+0000",
"state": "FINISHED",
"completionLevel": 0,
"deleted": false,
"result": "{\"status\":\"COMPLETED\",\"issues\":[\"Extended Map target field was empty for '80009454-5531-5f78-b7c9-d288f2346d83'\",\"Extended Map target field was empty for '8002c5db-9038-50fe-8ccc-204a84213674'\",\"Extended Map target field was empty for '8003e488-77eb-5600-b541-7caa2e77503e'\",\"Extended Map target field was empty for '800449ce-37c1-5b0f-9dd5-570a36fc47cd'\",\"Extended Map target field was empty for '800d6fb8-a02f-5d4c-8490-79e922da580e'\",\"Extended Map target field was empty for '800eb8c2-10fc-5c7a-b71a-aa760fb465f8'\",\"Extended Map target field was empty for '800f53ad-bd42-545a-88c2-7ebcd7f354b6'\",\"Extended Map target field was empty for '801046f2-9bd0-5607-88af-c1306f566731'\",\"Extended Map target field was empty for '80111236-0235-55b2-bdf5-9549bf4baea4'\",\"Extended Map target field was empty for '801118b2-5a64-5fc1-b47b-cb90df302492'\",\"Extended Map target field was empty for '80112d52-243d-57c4-98d2-d7b3512af59f'\",\"Extended Map target field was empty for '8011dda6-13a6-56e8-b17e-d35a47323d28'\",\"Extended Map target field was empty for '8012f38e-49a5-5d87-8c74-1420b7e4c30c'\",\"Extended Map target field was empty for '80160259-f2e8-51d1-a507-cf9d0f873a79'\",\"Extended Map target field was empty for '80161ecc-5c31-5e4e-bf03-b72bf8d32c09'\",\"Extended Map target field was empty for '8017a655-ac7b-5f34-8545-b82e2b55a1b7'\",\"Extended Map target field was empty for '8017a72e-701e-538a-9649-df254429b144'\",\"Extended Map target field was empty for '80197e59-024f-5679-9206-749eee3c5e87'\",\"Extended Map target field was empty for '801c8c91-d910-5f46-ab56-d581f87b8a44'\",\"Extended Map target field was empty for '801d8e27-2eef-5c78-b7f5-55d520492da4'\",\"Extended Map target field was empty for '80214700-023f-5c22-a448-87687dc04593'\",\"Extended Map target field was empty for '802607b2-a05e-5c5e-9033-8880b8584fde'\",\"Extended Map target field was empty for '802653ff-9ae8-51be-807b-73af9dad9d71'\",\"Extended Map target field was empty for 
'80266cc2-ee20-50f2-ae28-a812495cf9df'\",\"Extended Map target field was empty for '802cd8af-5702-57de-9b5c-f98ddb037e7b'\",\"Extended Map target field was empty for '802db525-2c8d-5907-a4c6-6662babeedf6'\",\"Extended Map target field was empty for '802f9230-f038-59e8-b804-c354b10540ae'\",\"Extended Map target field was empty for '803182de-301d-53a4-bab9-94ec816a38c6'\",\"Extended Map target field was empty for '8031ca9a-f015-573f-b250-4412c9057450'\",\"Extended Map target field was empty for '803442e9-c4db-57b9-9cd8-a80a798f0f7d'\",\"Extended Map target field was empty for '803ae809-ad68-5425-b744-44796ce26895'\",\"Extended Map target field was empty for '803faca9-1745-5161-9a98-8cb7b23d3198'\",\"Extended Map target field was empty for '8041d5a1-d347-50ae-8ede-bf192569c58c'\",\"Extended Map target field was empty for '8043fac6-87af-54ba-a946-f716de1b101b'\",\"Extended Map target field was empty for '8045461b-9c39-5b1f-ba9d-d8eae27269a6'\",\"Extended Map target field was empty for '804643d3-c3ed-586d-bc99-e778fee9294b'\",\"Extended Map target field was empty for '80480d7f-7915-588e-9397-bdb873a38a07'\",\"Extended Map target field was empty for '80481ac4-da43-5882-8a75-3f843919b44e'\",\"Extended Map target field was empty for '804942a9-b822-50f6-afa9-628c36ef7b08'\",\"Extended Map target field was empty for '804a2304-1ce4-5203-b56a-299471324c05'\",\"Extended Map target field was empty for '804ba542-190d-5ba8-a3af-c9476722aded'\",\"Extended Map target field was empty for '804c25b5-cbab-5f17-851c-404378f6ab52'\",\"Extended Map target field was empty for '804c4959-f213-51c7-b0e8-db5c368feb69'\",\"Extended Map target field was empty for '804ccbee-d9f2-5ed6-b2e1-ea1a476038b9'\",\"Extended Map target field was empty for '804e9c2c-6044-59ba-8d98-502614a09b8a'\",\"Extended Map target field was empty for '8050bea4-dfb1-579c-a821-53fd74b282aa'\",\"Extended Map target field was empty for '80520877-86d1-5820-aceb-e96fbe9575de'\",\"Extended Map target field was empty for 
'8052d01c-05b4-51d8-8720-b2d166b247fb'\",\"Extended Map target field was empty for '805368bc-59b0-5ff0-9f25-4b29c1e10681'\",\"Extended Map target field was empty for '805395cc-7c79-5d24-9d14-ffc372db9345'\",\"Extended Map target field was empty for '805aef69-58a0-5890-b456-924730649dc0'\",\"Extended Map target field was empty for '805d549e-d032-5ceb-9c63-cc3e5257086f'\",\"Extended Map target field was empty for '805eeb34-572e-5425-8536-21d778c56ada'\",\"Extended Map target field was empty for '80600c4f-afb4-5dea-a544-9de4f021f7f3'\",\"Extended Map target field was empty for '8062044e-a5e9-5ef6-b7cb-7325ce5886b3'\",\"Extended Map target field was empty for '80651b24-dd25-5bff-b03b-1e4a40c297d3'\",\"Extended Map target field was empty for '8065503e-db9a-523d-9f8a-1fd1989a7140'\",\"Extended Map target field was empty for '80698369-829d-5a6a-9731-74a742ce3898'\",\"Extended Map target field was empty for '80699e8b-f8aa-5aac-baa2-2bf782d5644c'\",\"Extended Map target field was empty for '806b948c-2c41-561b-ab64-1cd0556b0a69'\",\"Extended Map target field was empty for '80740729-3eff-5013-99aa-f5e66051942f'\",\"Extended Map target field was empty for '8074401f-3477-5bab-a48e-b84e7f1a783b'\",\"Extended Map target field was empty for '8075c84b-e315-5c09-9d60-a921170d28da'\",\"Extended Map target field was empty for '8077ed27-4f95-5a53-ab96-729223870a78'\",\"Extended Map target field was empty for '80797fbc-66cb-5d06-ab99-2fb6208f2069'\",\"Extended Map target field was empty for '807c2163-54ef-5a2d-b7e2-e5c2445bb4b9'\",\"Extended Map target field was empty for '807d17b9-b3ef-5468-96a5-a568978809fb'\",\"Extended Map target field was empty for '807e12d0-ea86-5e35-967d-50a9bab58d9e'\",\"Extended Map target field was empty for '807f0c11-6d74-588c-8dc1-a5b26251f678'\",\"Extended Map target field was empty for '8081611d-8be1-53f3-85ac-58b5e89e244b'\",\"Extended Map target field was empty for '8081e912-24e3-5647-8f05-73b7eda9f430'\",\"Extended Map target field was empty for 
'80854379-b6c2-5e93-b9a1-9428350133c8'\",\"Extended Map target field was empty for '80867007-7bbf-5400-9689-94e945e9437c'\",\"Extended Map target field was empty for '808bba4d-d8a3-5323-936d-a2f251453f21'\",\"Extended Map target field was empty for '8092e03a-9d8b-522b-b3c7-80c49d28ebd2'\",\"Extended Map target field was empty for '809544e2-f414-54aa-bc2f-d9c19cc530c5'\",\"Extended Map target field was empty for '8095bc07-8c68-5326-ab51-012e63b072ef'\",\"Extended Map target field was empty for '80988aa5-d7bb-5db2-92f6-0756eac7efc1'\",\"Extended Map target field was empty for '809972d7-a9e5-5389-b135-5c85c64a2be2'\",\"Extended Map target field was empty for '809b3350-16f6-59c0-872d-c8faaaa70d52'\",\"Extended Map target field was empty for '809cc936-825c-5272-a62d-0d874044dd36'\",\"Extended Map target field was empty for '809d76b2-fd8c-564c-90de-8bd6b9f959fd'\",\"Extended Map target field was empty for '809f3efd-5099-5697-849e-af7a11421e5a'\",\"Extended Map target field was empty for '80a45684-a3b7-5c99-9f47-e5387fd09f41'\",\"Extended Map target field was empty for '80a6011d-fce4-5c72-9186-8d5abb28eec4'\",\"Extended Map target field was empty for '80a70ee2-a7d6-5229-9f9c-7563bd166053'\",\"Extended Map target field was empty for '80aaa778-5e9d-5eac-9cf5-f3d399c3dac7'\",\"Extended Map target field was empty for '80b00840-dab1-5cb1-9aa9-74ae1025ab98'\",\"Extended Map target field was empty for '80b01239-2d22-53de-b735-6778ea0b9c3c'\",\"Extended Map target field was empty for '80b096f2-6a39-595e-8509-805ab803a422'\",\"Extended Map target field was empty for '80b1b9b2-a2e7-5661-9552-297e884c4e80'\",\"Extended Map target field was empty for '80b249ba-9d65-51a9-aedd-ed47f31ffea0'\",\"Extended Map target field was empty for '80b5c137-2b80-5628-b2a3-62c70060f32b'\",\"Extended Map target field was empty for '80b77910-01e1-5fe5-bced-7f90c3821109'\",\"Extended Map target field was empty for '80ba965a-ac19-5b62-bbbf-f6b21b1e30ee'\",\"Extended Map target field was empty for 
'80bab5db-bf68-51d7-9b3c-7390cb21d45d'\",\"Extended Map target field was empty for '80c03167-3933-5df1-8f5f-e1c0aea80a7f'\",\"Extended Map target field was empty for '80c146eb-82bc-5d71-95b6-35f4ced31e38'\",\"Extended Map target field was empty for '80c1ef8c-508d-5be5-9503-5c7f1f63fb78'\",\"Extended Map target field was empty for '80c32a08-8aba-5d04-8eee-4d8668ecb59c'\"]}",
"parameters": "{\"rf2ArchiveId\":\"cafaafc4-e4b7-44b4-a5c3-56fbc514efa2\",\"releaseType\":\"SNAPSHOT\",\"createVersions\":true,\"operation\":\"import\",\"type\":\"SnomedRf2ImportRequest\",\"branchPath\":\"MAIN/2018-07-31\",\"repositoryId\":\"snomedStore\"}"
}
Hi Mark, to add some more context: with Ivelina we are using a cluster that supports replicas. We tried setting the Snow Owl initialisation to both 0 and 1 replicas, but got the same issue as above. CPU and heap seemed under control on all nodes, though. I wonder if we just need a bigger instance; we are using 4G of memory and 2 CPUs per node.
Hi again,
It's not directly related to this, but sort of is: we also noticed that the import fails with anything other than 0 replicas, and the FULL import will not work at all under slightly limited resources. Do you have a list of prerequisite specs (memory, CPU, disk) for the ES cluster?
Thanks again
Hi @ivelina-yordanova,
We finally had the chance to test this. I commented out the check as suggested, then ran a SNAPSHOT import of International in a version branch, having all other data wiped out before that (so MAIN was empty).
By "MAIN was empty" you mean you did not import anything onto MAIN? Correct me if I'm wrong, but your plan earlier was to create the following branch/content structure (I also suggested this in my earlier comment):
----INT Snapshot Import----2018-07-31 version----> MAIN branch
                                  \
                                   \----UK 2018-10-01 Snapshot---->
Also the cluster was left in a weird state - there are 2 unassigned shards and looking at the cluster in Cerebro it appears red.
Regarding red cluster state, could you please extract additional information from your ES cluster to further investigate the issue? This article might help uncover the root cause of the red cluster state: https://www.elastic.co/blog/red-elasticsearch-cluster-panic-no-longer
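For reference, the standard Elasticsearch diagnostic endpoints for a red cluster and unassigned shards are listed in the sketch below (the host/port is a placeholder for your ES cluster):

```python
# Standard Elasticsearch diagnostic endpoints for investigating a red
# cluster state; host/port is a placeholder.
ES = "http://localhost:9200"

diagnostics = [
    # overall cluster status and shard counts:
    f"{ES}/_cluster/health?pretty",
    # per-shard state, including the reason a shard is unassigned:
    f"{ES}/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason",
    # detailed explanation of why a shard cannot be allocated:
    f"{ES}/_cluster/allocation/explain?pretty",
]

for url in diagnostics:
    print("GET", url)
```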
Can you help with interpreting the error message below? Did we do something wrong or is it just that this way of importing the data is still unstable and is a non-completed feature?
The listed issues are actually warnings and can be ignored; some of the official content does not have non-empty values in Extended Maps, and Snow Owl reports that. The FINISHED import state indicates that the import completed successfully.
Cheers, Mark
Hi @mattecasu and @ivelina-yordanova,
We are using 4G, 2 CPUs per node.
Do you have a sort of list of prereq spec (memory, cpu, disk) for the ES cluster?
To properly tell what's causing the issue during import and why it fails to import FULL content on a full ES cluster, we might need more information than the node hardware config (such as the number of nodes, replicas, master and data nodes, other processes/applications that are using the cluster, the cluster configuration, and node configuration files).
What we usually recommend when setting up a new instance of Snow Owl with data is the following:
The recommended hardware configuration for Snow Owl with a co-located Elasticsearch cluster, with 1 master/data node and a 0-replica setup, is 8 CPUs and 24-32GB of RAM. Both Snow Owl and the ES node usually get the same amount of memory (8GB each), and the rest goes to IO cache and other services. Snow Owl's memory requirement often depends on the actual use case and can be increased accordingly. This is a setup that can support all SNOMED CT Extensions (FULL INT + N Snapshot extensions), importing them and then serving/distributing/classifying/etc.
The recommended memory to import the FULL INT RF2 is 8GB for both Snow Owl and ES, so 16GB in total. You can try to reduce the memory by a certain amount, but to import a FULL RF2 successfully, Snow Owl requires at a minimum around 6GB of RAM (and the connected ES instance should have at least 4GB).
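The sizing recommendation above amounts to a simple budget check (numbers taken straight from this thread):

```python
# Sizing check using the numbers above: 8GB heap each for Snow Owl and
# Elasticsearch, with the remainder of the host's RAM left for IO cache
# and other services.
def remaining_for_cache(total_gb, snowowl_gb=8, es_gb=8):
    return total_gb - snowowl_gb - es_gb

print(remaining_for_cache(24))  # 8GB left on a 24GB host
print(remaining_for_cache(32))  # 16GB left on a 32GB host
```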
we also noticed that the import fails with anything other than 0 replicas, and the FULL import will not work at all under slightly limited resources.
I suggest figuring out why you get the red cluster state; feel free to share logs and configs, I'm more than happy to help.
Cheers, Mark
Thank you Mark, we were able to import the snapshot following your original suggestion, and disabling the check (in our fork). We needed a bigger instance though.
Hi @mattecasu,
Glad to hear that. Are you able to query content from both MAIN and the child branches? May I ask what was the cause of the RED cluster state and also what was the instance size that worked for you in the end?
Cheers, Mark
Hi @cmark,
In the end, the size that worked for us is the c5.4xlarge (16 CPUs, 32G memory). We managed to also increase the replica count, and everything worked as expected during the import following your suggestion: we imported International into MAIN and then created the version (rather than importing into a version branch and leaving MAIN empty, as I understood initially).
The RED state was caused by many shards being left unassigned after the import finished (on our first try).
We are able to query both, the version branch and MAIN, given the setup mentioned above.
Everything has looked normal so far, but another thing came up: the branch diff. The /compare and /reviews/:id/concept-changes endpoints seem to behave differently now in our 2 environments.
Do you know which of those changes, if either or both, would have an effect on these endpoints, and what would be a resolution for the issue if we do want to keep both (the replica count and the snapshot import)?
Thanks, Ivelina
Hi @ivelina-yordanova,
Glad to hear the RED cluster state got resolved eventually.
Could you please raise another issue here on GitHub related to that compare issue? Thanks!
One thing to note: the /reviews API is deprecated, and we always recommend using /compare instead.
All endpoints should behave the same way in all environments, regardless of the current Elasticsearch cluster setup.
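A branch compare request against the /compare endpoint could be composed roughly as below. The body field names ("base", "compare") are assumptions for illustration; consult the compare API documentation for the exact schema.

```python
# Sketch: compose URL and body for a branch compare request. Field
# names are assumptions -- check the /compare API docs.
import json

def compare_payload(base, compare):
    """Compose the URL and JSON body for a branch compare request."""
    url = "http://localhost:8080/snomed-ct/v3/compare"
    body = {"base": base, "compare": compare}
    return url, json.dumps(body)

url, body = compare_payload("MAIN", "MAIN/2018-07-31")
print(url)
print(body)
```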
Regards, Mark
Hi @cmark,
Yes, sure - raised it as a bug here. I tried to add as many details as possible, but if anything is unclear, let me know.
Thanks, Ivelina
Hi @ivelina-yordanova,
Thank you for reporting another bug regarding the compare issue.
FYI: we have changed the RF2 import release type vs. existing content validation in this commit, which will soon land on the 7.x stream and be released in the upcoming 7.7.0 release: https://github.com/b2ihealthcare/snow-owl/commit/2fa6ea8243041b4013158c1df1e4a5f77b680673
Can we close this issue now? Any outstanding questions? Let me know.
Thank you, Mark
Hi @cmark,
Just updated to the latest release and noticed one of the changes is
[snomed] fix RF2 import content vs release type validation, allow importing on child branches if MAIN is empty (2fa6ea8)
Just wanted to confirm if this means that we can now import International snapshot into a sub-branch of MAIN and create a whole branching structure below that or is it still recommended to import into MAIN?
Are there any limitations to either approach?
Thanks, Ivelina
Hi @ivelina-yordanova,
Yes, we have removed all unnecessary validation rules from the import endpoint. You should be able to create your own branching structure and import content directly to child branches, as opposed to directly on MAIN. This document might serve as a guideline on what to store where: https://docs.b2i.sg/snow-owl/index-2
Our recommendation is still to go with the MAIN === INT RF2 approach, as that is what is common to all SNOMED CT Extensions.
Let us know if there is any other outstanding issue with the UK RF2 import. If not, feel free to close this ticket.
Cheers, Mark
Closing as answered and resolved. Feel free to open another ticket if you have any further questions.
Hi,
I've been trying to import just a snapshot of SNOMED CT, but I've been having some issues:
1) this only works if the snapshot is imported directly into MAIN. It does not create a version branch for that snapshot even with the respective flag set.
2) I then tried to create a branch and import the snapshot into that version branch, but I got an error saying that the ontology has to be on MAIN before importing a snapshot into a branch.
So, my question is whether there is something I am doing wrong, or is the expectation that the FULL import is always done on MAIN, and only after that can a snapshot be imported to a branch?
Our goal was to try to have a lightweight version and see if that would have a positive impact on the performance.
Thanks,
Ivelina