b2ihealthcare / snow-owl

:owl: Snow Owl Terminology Server - a production-ready, scalable, FHIR Terminology Service compliant server that supports SNOMED CT International and Extensions, LOINC, RxNorm, UMLS, ICD-10, custom code systems and many others
https://docs.b2ihealthcare.com/snow-owl/
Apache License 2.0
258 stars 32 forks source link

Issues with triggering the classification #578

Closed cristinacivili-work closed 2 years ago

cristinacivili-work commented 4 years ago

Hi,

I've been trying to use the classification endpoint in Snow Owl, without any luck so far. Could you please clarify what are the expected values for the reasonerId field?

I know Snow Owl is equipped with ELK; I tried to input both "org.semanticweb.elk.elk.reasoner.factory" and "ELK 0.4.3", returned from the reasoners endpoint as extendionId and name, respectively, but it didn't work. In both cases the classification was scheduled and failed later without any error message.

Thanks in advance for your help!

cmark commented 4 years ago

Hi @cristinacivili-work,

An example request would look like this to trigger a classification on the MAIN branch:

POST http://localhost:8080/snowowl/snomed-ct/v3/classifications
{
  "branch": "MAIN",
  "reasonerId": "org.semanticweb.elk.elk.reasoner.factory"
}

The equivalent curl command:

curl -X POST "http://localhost:8080/snowowl/snomed-ct/v3/classifications" -H "Content-Type: application/json;charset=UTF-8" -d "{ \"branch\": \"MAIN\", \"reasonerId\": \"org.semanticweb.elk.elk.reasoner.factory\"}"

I hope this works for you. Let me know if you have any further questions.

Cheers, Mark

cristinacivili-work commented 4 years ago

Hi Mark, thanks for your prompt answer!

The one above looks like my first attempt. Any idea on what could cause a failure?

cmark commented 4 years ago

Hi @cristinacivili-work,

Do you have access to the logs? That might have something that can be used to figure out what's the root cause of the issue. The logs are located in Snow Owl's serviceability directory <SO_HOME>/serviceability/logs/log.log. In case of using the official docker container, use the docker logs <container-name> command to access the logs.

Regards, Mark

cmark commented 4 years ago

Hi @cristinacivili-work,

Just tried on our latest 7.7.0-SNAPSHOT server, it works fine with those settings. Here are the relevant log lines for reference:

[2020-06-03T13:21:30.946] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   lock                                                             Lock acquired for branch 'MAIN' of repository 'snomedStore' (classifying the ontology). 
[2020-06-03T13:21:30.967] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active concept IDs using revision searcher 
[2020-06-03T13:21:34.250] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active concept IDs using revision searcher [@3.300 s] 
[2020-06-03T13:21:34.251] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering concept IDs from stream 
[2020-06-03T13:21:34.252] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering concept IDs from stream [@3.302 s] 
[2020-06-03T13:21:34.262] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active concept flags (exhaustive) using revision searcher 
[2020-06-03T13:21:34.269] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active concept flags (fully defined, exhaustive) using revision searcher [@3.320 s] 
[2020-06-03T13:21:34.269] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active stated IS A graph edges using revision searcher 
[2020-06-03T13:21:34.312] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active stated IS A graph edges using revision searcher [@3.363 s] 
[2020-06-03T13:21:34.312] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active stated non-IS A relationships using revision searcher 
[2020-06-03T13:21:34.377] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active stated non-IS A relationships using revision searcher [@3.428 s] 
[2020-06-03T13:21:34.377] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active inferred relationships using revision searcher 
[2020-06-03T13:21:44.516] INFO  eventbus-1592                                    request                                                          {"locales":[],"expand":{},"fields":[],"limit":50,"options":{"SORT_BY":[{"ascending":false,"field":"timestamp"}]},"operation":"browse","type":"CommitInfoSearchRequest","repositoryId":"valuesetStore","metrics":{"responseTime":3}} 
[2020-06-03T13:21:54.036] INFO  eventbus-1651                                    request                                                          {"locales":[],"expand":{},"fields":[],"limit":50,"options":{"SORT_BY":[{"ascending":false,"field":"timestamp"}]},"operation":"browse","type":"CommitInfoSearchRequest","repositoryId":"valuesetStore","metrics":{"responseTime":3}} 
[2020-06-03T13:22:03.392] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active inferred relationships using revision searcher [@32.44 s] 
[2020-06-03T13:22:03.392] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active additional grouped relationships using revision searcher 
[2020-06-03T13:22:03.410] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active additional grouped relationships using revision searcher [@32.46 s] 
[2020-06-03T13:22:03.411] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering 'never grouped' type IDs using revision searcher 
[2020-06-03T13:22:04.035] INFO  eventbus-1656                                    request                                                          {"locales":[],"expand":{},"fields":[],"limit":50,"options":{"SORT_BY":[{"ascending":false,"field":"timestamp"}]},"operation":"browse","type":"CommitInfoSearchRequest","repositoryId":"valuesetStore","metrics":{"responseTime":18}} 
[2020-06-03T13:22:05.097] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering 'never grouped' type IDs using revision searcher [@34.15 s] 
[2020-06-03T13:22:05.097] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active stated OWL axioms using revision searcher 
[2020-06-03T13:22:13.294] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active stated OWL axioms using revision searcher [@42.34 s] 
[2020-06-03T13:22:13.294] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active concept flags (exhaustive, fully defined) using concept ID stream 
[2020-06-03T13:22:13.294] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active concept flags (exhaustive, fully definied) using concept ID stream [@42.34 s] 
[2020-06-03T13:22:13.295] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active stated IS A graph edges from relationship stream 
[2020-06-03T13:22:13.295] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active stated IS A graph edges from relationship stream [@42.35 s] 
[2020-06-03T13:22:13.295] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active stated non-IS A relationships using relationship stream 
[2020-06-03T13:22:13.295] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active stated non-IS A relationships using relationship stream [@42.35 s] 
[2020-06-03T13:22:13.296] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active inferred relationships using relationship stream 
[2020-06-03T13:22:13.296] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active inferred relationships using relationship stream [@42.35 s] 
[2020-06-03T13:22:13.296] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                >>> Registering active additional grouped relationships using relationship stream 
[2020-06-03T13:22:13.296] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy                                                <<< Registering active additional grouped relationships using relationship stream [@42.35 s] 
[2020-06-03T13:22:13.513] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   lock                                                             Lock released for branch 'MAIN' of repository 'snomedStore' (classifying the ontology). 
[2020-06-03T13:22:13.730] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy-inferrer                                       >>> Classification and inferred taxonomy extraction 
[2020-06-03T13:22:13.735] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.config.ConfigurationFactory                  Loading default configuration parameters for class org.semanticweb.elk.reasoner.config.ReasonerConfiguration 
[2020-06-03T13:22:13.810] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.Reasoner                            ELK reasoner was created 
[2020-06-03T13:22:13.812] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Loading of Axioms started 
[2020-06-03T13:22:13.814] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.AbstractReasonerStage        Loading of Axioms using 12 workers 
[2020-06-03T13:22:13.814] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy-inferrer                                       --- Reasoner task: Loading [--.-%] 
[2020-06-03T13:22:14.016] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy-inferrer                                       --- Reasoner task: Loading [00.5%] 
[2020-06-03T13:22:14.020] INFO  eventbus-1622                                    request                                                          {"locales":[],"expand":{},"fields":[],"limit":50,"options":{"SORT_BY":[{"ascending":false,"field":"timestamp"}]},"operation":"browse","type":"CommitInfoSearchRequest","repositoryId":"valuesetStore","metrics":{"responseTime":3}} 
[2020-06-03T13:22:20.504] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Loading of Axioms took 6690 ms 
[2020-06-03T13:22:20.504] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Property Saturation Initialization started 
[2020-06-03T13:22:20.504] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.AbstractReasonerStage        Property Saturation Initialization using 12 workers 
[2020-06-03T13:22:20.504] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Property Saturation Initialization took 0 ms 
[2020-06-03T13:22:20.504] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Reflexive Property Computation started 
[2020-06-03T13:22:20.509] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.AbstractReasonerStage        Reflexive Property Computation using 12 workers 
[2020-06-03T13:22:20.512] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Reflexive Property Computation took 8 ms 
[2020-06-03T13:22:20.513] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Object Property Hierarchy and Composition Computation started 
[2020-06-03T13:22:20.517] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.AbstractReasonerStage        Object Property Hierarchy and Composition Computation using 12 workers 
[2020-06-03T13:22:20.529] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Object Property Hierarchy and Composition Computation took 16 ms 
[2020-06-03T13:22:20.530] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Context Initialization started 
[2020-06-03T13:22:20.530] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.AbstractReasonerStage        Context Initialization using 12 workers 
[2020-06-03T13:22:20.531] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Context Initialization took 1 ms 
[2020-06-03T13:22:20.531] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Consistency Checking started 
[2020-06-03T13:22:20.533] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.AbstractReasonerStage        Consistency Checking using 12 workers 
[2020-06-03T13:22:20.539] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Consistency Checking took 7 ms 
[2020-06-03T13:22:20.539] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Class Taxonomy Computation started 
[2020-06-03T13:22:20.545] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.AbstractReasonerStage        Class Taxonomy Computation using 12 workers 
[2020-06-03T13:22:23.273] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.stages.LoggingStageExecutor         Class Taxonomy Computation took 2733 ms 
[2020-06-03T13:22:24.020] INFO  eventbus-1649                                    request                                                          {"locales":[],"expand":{},"fields":[],"limit":50,"options":{"SORT_BY":[{"ascending":false,"field":"timestamp"}]},"operation":"browse","type":"CommitInfoSearchRequest","repositoryId":"valuesetStore","metrics":{"responseTime":3}} 
[2020-06-03T13:22:27.349] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   reasoner-taxonomy-inferrer                                       <<< Classification and inferred taxonomy extraction [13.62 s] 
[2020-06-03T13:22:27.400] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   org.semanticweb.elk.reasoner.Reasoner                            ELK reasoner has shut down 
[2020-06-03T13:22:27.408] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   c.b.snowowl.snomed.reasoner.normalform.NormalFormGenerator       >>> Distribution normal form generation 
[2020-06-03T13:22:27.409] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   c.b.snowowl.snomed.reasoner.normalform.NormalFormGenerator       --- Initializing node graphs for types: [733928003, 774081006, 738774007, 733930001] 
[2020-06-03T13:22:34.058] INFO  eventbus-1593                                    request                                                          {"locales":[],"expand":{},"fields":[],"limit":50,"options":{"SORT_BY":[{"ascending":false,"field":"timestamp"}]},"operation":"browse","type":"CommitInfoSearchRequest","repositoryId":"valuesetStore","metrics":{"responseTime":2}} 
[2020-06-03T13:22:44.027] INFO  eventbus-1592                                    request                                                          {"locales":[],"expand":{},"fields":[],"limit":50,"options":{"SORT_BY":[{"ascending":false,"field":"timestamp"}]},"operation":"browse","type":"CommitInfoSearchRequest","repositoryId":"valuesetStore","metrics":{"responseTime":3}} 
[2020-06-03T13:22:54.040] INFO  eventbus-1651                                    request                                                          {"locales":[],"expand":{},"fields":[],"limit":50,"options":{"SORT_BY":[{"ascending":false,"field":"timestamp"}]},"operation":"browse","type":"CommitInfoSearchRequest","repositoryId":"valuesetStore","metrics":{"responseTime":2}} 
[2020-06-03T13:23:27.811] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   c.b.snowowl.snomed.reasoner.normalform.NormalFormGenerator       --- Use node graphs for hierarchy computation 
[2020-06-03T13:23:52.266] INFO  eventbus-1656                                    request                                                          {"type":"ServerInfoGetRequest","metrics":{"responseTime":0}} 
[2020-06-03T13:24:32.938] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   c.b.snowowl.snomed.reasoner.normalform.NormalFormGenerator       <<< Distribution normal form generation [2.092 min] 
[2020-06-03T13:24:32.947] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   index.snomedstore                                                Updated 1 classificationtask documents with bulk update with script 'completed' 
[2020-06-03T13:24:32.987] INFO  Worker-1: a7b515f7-200d-4f11-b4e8-cdad161726aa   index.jobs                                                       Updated 1 job documents with bulk update with script 'withDone' 

Cheers, Mark

mattecasu commented 4 years ago

Thank you Mark, we will try with the latest version then. I have two related questions:

Thank you for clarifying these features!

cmark commented 4 years ago

Hi @mattecasu,

What is the expected behaviour regarding inferred statements? Are stated and inferred statements disjoint sets, or should we expect the set of inferred statements to contain the stated ones too?

Please refer to the official SNOMED International documentations/papers regarding inferred statement generation: https://confluence.ihtsdotools.org/display/DOCOWL/2.5.+Generating+Necessary+Normal+Form+Relationships+from+the+OWL+Refsets https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2243264/pdf/procamiasymp00002-0666.pdf

Snow Owl generates the NNF from the available stated definitions during the normal form generation phase of the classification. A stated definition of a concept can become more specific and can result in more specified ISA relationships (the concept pushed down in the hierarchy to represent its correct placement), or other relationships can appear on the concept based on the supertypes defined via the ISA relationships.

Should we trigger the classification explicitly to produce inferred statements, or is it something internally scheduled/done at write time?

Snow Owl provides APIs to generate the inferred definitions of the concepts (aka classify the SNOMED CT content) and it does not schedule this periodically or generate the inferred definitions incrementally. The reason for this is that classification heavily depends on the authoring cycle a SNOMED CT Extension Authoring team uses. Some authors classify after each new concept, some have scheduled classification times, some classify only before the release, etc.. So Snow Owl cannot make any assumptions on when it would be a good time to classify, it relies on manual execution triggered by an external source (be it a human or a machine).

I hope this answers your questions.

Cheers, Mark

cristinacivili-work commented 4 years ago

Good morning Mark, unfortunately we are still experiencing issues with triggering a classification run.

When I start a new classification task on a branch (exactly as you suggested above), the classification is created and then stays in a SCHEDULED state for days. So far we haven't been able to see any of these scheduled tasks change status.

In the DEBUG logs I see a pattern repeating every five minutes where each message starts with [Classification task cleanup] that could be related, but no ERROR log is involved.

This is what I found in the logs:

`

Jul 13, 2020 @ 17:52:36.952 16:52:36.952 [Classification task cleanup] DEBUG org.elasticsearch.client.RestClient - request [POST https://<host>.amazonaws.com/snomedstore-classificationtask/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true] returned [HTTP/1.1 200 OK]

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: default

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection leased: [id: http-outgoing-50][route: {s}->https://<host>.amazonaws.com:443][total kept alive: 3; route allocated: 4 of 10; total allocated: 4 of 30]

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAuthCache - Re-using cached 'basic' auth scheme for https://<host>.amazonaws.com:443

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 240592] Connection allocated: CPoolProxy{http-outgoing-50 [ACTIVE]}

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][r:r][ACTIVE][r][NOT_HANDSHAKING][0][0][0]: Set timeout 0

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection request: [route: {s}->https://<host>.amazonaws.com:443][total kept alive: 4; route allocated: 4 of 10; total allocated: 4 of 30]

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 240592] Request connection for {s}->https://<host>.amazonaws.com:443

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAuthCache - No credentials for preemptive authentication

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.MainClientExec - [exchange: 240592] start execution

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.elasticsearch.client.RestClient - request [DELETE https://<host>.amazonaws.com/_search/scroll] returned [HTTP/1.1 200 OK]

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][r:r][ACTIVE][r][NOT_HANDSHAKING][0][0][0]: Set attribute http.nio.exchange-handler

Jul 13, 2020 @ 17:52:36.947 16:52:36.947 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][rw:r][ACTIVE][rw][NOT_HANDSHAKING][0][0][0]: Event set [w]

Jul 13, 2020 @ 17:52:36.944 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][rw:r][ACTIVE][rw][NOT_HANDSHAKING][0][0][0]: Event set [w]

Jul 13, 2020 @ 17:52:36.944 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 240591] Connection allocated: CPoolProxy{http-outgoing-50 [ACTIVE]}

Jul 13, 2020 @ 17:52:36.944 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection leased: [id: http-outgoing-50][route: {s}->https://<host>.amazonaws.com:443][total kept alive: 3; route allocated: 4 of 10; total allocated: 4 of 30]

Jul 13, 2020 @ 17:52:36.944 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][r:r][ACTIVE][r][NOT_HANDSHAKING][0][0][0]: Set attribute http.nio.exchange-handler

Jul 13, 2020 @ 17:52:36.944 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][r:r][ACTIVE][r][NOT_HANDSHAKING][0][0][0]: Set timeout 0

Jul 13, 2020 @ 17:52:36.943 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection request: [route: {s}->https://<host>.amazonaws.com:443][total kept alive: 4; route allocated: 4 of 10; total allocated: 4 of 30]

Jul 13, 2020 @ 17:52:36.943 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 240591] Request connection for {s}->https://<host>.amazonaws.com:443

Jul 13, 2020 @ 17:52:36.943 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAuthCache - No credentials for preemptive authentication

Jul 13, 2020 @ 17:52:36.943 16:52:36.943 [Classification task cleanup] DEBUG org.elasticsearch.client.RestClient - request [POST https://<host>.amazonaws.com/snomedstore-classificationtask/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&scroll=60s&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true] returned [HTTP/1.1 200 OK]

Jul 13, 2020 @ 17:52:36.943 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: default

Jul 13, 2020 @ 17:52:36.943 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAuthCache - Re-using cached 'basic' auth scheme for https://<host>.amazonaws.com:443

Jul 13, 2020 @ 17:52:36.943 16:52:36.943 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.MainClientExec - [exchange: 240591] start execution

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: default

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][rw:r][ACTIVE][rw][NOT_HANDSHAKING][0][0][0]: Event set [w]

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection request: [route: {s}->https://<host>.amazonaws.com:443][total kept alive: 4; route allocated: 4 of 10; total allocated: 4 of 30]

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 240590] Request connection for {s}->https://<host>.amazonaws.com:443

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.MainClientExec - [exchange: 240590] start execution

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAuthCache - No credentials for preemptive authentication
Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][r:r][ACTIVE][r][NOT_HANDSHAKING][0][0][0]: Set timeout 0

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 240590] Connection allocated: CPoolProxy{http-outgoing-50 [ACTIVE]}

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection leased: [id: http-outgoing-50][route: {s}->https://<host>.amazonaws.com:443][total kept alive: 3; route allocated: 4 of 10; total allocated: 4 of 30]

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-50 <host>:42854<-><host>:443[ACTIVE][r:r][ACTIVE][r][NOT_HANDSHAKING][0][0][0]: Set attribute http.nio.exchange-handler

Jul 13, 2020 @ 17:52:36.939 16:52:36.939 [Classification task cleanup] DEBUG org.apache.http.client.protocol.RequestAuthCache - Re-using cached 'basic' auth scheme for https://<host>.amazonaws.com:443`
cmark commented 4 years ago

Hi @cristinacivili-work,

Could you please try to set the log level to TRACE to see if there are any additional log messages coming from the classification service?

Thanks, Mark

ivelina-yordanova commented 4 years ago

Hi @cmark,

I managed to run a classification job and it finished successfully. However, there is no documentation about how to actually apply the changes found after that. Looking at the code I gathered I have to update the job to SAVED state. Did that but for the last few hours now it's been stuck in SAVING_IN_PROGRESS.

Do you know what I can do to check if it actually do something or how long is it actually supposed to take for the saving, is this normal, seems unlikely. Also what exactly should the values of the update request be, I am having trouble figuring out the assigner one?

Thanks, Ivelina

ivelina-yordanova commented 4 years ago

Hi @cmark,

After experimenting with different values I managed to transition a job to SAVED state with:

this however resulted in loosing a whole lot of concepts and relationships and they are nowhere to be found now.

Can you please describe the whole complete process of classification L

Also this appears to maybe be related to the other issue I raised about the payload limit because what I see in the logs is:

Updated 1 branch documents with bulk update with script 'commit' [2020-09-11 15:58:51.094] [pool-11-thread-2] INFO index.snomedstore - Updated 35703 relationship documents with bulk update with script 'updateRevised' [2020-09-11 15:59:03.954] [pool-11-thread-3] INFO index.snomedstore - Updated 74586 concept documents with bulk update with script 'updateRevised' [2020-09-11 15:59:09.451] [I/O dispatcher 1] ERROR index.snomedstore - Failed bulk request org.elasticsearch.ElasticsearchStatusException: Unable to parse response body

followed by a whole bunch of : Caused by: org.elasticsearch.client.ResponseException: method [POST], host [https://vpc-es-snowowl-internal-dev-qqh5rgzgvfds7z2igbkkguecn4.eu-west-1.es.amazonaws.com], URI [/snomedstore-concept/_bulk?timeout=1m], status line [HTTP/1.1 413 Request Entity Too Large] {"Message":"Request size exceeded 10485760 bytes"}

and finally : [2020-09-11 15:59:19.412] [Worker-1: 9e70e628-d869-4fd4-a38d-8395604d7483] INFO repository.snomedStore - Changes have been successfully persisted to MAIN/2018-07-31/SNOMEDCT-UK/2019-06-01/SNOMEDCT-BABYLON@1599839851037. [2020-09-11 15:59:19.438] [Worker-1: 9e70e628-d869-4fd4-a38d-8395604d7483] INFO lock - Lock released for branch 'MAIN/2018-07-31/SNOMEDCT-UK/2019-06-01/SNOMEDCT-BABYLON' of repository 'snomedStore' (committing changes). [2020-09-11 15:59:20.456] [Worker-1: 9e70e628-d869-4fd4-a38d-8395604d7483] INFO index.snomedstore - Updated 1 classificationtask documents with bulk update with script 'saved' [2020-09-11 15:59:20.512] [Worker-1: 9e70e628-d869-4fd4-a38d-8395604d7483] INFO lock - Lock released for branch 'MAIN/2018-07-31/SNOMEDCT-UK/2019-06-01/SNOMEDCT-BABYLON' of repository 'snomedStore' (persisting ontology changes). [2020-09-11 15:59:20.552] [Worker-1: 9e70e628-d869-4fd4-a38d-8395604d7483] INFO index.jobs - Updated 1 job documents with bulk update with script 'withCompletionLevel' [2020-09-11 15:59:20.576] [Worker-1: 9e70e628-d869-4fd4-a38d-8395604d7483] INFO index.jobs - Updated 1 job documents with bulk update with script 'withDone'

which all seems like maybe the save would have been a success but the payload size limit did not allow the changes to be saved and instead the existing concepts got even messed up as well, but the whole process continued and ended misleadingly "successful"

Thanks

cmark commented 4 years ago

Hi @ivelina-yordanova,

this however resulted in loosing a whole lot of concepts and relationships and they are nowhere to be found now.

I'm sad to hear you have lost data after you have performed a save through the classification endpoint. This unfortunately originates from the same issue, Snow Owl does not yet fully support Elasticsearch clusters where the HTTP communication request size is limited to a certain amount. Snow Owl is currently limiting all save operations to 10Mb of bulk action items (or 10_000 bulk action items, whichever threshold reached first). This 10Mb threshold however is only honored by Elasticsearch BulkProcessor and not by other bulk requests (bulk update/delete operations). Those will limit themselves to 10_000 items. I have created a PR (#690) to allow users to configure these values in the snowowl.yml configuration file, which might help you solve the issue by reducing the values to 5_000 and 5 (MB) respectively.

Can you please describe the whole complete process of classification L running the job after done - review the changes save/apply the changes - how? where are those actually applied?

Process step-by-step is as follows:

  1. Initiate the classification through POST /classifications by selecting the target branch and the classifier to use.
  2. Wait until the classification is complete (poll the endpoint until you see a different status than SCHEDULED and/or RUNNING)
  3. Reviewing the changes can be done by retrieving information from the following endpoints:
    • GET /classifications/:classificationId/equivalent-concepts - it is recommended (but not required) to resolve those first, then reclassify and continue with the review using the new data
    • GET /classifications/:classificationId/relationship-changes - inferred (aka to be added)/redundant (aka to be removed) relationships to apply on concepts
  4. When review is complete and the classification result can be saved send a PUT /classifications/:classificationId request with the following JSON body to save the results
{
  "assigner": "extension", // there are two known open source implementation, default and extension (recommended for extensions, like yours), default uses the values configured in the request only (see next two values), while extension uses a more specific algorithm to assign namespace and module values to the newly created inferred relationships
  "module": "string", // this should be your module
  "namespace": "string", // this should be your namespace
  "status": "SAVED"
}
  1. Wait until save is complete, poll until status changes from SAVING_IN_PROGRESS to either SAVE_FAILED or SAVED.
  2. Classification is done and can be viewed on affected concepts and the inferred hierarchy

The classifier generates new inferred relationships based on the new hierarchy information generated by the selected reasoner (eg. ELK). The new relationships will alter the inferred hierarchy in a way to match the stated definition of the concepts. They will move to a more appropriate parent if needed and new inferred information will be added and so on to the affected concepts. This information then can be used in ECL and other queries for you analytics and data retrieval needs.

I hope this clarifies the implementation of the classification process.

Let me know if you have any further questions.

Cheers, Mark

cmark commented 2 years ago

Hi all,

I'm closing this due to inactivity. Feel free to reopen if you have any new information or need help.

Cheers, Mark