IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch
Other
208 stars 83 forks source link

Timeout error during import of RF2 snapshot file #640

Open mitochon opened 1 day ago

mitochon commented 1 day ago

Hi .. I'm trying to import an RF2 snapshot file using the docker container with the most recent branch. Followed the instructions and the import runs for around 45 min before hitting this error. Would appreciate any insights on how to get around this issue.

snowstorm      | 2024-11-18T08:54:13.554Z  INFO 1 --- [pool-3-thread-1] i.k.elasticvc.api.ComponentService       : Saving batch of 5000 QueryConcepts
snowstorm      | 2024-11-18T08:54:17.135Z  INFO 1 --- [pool-3-thread-1] i.k.elasticvc.api.ComponentService       : Saving batch of 5000 QueryConcepts
snowstorm      | 2024-11-18T08:54:31.030Z  INFO 1 --- [pool-3-thread-1] i.k.elasticvc.api.ComponentService       : Saving batch of 5000 QueryConcepts
elasticsearch  | {"@timestamp":"2024-11-18T08:55:37.054Z", "log.level": "WARN", "message":"Received response for a request that has timed out, sent [56.3s/56382ms] ago, timed out [39.8s/39801ms] ago, action [indices:monitor/stats[n]], node [{snowstorm}{Et_3sBQnRmyOdt4ovMAwUw}{XiibJYQDTZ6I3VCVd7qqig}{snowstorm}{172.19.0.2}{172.19.0.2:9300}{cdfhilmrstw}{8.11.1}{7000099-8500003}{xpack.installed=true, ml.config_version=11.0.0, ml.max_jvm_size=4294967296, ml.allocated_processors_double=12.0, ml.allocated_processors=12, ml.machine_memory=8220098560, transform.config_version=10.0.0}], id [273667]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[snowstorm][management][T#5]","log.logger":"org.elasticsearch.transport.TransportService","elasticsearch.cluster.uuid":"VcuUYBmZSAupIYRdQFbUwg","elasticsearch.node.id":"Et_3sBQnRmyOdt4ovMAwUw","elasticsearch.node.name":"snowstorm","elasticsearch.cluster.name":"snowstorm-cluster"}
elasticsearch  | {"@timestamp":"2024-11-18T08:55:42.259Z", "log.level": "WARN", "message":"failed to retrieve shard stats from node [Et_3sBQnRmyOdt4ovMAwUw]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[snowstorm][management][T#1]","log.logger":"org.elasticsearch.cluster.InternalClusterInfoService","elasticsearch.cluster.uuid":"VcuUYBmZSAupIYRdQFbUwg","elasticsearch.node.id":"Et_3sBQnRmyOdt4ovMAwUw","elasticsearch.node.name":"snowstorm","elasticsearch.cluster.name":"snowstorm-cluster","error.type":"org.elasticsearch.transport.ReceiveTimeoutTransportException","error.message":"[snowstorm][172.19.0.2:9300][indices:monitor/stats[n]] request_id [273667] timed out after [16581ms]","error.stack_trace":"org.elasticsearch.transport.ReceiveTimeoutTransportException: [snowstorm][172.19.0.2:9300][indices:monitor/stats[n]] request_id [273667] timed out after [16581ms]\n"}
snowstorm      | 2024-11-18T08:55:39.947Z ERROR 1 --- [pool-3-thread-1] o.s.s.core.rf2.rf2import.ImportService   : Failed RF2 SNAPSHOT import on branch MAIN. ID 1aa25343-f152-42c2-aaa6-d3787366ece4
snowstorm      | 
snowstorm      | org.springframework.dao.DataAccessResourceFailureException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
snowstorm      |        at org.springframework.data.elasticsearch.client.elc.ElasticsearchExceptionTranslator.translateExceptionIfPossible(ElasticsearchExceptionTranslator.java:110)
snowstorm      |        at org.springframework.data.elasticsearch.client.elc.ElasticsearchExceptionTranslator.translateException(ElasticsearchExceptionTranslator.java:64)
snowstorm      |        at org.springframework.data.elasticsearch.client.elc.ElasticsearchTemplate.execute(ElasticsearchTemplate.java:635)
snowstorm      |        at org.springframework.data.elasticsearch.client.elc.ElasticsearchTemplate.doBulkOperation(ElasticsearchTemplate.java:296)
snowstorm      |        at org.springframework.data.elasticsearch.core.AbstractElasticsearchTemplate.bulkOperation(AbstractElasticsearchTemplate.java:359)
snowstorm      |        at org.springframework.data.elasticsearch.core.AbstractElasticsearchTemplate.bulkIndex(AbstractElasticsearchTemplate.java:343)
snowstorm      |        at org.springframework.data.elasticsearch.core.DocumentOperations.bulkIndex(DocumentOperations.java:185)
snowstorm      |        at org.springframework.data.elasticsearch.core.AbstractElasticsearchTemplate.save(AbstractElasticsearchTemplate.java:236)
snowstorm      |        at org.springframework.data.elasticsearch.repository.support.SimpleElasticsearchRepository.lambda$saveAll$7(SimpleElasticsearchRepository.java:215)
snowstorm      |        at org.springframework.data.elasticsearch.repository.support.SimpleElasticsearchRepository.executeAndRefresh(SimpleElasticsearchRepository.java:449)
snowstorm      |        at org.springframework.data.elasticsearch.repository.support.SimpleElasticsearchRepository.saveAll(SimpleElasticsearchRepository.java:215)
snowstorm      |        at jdk.internal.reflect.GeneratedMethodAccessor136.invoke(Unknown Source)
snowstorm      |        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
snowstorm      |        at java.base/java.lang.reflect.Method.invoke(Method.java:569)
snowstorm      |        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:354)
snowstorm      |        at org.springframework.data.repository.core.support.RepositoryMethodInvoker$RepositoryFragmentMethodInvoker.lambda$new$0(RepositoryMethodInvoker.java:277)
snowstorm      |        at org.springframework.data.repository.core.support.RepositoryMethodInvoker.doInvoke(RepositoryMethodInvoker.java:170)
snowstorm      |        at org.springframework.data.repository.core.support.RepositoryMethodInvoker.invoke(RepositoryMethodInvoker.java:158)
snowstorm      |        at org.springframework.data.repository.core.support.RepositoryComposition$RepositoryFragments.invoke(RepositoryComposition.java:516)
snowstorm      |        at org.springframework.data.repository.core.support.RepositoryComposition.invoke(RepositoryComposition.java:285)
snowstorm      |        at org.springframework.data.repository.core.support.RepositoryFactorySupport$ImplementationMethodExecutionInterceptor.invoke(RepositoryFactorySupport.java:628)
snowstorm      |        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
snowstorm      |        at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:168)
snowstorm      |        at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:143)
snowstorm      |        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
snowstorm      |        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
snowstorm      |        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
snowstorm      |        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:223)
snowstorm      |        at jdk.proxy2/jdk.proxy2.$Proxy131.saveAll(Unknown Source)
snowstorm      |        at io.kaicode.elasticvc.api.ComponentService.doSaveBatchComponents(ComponentService.java:93)
snowstorm      |        at org.snomed.snowstorm.core.data.services.SemanticIndexUpdateService.doSaveBatch(SemanticIndexUpdateService.java:727)
snowstorm      |        at org.snomed.snowstorm.core.data.services.SemanticIndexUpdateService.updateSemanticIndex(SemanticIndexUpdateService.java:421)
snowstorm      |        at org.snomed.snowstorm.core.data.services.SemanticIndexUpdateService.updateStatedAndInferredSemanticIndex(SemanticIndexUpdateService.java:131)
snowstorm      |        at org.snomed.snowstorm.core.data.services.SemanticIndexUpdateService.preCommitCompletion(SemanticIndexUpdateService.java:95)
snowstorm      |        at io.kaicode.elasticvc.api.BranchService.completeCommit(BranchService.java:419)
snowstorm      |        at io.kaicode.elasticvc.domain.Commit.close(Commit.java:61)
snowstorm      |        at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.completeImportCommit(ImportComponentFactoryImpl.java:231)
snowstorm      |        at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.loadingComponentsCompleted(ImportComponentFactoryImpl.java:220)
snowstorm      |        at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.doLoadReleaseFiles(ReleaseImporter.java:251)
snowstorm      |        at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.doLoadReleaseFiles(ReleaseImporter.java:203)
snowstorm      |        at org.ihtsdo.otf.snomedboot.ReleaseImporter.loadSnapshotReleaseFiles(ReleaseImporter.java:46)
snowstorm      |        at org.ihtsdo.otf.snomedboot.ReleaseImporter.loadSnapshotReleaseFiles(ReleaseImporter.java:80)
snowstorm      |        at org.snomed.snowstorm.core.rf2.rf2import.ImportService.snapshotImport(ImportService.java:232)
snowstorm      |        at org.snomed.snowstorm.core.rf2.rf2import.ImportService.importFiles(ImportService.java:179)
snowstorm      |        at org.snomed.snowstorm.core.rf2.rf2import.ImportService.importArchive(ImportService.java:132)
snowstorm      |        at org.snomed.snowstorm.core.rf2.rf2import.ImportService.lambda$importArchiveAsync$1(ImportService.java:280)
snowstorm      |        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
snowstorm      |        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
snowstorm      |        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
snowstorm      |        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
snowstorm      |        at java.base/java.lang.Thread.run(Thread.java:840)
snowstorm      | Caused by: java.lang.RuntimeException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
snowstorm      |        at org.springframework.data.elasticsearch.client.elc.ElasticsearchExceptionTranslator.translateException(ElasticsearchExceptionTranslator.java:63)
snowstorm      |        ... 49 common frames omitted
snowstorm      | Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
snowstorm      |        at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:915)
snowstorm      |        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:300)
snowstorm      |        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
snowstorm      |        at co.elastic.clients.transport.rest_client.RestClientHttpClient.performRequest(RestClientHttpClient.java:91)
snowstorm      |        at co.elastic.clients.transport.ElasticsearchTransportBase.performRequest(ElasticsearchTransportBase.java:144)
snowstorm      |        at co.elastic.clients.elasticsearch.ElasticsearchClient.bulk(ElasticsearchClient.java:337)
snowstorm      |        at org.springframework.data.elasticsearch.client.elc.ElasticsearchTemplate.lambda$doBulkOperation$11(ElasticsearchTemplate.java:296)
snowstorm      |        at org.springframework.data.elasticsearch.client.elc.ElasticsearchTemplate.execute(ElasticsearchTemplate.java:633)
snowstorm      |        ... 48 common frames omitted
snowstorm      | Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
snowstorm      |        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
snowstorm      |        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:98)
snowstorm      |        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:40)
snowstorm      |        at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
snowstorm      |        at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
snowstorm      |        at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:506)
snowstorm      |        at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
snowstorm      |        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
snowstorm      |        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
snowstorm      |        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
snowstorm      |        ... 1 common frames omitted
kaicode commented 1 day ago

It looks like Elasticsearch stopped responding which caused this timeout. Is there any additional information in the Elasticsearch log?

kaicode commented 1 day ago

This may happen if you run out of disk space for example but the log of the Elasticsearch container should give more detail.

mitochon commented 1 day ago

I checked the /usr/share/elasticsearch/logs directory in the ES container and didn't see any glaring issues or error. Though I noticed there is some garbage collection happening that took 28 secs? Not sure if that causes the timeout (between 08:53:57-> 08:54:25)

[2024-11-18T08:53:43.528+0000][74][info ][safepoint   ] Safepoint "G1CollectForAllocation", Time since last: 5123885786 ns, Reaching safepoint: 726424 ns, Cleanup: 1927892
 ns, At safepoint: 38841428 ns, Total: 41495744 ns
[2024-11-18T08:53:47.968+0000][74][info ][safepoint   ] Safepoint "Cleanup", Time since last: 4435318850 ns, Reaching safepoint: 4062958 ns, Cleanup: 1198277 ns, At safepo
int: 149499 ns, Total: 5410734 ns
[2024-11-18T08:53:50.365+0000][74][info ][safepoint   ] Safepoint "Cleanup", Time since last: 2400624616 ns, Reaching safepoint: 883206 ns, Cleanup: 1359351 ns, At safepoi
nt: 252453 ns, Total: 2495010 ns
[2024-11-18T08:53:57.753+0000][74][info ][safepoint   ] Safepoint "Cleanup", Time since last: 7356859076 ns, Reaching safepoint: 15672975 ns, Cleanup: 8624944 ns, At safep
oint: 3523816 ns, Total: 27821735 ns
[2024-11-18T08:54:25.609+0000][74][info ][gc,start    ] GC(324) Pause Young (Normal) (G1 Evacuation Pause)
[2024-11-18T08:54:25.617+0000][74][info ][gc,task     ] GC(324) Using 10 workers of 10 for evacuation
[2024-11-18T08:54:25.617+0000][74][debug][gc,age      ] GC(324) Desired survivor size 161480704 bytes, new threshold 15 (max threshold 15)
[2024-11-18T08:54:25.719+0000][74][info ][gc,phases   ] GC(324)   Pre Evacuate Collection Set: 3.1ms
[2024-11-18T08:54:25.720+0000][74][info ][gc,phases   ] GC(324)   Merge Heap Roots: 1.6ms
[2024-11-18T08:54:25.720+0000][74][info ][gc,phases   ] GC(324)   Evacuate Collection Set: 39.9ms
[2024-11-18T08:54:25.720+0000][74][info ][gc,phases   ] GC(324)   Post Evacuate Collection Set: 54.5ms
[2024-11-18T08:54:25.720+0000][74][info ][gc,phases   ] GC(324)   Other: 3.5ms
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) Age table with threshold 15 (max threshold 15)
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   1:   19424744 bytes,   19424744 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   2:    1350000 bytes,   20774744 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   3:      42280 bytes,   20817024 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   4:      43792 bytes,   20860816 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   5:      22944 bytes,   20883760 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   6:      16688 bytes,   20900448 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   7:      30856 bytes,   20931304 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   8:       1216 bytes,   20932520 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age   9:     429888 bytes,   21362408 total
[2024-11-18T08:54:25.720+0000][74][trace][gc,age      ] GC(324) - age  10:      13376 bytes,   21375784 total
[2024-11-18T08:54:25.721+0000][74][info ][gc,heap     ] GC(324) Eden regions: 607->0(608)
[2024-11-18T08:54:25.721+0000][74][info ][gc,heap     ] GC(324) Survivor regions: 7->6(77)
[2024-11-18T08:54:25.721+0000][74][info ][gc,heap     ] GC(324) Old regions: 62->62
[2024-11-18T08:54:25.721+0000][74][info ][gc,heap     ] GC(324) Humongous regions: 6->6
[2024-11-18T08:54:25.724+0000][74][info ][gc,metaspace] GC(324) Metaspace: 138745K(142144K)->138745K(142144K) NonClass: 120581K(122368K)->120581K(122368K) Class: 18164K(19
776K)->18164K(19776K)
[2024-11-18T08:54:25.725+0000][74][info ][gc          ] GC(324) Pause Young (Normal) (G1 Evacuation Pause) 2725M->293M(4096M) 117.785ms
[2024-11-18T08:54:25.725+0000][74][info ][gc,cpu      ] GC(324) User=0.29s Sys=0.05s Real=0.12s
[2024-11-18T08:54:25.727+0000][74][info ][safepoint   ] Safepoint "G1CollectForAllocation", Time since last: 27849747691 ns, Reaching safepoint: 2012623 ns, Cleanup: 13054
82 ns, At safepoint: 128221602 ns, Total: 131539707 ns