Open alex-spies opened 11 months ago
Pinging @elastic/clients-team (Team:Clients)
Initial investigation
Inspection of the failure reports shows the same problem in each case: the test times out because the I/O reactor (event loop) of the http client has crashed.
Jun 04, 2024 2:13:46 PM org.apache.http.impl.nio.client.InternalHttpAsyncClient run
SEVERE: I/O reactor terminated abnormally
java.io.InterruptedIOException
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.doShutdown(AbstractMultiworkerIOReactor.java:465)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:377)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.base/java.lang.Thread.run(Thread.java:1570)
The timeout always happens at the last call to future.get(): since the reactor is stopped, the future never completes.
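To illustrate the failure mode: a future whose completing thread is gone never completes, so an unbounded get() blocks until the suite-level timeout fires. Below is a minimal sketch using only java.util.concurrent. The class and method names (StalledFutureDemo, timesOut) are hypothetical and not taken from the test; the never-completed CompletableFuture merely stands in for a request future whose reactor has stopped.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StalledFutureDemo {

    /** Returns true if the future did not complete within the given timeout. */
    public static boolean timesOut(Future<?> future, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            // Bounded wait: gives up instead of blocking forever.
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: nobody will ever complete this future,
        // like a request whose event loop has terminated.
        CompletableFuture<String> stalled = new CompletableFuture<>();
        System.out.println("stalled future timed out: " + timesOut(stalled, 200));

        // A completed future returns immediately.
        CompletableFuture<String> done = CompletableFuture.completedFuture("ok");
        System.out.println("completed future timed out: " + timesOut(done, 200));
    }
}
```

A bounded get() would not fix the reactor crash, but it would probably turn the suite-level timeout into a failure of the individual test with a clearer stack trace.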
This crash is flaky, and the logs show no information explaining it (this kind of reactor issue is notoriously hard to debug as there are no logs when it crashes).
Cannot reproduce. A good next step would be to update the http client library to the latest 4.x version.
Note that this is unrelated to Elasticsearch client libraries, as the failing test exercises the http client library in isolation, with no Elastic wrapper involved.
Additional hint: a frequent cause of I/O reactor crashes is an OOME thrown inside the reactor's event loop. This usually happens when a response callback allocates memory to buffer the response body, and the Java client has a protection against this. There are no such callbacks in this test, but an OOME in other parts of the event loop can still crash it.
I haven't seen any memory-related information in the failing tests' logs. Would there be a way to log OOMEs that occur during a test run?
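One possible approach, sketched here under assumptions: install a JVM-wide default uncaught exception handler at test setup that records and logs any OutOfMemoryError. The class name OomeLogger and its fields are hypothetical. The sketch only catches errors that propagate out of a thread uncaught; whether an OOME in the reactor actually escapes that way has not been verified here. The HotSpot flag -XX:+HeapDumpOnOutOfMemoryError could be a complementary option.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OomeLogger {

    /** Last OutOfMemoryError seen by the handler, or null if none. */
    public static final AtomicReference<Throwable> lastOome = new AtomicReference<>();

    /** Installs a default handler that logs OOMEs, then delegates to the previous handler. */
    public static void install() {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (error instanceof OutOfMemoryError) {
                lastOome.set(error);
                System.err.println("OOME in thread '" + thread.getName() + "': " + error);
            }
            if (previous != null) {
                previous.uncaughtException(thread, error);
            } else {
                // Keep the usual behavior of printing the stack trace.
                error.printStackTrace();
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        // Simulated OOME; no memory is actually exhausted.
        Thread t = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        }, "fake-reactor");
        t.start();
        t.join();
        System.out.println("OOME recorded: " + (lastOome.get() != null));
    }
}
```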
Pinging @elastic/es-core-infra (Team:Core/Infra)
Test timed out, leading to abandoning the test suite.
Build scan: https://gradle-enterprise.elastic.co/s/yfwhltzif4ytg/tests/:client:rest:test/org.elasticsearch.client.RestClientSingleHostIntegTests/testRequestResetAndAbort
Reproduction line:
Applicable branches: 7.17
Reproduces locally?: No
Failure history: https://es-delivery-stats.elastic.dev/app/dashboards#/view/dcec9e60-72ac-11ee-8f39-55975ded9e63?_g=(refreshInterval:(pause:!t,value:60000),time:(from:now-7d%2Fd,to:now))&_a=(controlGroupInput:(chainingSystem:HIERARCHICAL,controlStyle:twoLine,ignoreParentSettings:(ignoreFilters:!f,ignoreQuery:!f,ignoreTimerange:!f,ignoreValidations:!t),panels:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:task.keyword,grow:!t,id:'0c0c9cb8-ccd2-45c6-9b13-96bac4abc542',searchTechnique:wildcard,selectedOptions:!(),singleSelect:!t,title:'Gradle%20Task',width:medium),grow:!t,order:0,type:optionsListControl,width:small),'144933da-5c1b-4257-a969-7f43455a7901':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:name.keyword,grow:!t,id:'144933da-5c1b-4257-a969-7f43455a7901',searchTechnique:wildcard,selectedOptions:!('testRequestResetAndAbort'),title:Test,width:medium),grow:!t,order:2,type:optionsListControl,width:medium),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:className.keyword,grow:!t,id:'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850',searchTechnique:wildcard,selectedOptions:!('org.elasticsearch.client.RestClientSingleHostIntegTests'),title:Suite,width:medium),grow:!t,order:1,type:optionsListControl,width:medium))))
Failure excerpt: