elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] RestClientSingleHostIntegTests testRequestResetAndAbort failing #102717

Open alex-spies opened 11 months ago

alex-spies commented 11 months ago

The test timed out, which caused the test suite to be abandoned.

Build scan: https://gradle-enterprise.elastic.co/s/yfwhltzif4ytg/tests/:client:rest:test/org.elasticsearch.client.RestClientSingleHostIntegTests/testRequestResetAndAbort

Reproduction line:

./gradlew :client:rest:test --tests "org.elasticsearch.client.RestClientSingleHostIntegTests" -Dtests.seed=16B9C0C0DDF7F175

Applicable branches: 7.17

Reproduces locally?: No

Failure history: https://es-delivery-stats.elastic.dev/app/dashboards#/view/dcec9e60-72ac-11ee-8f39-55975ded9e63?_g=(refreshInterval:(pause:!t,value:60000),time:(from:now-7d%2Fd,to:now))&_a=(controlGroupInput:(chainingSystem:HIERARCHICAL,controlStyle:twoLine,ignoreParentSettings:(ignoreFilters:!f,ignoreQuery:!f,ignoreTimerange:!f,ignoreValidations:!t),panels:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:task.keyword,grow:!t,id:'0c0c9cb8-ccd2-45c6-9b13-96bac4abc542',searchTechnique:wildcard,selectedOptions:!(),singleSelect:!t,title:'Gradle%20Task',width:medium),grow:!t,order:0,type:optionsListControl,width:small),'144933da-5c1b-4257-a969-7f43455a7901':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:name.keyword,grow:!t,id:'144933da-5c1b-4257-a969-7f43455a7901',searchTechnique:wildcard,selectedOptions:!('testRequestResetAndAbort'),title:Test,width:medium),grow:!t,order:2,type:optionsListControl,width:medium),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:className.keyword,grow:!t,id:'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850',searchTechnique:wildcard,selectedOptions:!('org.elasticsearch.client.RestClientSingleHostIntegTests'),title:Suite,width:medium),grow:!t,order:1,type:optionsListControl,width:medium))))

Failure excerpt:

java.lang.Exception: Test abandoned because suite timeout was reached.

  at __randomizedtesting.SeedInfo.seed([16B9C0C0DDF7F175]:0)

elasticsearchmachine commented 11 months ago

Pinging @elastic/clients-team (Team:Clients)

jonathan-buttner commented 11 months ago

Another failure here: https://gradle-enterprise.elastic.co/s/ci6zaoe5tzioy/tests/task/:client:rest:test/details/org.elasticsearch.client.RestClientSingleHostIntegTests?top-execution=1

alex-spies commented 9 months ago

One more: https://gradle-enterprise.elastic.co/s/tzf2oi3ebsrfm

kingherc commented 9 months ago

One more at elasticsearch / periodic / 7.17 / adoptopenjdk11 / java-matrix

cbuescher commented 6 months ago

And one more today: https://gradle-enterprise.elastic.co/s/ct5g6chkbbxuq/tests/task/:client:rest:test/details/org.elasticsearch.client.RestClientSingleHostIntegTests/testRequestResetAndAbort?top-execution=1

nielsbauman commented 5 months ago

Another one: https://gradle-enterprise.elastic.co/s/mwv7uexdf334u/tests/task/:client:rest:test/details/org.elasticsearch.client.RestClientSingleHostIntegTests/testRequestResetAndAbort

pxsalehi commented 5 months ago

On main: https://gradle-enterprise.elastic.co/s/tvwhltklypkty

swallez commented 1 week ago

Initial investigation

Inspection of the failure reports shows the same problem in each: the test times out because the reactor (event loop) of the http client has crashed.

  Jun 04, 2024 2:13:46 PM org.apache.http.impl.nio.client.InternalHttpAsyncClient run   
  SEVERE: I/O reactor terminated abnormally 
  java.io.InterruptedIOException    
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.doShutdown(AbstractMultiworkerIOReactor.java:465)  
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:377) 
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)  
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) 
    at java.base/java.lang.Thread.run(Thread.java:1570)

The timeout always happens at the last call to future.get() (link): since the reactor has stopped, the future never completes.

This crash is flaky, and the logs show no information explaining it (this kind of reactor issue is notoriously hard to debug as there are no logs when it crashes).
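
To illustrate why the hang at future.get() turns into a suite timeout, here is a minimal sketch of a bounded wait. It is not the test's actual code: the class name `BoundedFutureWait` and the helper `await` are hypothetical, and a never-completed `CompletableFuture` stands in for a request future whose reactor has died. It only uses `java.util.concurrent` calls from the JDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration, not part of the Elasticsearch test suite.
public class BoundedFutureWait {

    /** Waits up to timeoutMillis for the future; reports whether it completed or timed out. */
    public static String await(Future<?> future, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "completed";
        } catch (TimeoutException e) {
            // Nothing will ever complete this future, so give up instead of blocking forever.
            future.cancel(true);
            return "timed out";
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a request whose event loop has stopped: no thread will complete it.
        CompletableFuture<String> orphaned = new CompletableFuture<>();
        System.out.println(await(orphaned, 200)); // prints "timed out"

        // A future that is already done returns immediately.
        System.out.println(await(CompletableFuture.completedFuture("ok"), 200)); // prints "completed"
    }
}
```

A bounded wait like this would not fix the reactor crash, but it would make the test fail fast with a clear message rather than consuming the whole suite timeout.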

swallez commented 1 week ago

Cannot reproduce. A good next step would be to update the http client library to the latest 4.x version.

Note that this is unrelated to Elasticsearch client libraries, as the failing test exercises the http client library in isolation, with no Elastic wrapper involved.

swallez commented 1 week ago

Additional hint: a frequent cause of an I/O reactor crash is an OutOfMemoryError (OOME) thrown inside its event loop. This usually happens when a response callback allocates memory to buffer the response body, which the Java client protects against. This test has no such callbacks, but an OOME in another part of the event loop could still crash it.

I haven't seen any memory-related information in the failing tests' logs. Is there a way to log OOMEs that occur during a test run?
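
One possible approach, sketched below under assumptions, is a JVM-wide default uncaught exception handler that records any OutOfMemoryError escaping a thread. The class `OomeLoggingSketch`, its methods, and the thread name are hypothetical, and the OOME is simulated by throwing it directly. Note the limitation: the handler only sees errors that propagate out of a thread's run method, so it would miss an OOME that the http client library catches internally.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration, not part of the Elasticsearch test framework.
public class OomeLoggingSketch {

    /** Last OutOfMemoryError seen by the handler, kept so a test can inspect it. */
    static final AtomicReference<Throwable> lastOome = new AtomicReference<>();

    /** Installs a default handler that logs every uncaught throwable and records OOMEs. */
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (error instanceof OutOfMemoryError) {
                lastOome.set(error);
            }
            System.err.println("Uncaught in thread " + thread.getName() + ": " + error);
        });
    }

    /** Runs a thread that throws a simulated OOME and returns what the handler recorded. */
    public static Throwable simulate(String threadName) {
        install();
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        }, threadName);
        worker.start();
        try {
            // The handler runs on the dying thread, so it has finished once join() returns.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lastOome.get();
    }

    public static void main(String[] args) {
        System.out.println("recorded: " + simulate("fake-io-dispatcher"));
    }
}
```

Independently of any handler, the HotSpot flag `-XX:+HeapDumpOnOutOfMemoryError` makes the JVM write a heap dump when an OOME is thrown, which leaves evidence even when the error is swallowed by library code.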

elasticsearchmachine commented 1 day ago

Pinging @elastic/es-core-infra (Team:Core/Infra)