elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] FullClusterRestartIT#testWatcher failures #48381

Open romseygeek opened 4 years ago

romseygeek commented 4 years ago

This has been failing a couple of times a day since it was re-enabled in #48000

Failures fall into two types. On 7.x and 7.5:

org.elasticsearch.client.ResponseException: method [GET], host [http://127.0.0.1:35174], URI [/_cluster/health/.watches,bwc_watch_index,.watcher-history*?wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&timeout=30s&wait_for_status=yellow], status line [HTTP/1.1 408 Request Timeout]
{"cluster_name":"v6.7.0","status":"red","timed_out":true,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":2,"active_shards":4,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
    at __randomizedtesting.SeedInfo.seed([1CA518BE73754281:7148EB738314D577]:0)
    at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:261)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
    at org.elasticsearch.xpack.restart.FullClusterRestartIT.waitForYellow(FullClusterRestartIT.java:567)
    at org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcher(FullClusterRestartIT.java:160)

e.g. https://build-stats.elastic.co/app/kibana#/doc/b646ed00-7efc-11e8-bf69-63c8ef516157/build-*/t?id=20191023094703-5F65A9C2&_g=()

and on master:

java.lang.AssertionError: 
Expected: <2>
     but: was <3>
    at __randomizedtesting.SeedInfo.seed([16B952622AA67F39:7B54A1AFDAC7E8CF]:0)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.junit.Assert.assertThat(Assert.java:923)
    at org.elasticsearch.xpack.restart.FullClusterRestartIT.assertBasicWatchInteractions(FullClusterRestartIT.java:351)
    at org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcher(FullClusterRestartIT.java:176)

e.g. https://build-stats.elastic.co/app/kibana#/doc/b646ed00-7efc-11e8-bf69-63c8ef516157/build-*/t?id=20191022054329-BF6E5EA6&_g=()

elasticmachine commented 4 years ago

Pinging @elastic/es-core-features (:Core/Features/Watcher)

jakelandis commented 4 years ago

Increased the timeout for yellow state in https://github.com/elastic/elasticsearch/pull/48434 (will backport) and will look into the assertion error: https://gradle-enterprise.elastic.co/s/kcjrg2hoa7zqe/

pugnascotia commented 4 years ago

Another failure: https://gradle-enterprise.elastic.co/s/rvzlp65switqy/tests/qapgjxqnlfyjk-juyey3tkpelm6

tlrx commented 4 years ago

Another failure: https://gradle-enterprise.elastic.co/s/e6ambkakrlg3y/tests/onumgmisjues4-juyey3tkpelm6?openStackTraces=WzBd

hendrikmuhs commented 4 years ago

another: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob+fast+bwc/1837/console

I think the last failures already had the 60s timeout in place.

jakelandis commented 4 years ago

This test has been re-muted across branches and the timeout reduced back to the original 30s.

martijnvg commented 4 years ago

This test hasn't failed since it was enabled on January 6th in the master branch. I'm going to enable it on the 7.x branch too, and if it doesn't fail for a while I will close this issue.

martijnvg commented 4 years ago

This test failed twice yesterday on the master and 7.x branches. I will mute the test now and investigate these failures with the additional logging that was added recently.

martijnvg commented 4 years ago

The failures are related to waiting for an at-least-yellow cluster state. While the cluster state overall looks good enough to report green, this particular test waits for specific indices to be at least yellow, and one index (bwc_watch_index) just doesn't exist. This index should be created by a watcher index action, but apparently that action doesn't get executed, which results in bwc_watch_index not being created and the health API returning a red status.
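
For context, here is a minimal sketch of the health check that is timing out, using the low-level REST client. The endpoint and parameters are copied from the failing request in the stack traces above; the helper itself mirrors the test's waitForYellow but is illustrative, not the exact test code:

import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class WaitForYellowSketch {

    // Blocks until the three watcher-related indices are at least yellow.
    // If bwc_watch_index was never created, the health API waits out the
    // 30s timeout and answers with HTTP 408 and status "red", which the
    // low-level client surfaces as the ResponseException seen above.
    static void waitForYellow(RestClient client) throws Exception {
        Request request = new Request("GET",
            "/_cluster/health/.watches,bwc_watch_index,.watcher-history*");
        request.addParameter("wait_for_status", "yellow");
        request.addParameter("wait_for_no_relocating_shards", "true");
        request.addParameter("wait_for_no_initializing_shards", "true");
        request.addParameter("timeout", "30s");
        Response response = client.performRequest(request); // throws on 408
        assert response.getStatusLine().getStatusCode() == 200;
    }
}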

martijnvg commented 4 years ago

Unmuted the test; when it fails again, it now captures watcher stats as well.

The watcher debug logs don't indicate that the watch executes, which would explain the index action not running. I think the watch may be stuck; hopefully the watcher stats will capture this.

martijnvg commented 4 years ago

This test hasn't failed for almost a month, so I will close this issue. If it fails again, a new issue should be opened.

cbuescher commented 4 years ago

@martijnvg I just saw that this test is still muted on 7.x. Should your unmute commit 9d7b80f be backported, or was it left muted on purpose?

martijnvg commented 4 years ago

@cbuescher I think I forgot the backport... I will backport the commit you mentioned today.

mayya-sharipova commented 4 years ago

Reopening the issue, as we had another failure on master today:

Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-darwin-compatibility/176/console
Build scan: https://gradle-enterprise.elastic.co/s/humaqzeduckx4

REPRODUCE WITH:

./gradlew ':x-pack:qa:full-cluster-restart:v8.0.0#upgradedClusterTest' \
  --tests "org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcher" \
  -Dtests.seed=7EF337E5D8E436CC \
  -Dtests.security.manager=true \
  -Dtests.locale=ar-KW \
  -Dtests.timezone=US/Central \
  -Druntime.java=11

java.lang.AssertionError:
Expected: <2>
     but: was <3>
    at __randomizedtesting.SeedInfo.seed([7EF337E5D8E436CC:131EC4282885A13A]:0)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.junit.Assert.assertThat(Assert.java:923)
    at org.elasticsearch.xpack.restart.FullClusterRestartIT.assertBasicWatchInteractions(FullClusterRestartIT.java:418)
    at org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcher(FullClusterRestartIT.java:207)

The failure doesn't reproduce locally.

droberts195 commented 3 years ago

The logging that was added to help debug this issue is still there today - this line: https://github.com/elastic/elasticsearch/blob/35460a5f8a14bf19540224159be90696497a994d/x-pack/qa/full-cluster-restart/build.gradle#L60

That line was added over a year ago in 106c3ce686b706535686733e1354e51eec619cab. It means that the server-side logs for the X-Pack full cluster restart tests are 99% watcher debug, making it hard to debug anything else. Is this extra debug-level logging still required today?

martijnvg commented 3 years ago

@droberts195 I will remove that line. If needed, we can re-enable watcher debug logging later when we get back to investigating this test failure (and not forget to remove it afterwards).

martijnvg commented 2 years ago

A different instance of this test failing: https://gradle-enterprise.elastic.co/s/alpg4ojedmfhg

org.elasticsearch.xpack.restart.FullClusterRestartIT > testWatcher FAILED   
    org.elasticsearch.client.ResponseException: method [GET], host [http://127.0.0.1:44101/], URI [/_cluster/health/.watches,bwc_watch_index,.watcher-history*?wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&timeout=30s&wait_for_status=yellow], status line [HTTP/1.1 408 Request Timeout]  
    {"cluster_name":"v7.0.1","status":"red","timed_out":true,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":2,"active_shards":4,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0} 
        at __randomizedtesting.SeedInfo.seed([7E0AB41A5FE5E699:13E747D7AF84716F]:0) 
        at app//org.elasticsearch.client.RestClient.convertResponse(RestClient.java:347)    
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:313) 
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:288) 
        at app//org.elasticsearch.xpack.restart.FullClusterRestartIT.waitForYellow(FullClusterRestartIT.java:757)   
        at app//org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcher(FullClusterRestartIT.java:159)

arteam commented 2 years ago

Started happening again (failed 5 times over the last week): https://gradle-enterprise.elastic.co/s/4nqy4ic7pc2cc/tests/:x-pack:qa:full-cluster-restart:v7.3.1%23oldClusterTest/org.elasticsearch.xpack.restart.FullClusterRestartIT/testWatcher?top-execution=1

org.elasticsearch.client.ResponseException: method [GET], host [http://127.0.0.1:38531], URI [/_cluster/health/.watches,bwc_watch_index,.watcher-history*?wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&timeout=30s&wait_for_status=yellow], status line [HTTP/1.1 408 Request Timeout]

benwtrent commented 1 year ago

Another failure, again due to timeout:

https://gradle-enterprise.elastic.co/s/o6w6vll7qjego

org.elasticsearch.xpack.restart.FullClusterRestartIT > testWatcher FAILED   
    org.elasticsearch.client.ResponseException: method [GET], host [http://127.0.0.1:37476/], URI [/_cluster/health/.watches,bwc_watch_index,.watcher-history*?wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&timeout=30s&wait_for_status=yellow], status line [HTTP/1.1 408 Request Timeout]  
    {"cluster_name":"v7.3.1","status":"red","timed_out":true,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":2,"active_shards":4,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0} 
        at __randomizedtesting.SeedInfo.seed([4A1898BDB88606D4:27F56B7048E79122]:0) 
        at app//org.elasticsearch.client.RestClient.convertResponse(RestClient.java:347)    
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:313) 
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:288) 
        at app//org.elasticsearch.xpack.restart.FullClusterRestartIT.waitForYellow(FullClusterRestartIT.java:752)   
        at app//org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcher(FullClusterRestartIT.java:158)

andreidan commented 1 year ago

Another failure https://gradle-enterprise.elastic.co/s/htn2dtr4ajl5c

This one seems to be a genuine test failure:

java.lang.AssertionError:
Expected: <2>
     but: was <3>
    at __randomizedtesting.SeedInfo.seed([8299F6568D3BC8E1:EF74059B7D5A5F17]:0)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.junit.Assert.assertThat(Assert.java:923)
    at org.elasticsearch.xpack.restart.FullClusterRestartIT.assertBasicWatchInteractions(FullClusterRestartIT.java:722)
    at org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcher(FullClusterRestartIT.java:200)

thecoop commented 1 year ago

And another, a timeout this time: https://gradle-enterprise.elastic.co/s/snmnlk6itki4w

org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:33149], URI [/_cluster/health/.watches,bwc_watch_index,.watcher-history*?wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&timeout=30s&wait_for_status=yellow], status line [HTTP/1.1 408 Request Timeout]
{"cluster_name":"elasticsearch","status":"red","timed_out":true,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":2,"active_shards":4,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}

cbuescher commented 1 year ago

Looks like this fits here: https://gradle-enterprise.elastic.co/s/tb7rzfmtfukac

elasticsearchmachine commented 11 months ago

Pinging @elastic/es-data-management (Team:Data Management)

cbuescher commented 10 months ago

More like this: https://gradle-enterprise.elastic.co/s/dw26o7qt6or7c/tests/task/:x-pack:qa:full-cluster-restart:v7.1.1%23bwcTest/details/org.elasticsearch.xpack.restart.FullClusterRestartIT?top-execution=1

masseyke commented 6 months ago

I think that the version is getting incremented because watcher sometimes happens to run, and it updates the document in .watches with status fields. By default it does this every 500 ms (xpack.watcher.trigger.schedule.ticker.tick_interval), so it seems believable that every once in a while this happens while the test is running. I'm going to change the test to assert that the version is at least 2.
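
A minimal sketch of that relaxed assertion (the assertWatchVersion helper and its version argument are illustrative; the real test reads the version from the watch document it fetches):

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.greaterThanOrEqualTo;

public class WatchVersionAssertionSketch {

    // Watcher's own status writes (driven by the ~500 ms tick interval)
    // can bump the .watches document version between the test's operations,
    // so assert a lower bound instead of an exact value.
    static void assertWatchVersion(int version) {
        assertThat("watch should have been updated at least once after creation",
            version, greaterThanOrEqualTo(2));
    }
}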

masseyke commented 6 months ago

The timeout failure is more interesting, though. From the latest failure, it appears to be happening because bwc_watch_index does not exist, so the health check fails (the other two indices exist, according to the cluster state that's dumped out on failure). I have no idea why bwc_watch_index would not exist, though. I can see that the bwc_watch was created. And according to the watcher stats that are dumped out on failure, watcher is up and running and has run 29 watches (which probably maps to running this watch once per second for the 29 seconds it could before the 30-second timeout). There is no mention of bwc_watch failing, and no mention of bwc_watch_index being created or failing to be created. I haven't been able to reproduce this locally.
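
For reference, this is roughly how an index action brings bwc_watch_index into existence. The watch body below is an illustrative sketch of the 7.x+ watcher API, not the exact watch the test registers:

import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class BwcWatchSketch {

    // Registers a watch that fires every second and indexes its payload
    // into bwc_watch_index; the index is created the first time the action
    // executes. If the action never runs, the index never appears and the
    // health check above stays red until it times out.
    static void putBwcWatch(RestClient client) throws Exception {
        Request request = new Request("PUT", "/_watcher/watch/bwc_watch");
        request.setJsonEntity(
            "{ \"trigger\": { \"schedule\": { \"interval\": \"1s\" } },"
          + "  \"input\": { \"simple\": { \"payload\": \"static\" } },"
          + "  \"actions\": { \"index_payload\": {"
          + "      \"index\": { \"index\": \"bwc_watch_index\" } } } }");
        client.performRequest(request);
    }
}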

masseyke commented 5 months ago

I realized I hadn't been paying attention when I said that the watch had run 29 times -- that 29 was actually the threadpool max size. In the failing run there were only 2 watches in the watcher stats, while the test creates 3. I dumped out the watcher stats in a successful run, and one node showed 2 watches and the other 1 watch (as you'd expect). So it appears that in this timeout run the bwc_watch somehow silently failed to be created.