martijnvg closed this issue 4 years ago.
Pinging @elastic/es-core-features (:Core/Features/Watcher)
Reported by @dakrone in #33326:
09:55:26
09:55:26 org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityClientYamlTestSuiteIT > test {yaml=watcher/usage/10_basic/Test watcher usage stats output} FAILED
09:55:26 java.lang.AssertionError: Failure at [watcher/usage/10_basic:48]: field [watcher.count.active] is not greater than [$watch_count_active]
09:55:26 Expected: a value greater than <1>
09:55:26 but: <1> was equal to <1>
09:55:26 at __randomizedtesting.SeedInfo.seed([3AA5E1A0040CF825:B2F1DE7AAAF095DD]:0)
09:55:26 at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:405)
09:55:26 at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:382)
09:55:26 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
09:55:26 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
I was unable to reproduce this on the 7.x branch.
Failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+matrix-java-periodic/ES_RUNTIME_JAVA=zulu11,nodes=general-purpose/538/consoleFull https://gradle-enterprise.elastic.co/s/xxhruxec3jeji
This (^) is another test failure caused by incorrect stats counts being reported.
I suspect the main cause of these failures is that Watcher is not fully started on all the shards it serves watches from. More specifically, the WatcherIndexingListener may be inactive for a specific shard. We could change the tests to ensure that Watcher is fully started. Alternatively, we could change the put watch API to check whether the WatcherIndexingListener is active before indexing. If it is not ready, the API could wait, similar to the timeout on an index request (which waits for enough shard copies to be ready before indexing)?
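A minimal sketch of the second option, in plain Java: a per-shard readiness gate that a put watch request would wait on, with a timeout, before indexing. `ListenerReadinessGate`, `markActive`, `markInactive` and `awaitActive` are hypothetical names used only for illustration; they are not part of Elasticsearch or of the real WatcherIndexingListener.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical readiness gate for a per-shard indexing listener; not an
// Elasticsearch API. A put watch request waits here (with a timeout)
// until the listener serving the target shard is active.
final class ListenerReadinessGate {
    private boolean active; // guarded by this object's monitor

    // Called when the listener starts serving the shard.
    synchronized void markActive() {
        active = true;
        notifyAll(); // wake up requests blocked in awaitActive
    }

    // Called when the listener stops serving the shard.
    synchronized void markInactive() {
        active = false;
    }

    // Returns true once the listener is active, or false if the timeout
    // elapses (or the thread is interrupted) first; the caller would then
    // fail the request rather than index a watch that no listener picks up.
    synchronized boolean awaitActive(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!active) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            try {
                // Releases the monitor while waiting; the loop guards against spurious wakeups.
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A put watch handler would call `awaitActive(timeout, unit)` for the target shard and fail the request when it returns `false`, in the same spirit as an index request that gives up after its timeout when not enough shard copies are available.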
SmokeTestWatcherTestSuiteIT.testMonitorClusterHealth failed twice again today. Given the nature of the underlying cause, I assume muting isn't practical.
I want to see how these tests respond to #52627.
Otherwise, I think we should investigate changing the Watcher put and delete APIs to wait for the watch to be added to the trigger service before returning a response. The tests assume that this always happens, but that is not the case. In the meantime, specific tests can be muted.
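A hedged sketch (plain Java, hypothetical names throughout) of how the put and delete APIs could hold their response until the trigger service confirms it has applied the change. `TriggerRegistrationTracker`, `expect`, `applied` and `awaitApplied` are assumptions for illustration and do not correspond to actual Watcher classes.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical: not an Elasticsearch/Watcher class.
final class TriggerRegistrationTracker {
    // watch id -> future completed once the trigger service has applied the put/delete
    private final Map<String, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    // Called by the put/delete API before it indexes or deletes the watch document,
    // so a fast trigger-service update cannot be missed.
    CompletableFuture<Void> expect(String watchId) {
        return pending.computeIfAbsent(watchId, id -> new CompletableFuture<>());
    }

    // Called by the trigger service after it has added or removed the watch.
    void applied(String watchId) {
        CompletableFuture<Void> future = pending.remove(watchId);
        if (future != null) {
            future.complete(null);
        }
    }

    // Blocks until the change is applied; returns false on timeout or interruption
    // so the caller can fail the request instead of reporting success too early.
    boolean awaitApplied(String watchId, CompletableFuture<Void> future, long timeout, TimeUnit unit) {
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            pending.remove(watchId, future); // drop the stale expectation
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pending.remove(watchId, future);
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }
}
```

Registering the expectation before touching the watch document avoids a race in which the trigger service applies the change before the API has started waiting for it.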
@martijnvg, it looks like it just failed: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=centos-7&&immutable/611/console
There was a failure today related to https://github.com/elastic/elasticsearch/issues/33326, which this issue appears to replace.
There have been no instances of these failures since May 11 (there are a couple of SSL failures in a FIPS container, but those are not what this issue is about). This coincides with #56556, which was introduced to help address issues like this.
SmokeTestWatcherTestSuiteIT Failure:
The failure matches recent failures reported in #32299. The fix in #51466 did not stop this test from failing.
This test has failed a few times now (SmokeTestWatcherWithSecurityIT and SmokeTestWatcherTestSuiteIT) and needs to be re-investigated.
Build failures:
WatchAckTests.testAckAllActions failure:
Build log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob+fast+part2/3568/console
Build scan: https://gradle-enterprise.elastic.co/s/ua3yon2njbyja
Failure:
Reproduce with:
Can't reproduce locally.