elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.71k stars 24.66k forks source link

[CI] XPackRestIT failing on various YAML tests #75069

Closed romseygeek closed 6 months ago

romseygeek commented 3 years ago
got unexpected warning header [ 
    299 Elasticsearch-7.15.0-SNAPSHOT-f97d65771f221acca4b0a9a9441844a4392028fd "this request accesses system indices: [.ml-config], but in a future major version, direct access to system indices will be prevented by default"    
]

Build scan: https://gradle-enterprise.elastic.co/s/svd6gy2xzpfn4/tests/:x-pack:plugin:yamlRestTest/org.elasticsearch.xpack.test.rest.XPackRestIT/test%20%7Bp0=data_stream%2F140_data_stream_aliases%2FCreate%20data%20stream%20alias%7D

Reproduction line: ./gradlew ':x-pack:plugin:yamlRestTest' --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=data_stream/140_data_stream_aliases/Create data stream alias}" -Dtests.seed=9F9D630801B00349 -Dtests.locale=it-IT -Dtests.timezone=Asia/Ashkhabad -Druntime.java=8

Applicable branches: 7.x, master

Reproduces locally?: No

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.test.rest.XPackRestIT&tests.test=test%20%7Bp0%3Ddata_stream/140_data_stream_aliases/Create%20data%20stream%20alias%7D

Failure excerpt:

java.lang.AssertionError: Failure at [data_stream/140_data_stream_aliases:63]: got unexpected warning header [
    299 Elasticsearch-7.15.0-SNAPSHOT-f97d65771f221acca4b0a9a9441844a4392028fd "this request accesses system indices: [.ml-config], but in a future major version, direct access to system indices will be prevented by default"
]

  at __randomizedtesting.SeedInfo.seed([9F9D630801B00349:17C95CD2AF4C6EB1]:0)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:462)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:439)
  at sun.reflect.GeneratedMethodAccessor15.invoke(null:-1)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
  at java.lang.Thread.run(Thread.java:748)

  Caused by: java.lang.AssertionError: got unexpected warning header [
    299 Elasticsearch-7.15.0-SNAPSHOT-f97d65771f221acca4b0a9a9441844a4392028fd "this request accesses system indices: [.ml-config], but in a future major version, direct access to system indices will be prevented by default"
  ]

    at org.junit.Assert.fail(Assert.java:88)
    at org.elasticsearch.test.rest.yaml.section.DoSection.checkWarningHeaders(DoSection.java:476)
    at org.elasticsearch.test.rest.yaml.section.DoSection.execute(DoSection.java:350)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:455)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:439)
    at sun.reflect.GeneratedMethodAccessor15.invoke(null:-1)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at java.lang.Thread.run(Thread.java:748)
elasticmachine commented 3 years ago

Pinging @elastic/ml-core (Team:ML)

elasticmachine commented 3 years ago

Pinging @elastic/es-core-features (Team:Core/Features)

droberts195 commented 3 years ago

I think #73734 would fix this. As an interim fix we should probably just ignore this warning.

romseygeek commented 3 years ago

A related failure in docs yaml tests: https://gradle-enterprise.elastic.co/s/g2vrkxovl2ws2

[2564-07-08T15:10:11,580][INFO ][o.e.s.DocsClientYamlTestSuiteIT] [test] [yaml=reference/query-dsl/exists-query/line_56] before test |  
-- | --
  | ก.ค. ๐๘, ๒๕๖๔ ๓:๑๐:๑๑ หลังเที่ยง org.elasticsearch.client.RestClient logResponse |  
  | WARNING: request [GET http://[::1]:43489/_search?error_trace=true] returned 1 warnings: [299 Elasticsearch-7.15.0-SNAPSHOT-a049585f298d10ed58c64ecac6902df5c4b86280 "this request accesses system indices: [.async-search], but in a future major version, direct access to system indices will be prevented by default"] |  
  | [
droberts195 commented 3 years ago

I think this is an evolution of #74141. After fixing some of the race conditions between ML job open, close and kill the races move on to code that runs later.

In the failure from this issue's description the relevant bit of the server side log is this:

[2021-07-07T14:19:35,848][INFO ][o.e.x.m.MachineLearning  ] [yamlRestTest-0] Starting machine learning feature reset
[2021-07-07T14:19:35,850][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [yamlRestTest-0] [set-upgrade-mode-job] Loading model snapshot [N/A], job latest_record_timestamp [N/A]
[2021-07-07T14:19:35,919][INFO ][o.e.x.m.a.TransportKillProcessAction] [yamlRestTest-0] [set-upgrade-mode-job] Killing job
[2021-07-07T14:19:35,919][TRACE][o.e.x.m.j.p.a.AutodetectProcessManager] [yamlRestTest-0] [set-upgrade-mode-job] Killing process: awaitCompletion = [true]; reason = [kill process (api)]
[2021-07-07T14:19:35,975][INFO ][o.e.c.m.MetadataDeleteIndexService] [yamlRestTest-0] [.ml-state-000001/wlkDnNKlQrqyEX_SoHKDcg] deleting index
[2021-07-07T14:19:35,975][INFO ][o.e.c.m.MetadataDeleteIndexService] [yamlRestTest-0] [.ml-notifications-000002/UdDUIn_vSGW4XpL0OXCb-w] deleting index
[2021-07-07T14:19:35,975][INFO ][o.e.c.m.MetadataDeleteIndexService] [yamlRestTest-0] [.ml-config/-HtKKmzATrqhExujwSN9BQ] deleting index
[2021-07-07T14:19:35,975][INFO ][o.e.c.m.MetadataDeleteIndexService] [yamlRestTest-0] [.ml-anomalies-shared/ZP7tVtBfQCeUXamDP4Tj7w] deleting index
[2021-07-07T14:19:35,976][INFO ][o.e.c.m.MetadataDeleteIndexService] [yamlRestTest-0] [.ml-annotations-6/lpA4DM6aRauiVpREJX9vhw] deleting index
[2021-07-07T14:19:36,130][INFO ][o.e.c.m.MetadataDeleteIndexService] [yamlRestTest-0] [airline-data/Zqw03jwRQFmmK-DuatKkGg] deleting index
[2021-07-07T14:19:36,266][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [yamlRestTest-0] adding index lifecycle policy [.fleet-actions-results-ilm-policy]
[2021-07-07T14:19:36,318][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [yamlRestTest-0] adding index lifecycle policy [synthetics]
[2021-07-07T14:19:36,368][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [yamlRestTest-0] adding index lifecycle policy [.deprecation-indexing-ilm-policy]
[2021-07-07T14:19:36,392][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [yamlRestTest-0] Closing job [set-upgrade-mode-job], because [job is already closing]
[2021-07-07T14:19:36,396][WARN ][o.e.x.m.j.p.a.o.AutodetectResultProcessor] [yamlRestTest-0] [set-upgrade-mode-job] some results not processed due to the process being killed
[2021-07-07T14:19:36,397][INFO ][o.e.x.m.j.p.a.AutodetectCommunicator] [yamlRestTest-0] [set-upgrade-mode-job] job closed
[2021-07-07T14:19:36,407][INFO ][o.e.c.m.MetadataCreateIndexService] [yamlRestTest-0] [.ml-config] creating index, cause [auto(update api)], templates [], shards [1]/[1]
[2021-07-07T14:19:36,408][INFO ][o.e.c.r.a.AllocationService] [yamlRestTest-0] updating number_of_replicas to [0] for indices [.ml-config]
[2021-07-07T14:19:36,472][INFO ][o.e.c.m.MetadataCreateIndexService] [yamlRestTest-0] [.ml-annotations-6] creating index, cause [api], templates [], shards [1]/[1]
[2021-07-07T14:19:36,473][INFO ][o.e.c.r.a.AllocationService] [yamlRestTest-0] updating number_of_replicas to [0] for indices [.ml-annotations-6]

Or, in plain English:

  1. Job is opening at the time the test ends
  2. Feature cleanup starts at the end of the test
  3. Job gets killed before its C++ process has started
  4. Indices get deleted
  5. Job's C++ process starts and we realise that it shouldn't be running because it was requested to be killed before it started
  6. We shut down the process again immediately
  7. Because of the way we shut down the process it tries to update documents in indices, causing these indices to be recreated after they've been deleted

75113 should fix this by changing the way we shut down the process in step 7 so that it should be free of side-effects.

droberts195 commented 3 years ago

Another one just happened in a docs test: https://gradle-enterprise.elastic.co/s/q6lylbs4wg3we

However, this one is for the .async-search index, not .ml-config, and I don't think ML is responsible for this one.

The relevant bit of the server-side log is:

[2021-07-08T13:14:21,548][INFO ][o.e.c.m.MetadataCreateIndexService] [node-0] [.async-search] creating index, cause [api], templates [], shards [1]/[0]
[2021-07-08T13:14:21,636][INFO ][o.e.c.m.MetadataDeleteIndexService] [node-0] [library/wnvB3FnqThqM1VDYR62u4A] deleting index
[2021-07-08T13:14:21,637][INFO ][o.e.c.m.MetadataDeleteIndexService] [node-0] [.async-search/-C8hXOI_QOuU488GHnhuKQ] deleting index
[2021-07-08T13:14:21,704][INFO ][o.e.c.m.MetadataCreateIndexService] [node-0] [.async-search] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]
[2021-07-08T13:14:21,819][INFO ][o.e.c.m.MetadataMappingService] [node-0] [.async-search/mkUDT9-hRDqacmjcBBo8aQ] create_mapping [_doc]
[2021-07-08T13:14:22,008][INFO ][o.e.c.m.MetadataCreateIndexService] [node-0] [sales] creating index, cause [api], templates [], shards [1]/[1]
[2021-07-08T13:14:22,180][INFO ][o.e.c.m.MetadataDeleteIndexService] [node-0] [.async-search/mkUDT9-hRDqacmjcBBo8aQ] deleting index

So it looks like the .async-search index is being auto-created after it's been deleted after the previous test, then gets deleted again by the "delete all indices" cleanup code, yet this triggers a deprecation warning that fails the test.

droberts195 commented 3 years ago

This is another one that failed due to .async-search being a system index: https://gradle-enterprise.elastic.co/s/udfnzxfgyd37k

In this case the test that failed was reference/query-dsl/match-query/line_219. However, if you look in the full console log you can see that the test that ran immediately before was reference/sql/endpoints/rest/line_576. That was added recently in #74975 and uses async search, so that is what caused the .async-search index to be created at an unexpected moment. So this explains why these have started failing recently.

@williamrandolph do you think you would be able to merge #73734 soon with all the nice-to-haves deferred to a followup? I think that would be good enough for the docs tests. The only other solution would be to disable snippet testing for all the snippets added in #74975 that use async search.

/cc @jrodewig

cbuescher commented 3 years ago

I see a lot of failures in yaml tests this this "unexpected warning header" today, its one of the biggest sources of failures at the moment (although the overall frequency is bareable), so I'm +1 on gettin #73734 in as soon as @williamrandolph thinks its possible to get the noise level down.

jrodewig commented 3 years ago

Thanks for the ping here. #73734 is a better long-term fix, but I've merged #75237 to mute snippet tests and reduce noise in the meantime.

williamrandolph commented 3 years ago

@droberts195 Thanks for the ping. I will bring PR #73734 up to date and merge it as soon as I can.

dakrone commented 2 years ago

@williamrandolph are you still planning to merge #73734 to address this?

williamrandolph commented 2 years ago

Yes, now that we've finished our features for 8.0, I need to take another pass at that PR. Thanks for the reminder.

elasticsearchmachine commented 11 months ago

Pinging @elastic/es-data-management (Team:Data Management)

joegallo commented 6 months ago

I talked to @williamrandolph and between https://github.com/elastic/elasticsearch/pull/94130 and https://github.com/elastic/elasticsearch/pull/94191 we think this one's probably closable. I put up #106140 to revert https://github.com/elastic/elasticsearch/pull/75237, and once that's merged I'm going to close this ticket.