Closed kobelb closed 6 years ago
I have already put some time investigating this, and will write down my findings here.
This flaky failure is not directly related to this test, but affects all tests, that try to "Open" a saved search in Discover. For some reason it seems, that this test fails way more often than the others that open a saved search, but I have already had test failures for other Discovery tests, that need to open a saved search (but way less often).
The actual issue in all those cases is, that the testing framework clicks the "Open" button on the screen, but the panel to select a saved search actually never opens. If you look at the screenshots and HTML you can see, that the panel is still in it's closed state. But the Open button itself is in it's active state, which mean it has been clicked (not focused, not hovered, but the active state).
I tried to solve that issue, by giving the openSavedSearch
page method a retry
to wait for the actual form that lies within that open panel to be visible.
But unfortunately that did not help reliable apparently. So I am unfortunately out of ideas, because I don't know what could cause the click to happen, but the panel to actually not open. I also looked at the Discover code itself for this, if this might potentially by a real issue in the code, but giving the code for that a deep dive, I am rather sure, that this behavior could actually not happen, which makes me think, that it's some weirdness going on in our testing framework.
If anyone has any further ideas, what we could do to solve that, they are very welcome, since it will otherwise stay skipped, since I have no further ideas how we could actually debug or fix that.
/cc @elastic/kibana-discovery @LeeDr @liza-mae
Could there be a small timing window when the Discover tab opens and that button is display and enabled but not really "ready"? Just to test this we could insert a 1 second sleep right before clicking "Open".
The problem also could be caused by steps before the "Open" not waiting until their action was done. That could even be steps from the previous test not waiting until loading has finished. (I'm working on fixing a few of these now)
But if either of these is the case, I would still call it a bug in that the open menu item shouldn't be enabled if it's not actually ready.
Does it ever fail when you run the tests locally? Do you have the logging from a recent failure? I'm also wondering if this is one of the problems @liza-mae sees on Cloud testing (which is generally slower than kibana-ci or local runs).
I was unfortunately never able to recreate that failure locally. Also all the runs from before this was enabled, already seem to be cleaned up from CI, so I can't link to a recent failure. The logging was pretty much just showing the 10 retries to check for that open panel and then failed, because the referenced form was still not available.
Looking at the code, I would say it's not possible this button exist before being functional.
@Stacey-Gammon The failure in the job you linked to is related to the afterAll
hook but it sounds like this issue is actually about clicking an "Open" button in Discover or possibly about this linked issue (https://github.com/elastic/kibana/issues/17408). Either way, the output in the job you linked to is this:
16:29:37 │ proc [ftr] └- ✖ fail: "visualize app visualize lab mode "after all" hook"
16:29:37 │ proc [ftr] │ tryForTime timeout: Error: tryForTime timeout: [POST http://localhost:9515/session/6f18ea9f4de6bda603688d420b3f82bf/element / {"using":"css selector","value":"[data-test-subj~=\"advancedSetting-resetField-visualize:enableLabs\"]"}] no such element: Unable to locate element: {"method":"css selector","selector":"[data-test-subj~="advancedSetting-resetField-visualize:enableLabs"]"}
I've created https://github.com/elastic/kibana/pull/21245 to address it.
I believe this issue may have actually already been addressed by https://github.com/elastic/kibana/pull/20165 or https://github.com/elastic/kibana/pull/20782. Can @liza-mae or @timroes verify this?
Hi @cjcenizal I think that test still has issues even after those PRs you listed, at least with navigation and maybe other areas too. I have this open #20980 too, maybe your PR will help fix :)
I just had the failure 2 times out of 10 runs locally on master;
│ debg existsByDisplayedByCssSelector [data-test-subj~="loadSearchForm"]
│ debg isGlobalLoadingIndicatorVisible
│ debg TestSubjects.exists(globalLoadingIndicator)
│ debg existsByDisplayedByCssSelector [data-test-subj~="globalLoadingIndicator"]
│ debg awaitGlobalLoadingIndicatorHidden
│ debg TestSubjects.find(globalLoadingIndicator-hidden)
│ debg findByCssSelector [data-test-subj~="globalLoadingIndicator-hidden"]
│ debg find.byPartialLinkText(visualize_lab_mode_test)
│ debg --- tryForTime error: [POST http://localhost:9515/session/897d70a184daee14a3f34c47930ed810/element / {"using":"partial link text","value":"visualize_lab_mode_test"}] no such element: Unable to locate element: {"method":"partial link text","selector":"visualize_lab_mode_test"}
│ (Session info: chrome=68.0.3440.75)
│ (Driver info: chromedriver=2.36.540471 (9c759b81a907e70363c6312294d30b6ccccc2752),platform=Linux 4.15.0-20-generic x86_64)
│ debg --- tryForTime errored again with the same message ...
│ debg --- tryForTime errored again with the same message ...
│ debg --- tryForTime errored again with the same message ...
│ info Taking screenshot "/home/leedr/git/kibana/test/functional/screenshots/failure/visualize app visualize lab mode disabling does not break loading saved searches.png"
│ info Current URL is: http://localhost:5620/app/kibana#/discover/c29a94b0-9128-11e8-bff6-0985b02d2ccf?_g=()&_a=(columns:!(_source),filters:!(),index:%27logstash-*%27,interval:auto,query:(language:lucene,query:%27%27),sort:!(%27@timestamp%27,desc))
│ info Saving page source to: /home/leedr/git/kibana/test/functional/failure_debug/html/visualize app visualize lab mode disabling does not break loading saved searches.html
└- ✖ fail: "visualize app visualize lab mode disabling does not break loading saved searches"
│ tryForTime timeout: [POST http://localhost:9515/session/897d70a184daee14a3f34c47930ed810/element / {"using":"partial link text","value":"visualize_lab_mode_test"}] no such element: Unable to locate element: {"method":"partial link text","selector":"visualize_lab_mode_test"}
│ (Session info: chrome=68.0.3440.75)
│ (Driver info: chromedriver=2.36.540471 (9c759b81a907e70363c6312294d30b6ccccc2752),platform=Linux 4.15.0-20-generic x86_64)
│ at Server._post (/home/leedr/git/kibana/test/functional/services/remote/verbose_remote_logging.js:34:21)
│ at runRequest (/home/leedr/git/kibana/node_modules/leadfoot/Session.js:92:40)
│ at /home/leedr/git/kibana/node_modules/leadfoot/Session.js:113:39
│ at new Promise (/home/leedr/git/kibana/node_modules/dojo/_debug/Promise.ts:411:4)
│ at Session._post (/home/leedr/git/kibana/node_modules/leadfoot/Session.js:67:10)
│ at Session.find (/home/leedr/git/kibana/node_modules/leadfoot/Session.js:1328:15)
│ at Command.<anonymous> (/home/leedr/git/kibana/node_modules/leadfoot/Command.js:42:36)
│ at /home/leedr/git/kibana/node_modules/dojo/_debug/Promise.ts:393:16
│ at run (/home/leedr/git/kibana/node_modules/dojo/_debug/Promise.ts:237:8)
│ at /home/leedr/git/kibana/node_modules/dojo/_debug/nextTick.ts:44:4
│ at _combinedTickCallback (internal/process/next_tick.js:131:7)
│ at process._tickCallback (internal/process/next_tick.js:180:9)
│ at Command.find (/home/leedr/git/kibana/node_modules/leadfoot/Command.js:23:10)
│ at Command.prototype.(anonymous function) [as findByPartialLinkText] (/home/leedr/git/kibana/node_modules/leadfoot/lib/strategies.js:29:16)
│ at _ensureElementWithTimeout (/home/leedr/git/kibana/test/functional/services/find.js:138:29)
│ at retry.try (/home/leedr/git/kibana/test/functional/services/find.js:51:33)
│ at tryCatcher (/home/leedr/git/kibana/node_modules/bluebird/js/main/util.js:26:23)
│ at Function.Promise.attempt.Promise.try (/home/leedr/git/kibana/node_modules/bluebird/js/main/method.js:31:24)
│ at attempt (/home/leedr/git/kibana/test/common/services/retry.js:41:16)
│ at tryCatcher (/home/leedr/git/kibana/node_modules/bluebird/js/main/util.js:26:23)
│ at Promise._settlePromiseFromHandler (/home/leedr/git/kibana/node_modules/bluebird/js/main/promise.js:503:31)
│ at Promise._settlePromiseAt (/home/leedr/git/kibana/node_modules/bluebird/js/main/promise.js:577:18)
│ at Promise._settlePromises (/home/leedr/git/kibana/node_modules/bluebird/js/main/promise.js:693:14)
│ at Async._drainQueue (/home/leedr/git/kibana/node_modules/bluebird/js/main/async.js:123:16)
│ at Async._drainQueues (/home/leedr/git/kibana/node_modules/bluebird/js/main/async.js:133:10)
│ at Immediate.Async.drainQueues [as _onImmediate] (/home/leedr/git/kibana/node_modules/bluebird/js/main/async.js:15:14)
│ at runCallback (timers.js:810:20)
│ at tryOnImmediate (timers.js:768:5)
Looking in to it.
I give @Stacey-Gammon credit for tipping me off to this issue in another PR :-)
This method in discover_page;
async openSavedSearch() {
await this.clickLoadSavedSearchButton();
await PageObjects.common.sleep(2000);
await testSubjects.exists('loadSearchForm');
await PageObjects.header.waitUntilLoadingHasFinished();
}
is using await testSubjects.exists('loadSearchForm');
but that method returns this;
return await find.existsByDisplayedByCssSelector(testSubjSelector(selector), timeout);
which is returning;
return await this.exists(async remote => await remote.findDisplayedByCssSelector(selector), timeout);
and find.exists
returns a boolean true/false. So that means the
await testSubjects.exists('loadSearchForm');
may or may not be true, but we continue on to the next step in either case (only 1 second slower if it didn't exist because that's the timeout in exists
.
Oh nice! In that case we can make use of the existOrFail
method I'm adding in my other PRs to make the failure more obvious. To fix the underlying problem we could increase the sleep
I suppose, though that is an antipattern. Ideally, we should wait for the existence of an element (e.g. the "Open search" panel or the title element within it) to verify that the panel has opened.
I'm trying this as a fix;
async openSavedSearch() {
await retry.try(async () => {
await this.clickLoadSavedSearchButton();
const loadIsOpen = await testSubjects.exists('loadSearchForm');
expect(loadIsOpen).to.be(true);
});
await PageObjects.header.waitUntilLoadingHasFinished();
}
On one pass, I did this this error, but the try loop recovered it and the test passed;
│ debg TestSubjects.find(discoverOpenButton)
│ debg findByCssSelector [data-test-subj~="discoverOpenButton"]
│ debg --- tryForTime error: [POST http://localhost:9515/session/254f10af6ce75af907109fc04e550ea1/moveto / {"element":"0.8691403304735534-6"}] stale element reference: element is not attached to the page document
│ (Session info: chrome=68.0.3440.75)
│ (Driver info: chromedriver=2.36.540471 (9c759b81a907e70363c6312294d30b6ccccc2752),platform=Linux 4.15.0-20-generic x86_64)
I don't love this solution because it still leaves the mystery of why the open button doesn't just work.
Just a note that this flaky test just hit me here: https://github.com/elastic/kibana/pull/21245#issuecomment-408268531
Yep, and I've hit the stale element error one more time. Which means we're finding the Open menu item but before we click it, something about it changed. Maybe for a very short time it's not enabled, and then it is?
I don't know... we probably need to dig into the local nav code to understand it better (did a bit of that a long time ago). It's all very old Angular so maybe there's something weird/inefficient with the way it renders.
@LeeDr I just reassigned this to you since you're already on it.
Seen another test failure in #21258
This has been skipped by https://github.com/elastic/kibana/pull/19742 but it needs to actually be fixed and re-enabled