elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Flaky test failures: Error: remote failed to start within 2 minutes #11499

Closed stacey-gammon closed 6 years ago

stacey-gammon commented 7 years ago

I've seen

  log   [18:56:16.259] [info][status][plugin:elasticsearch@6.0.0-alpha1] Status changed from yellow to green - Kibana index ready
  log   [18:56:16.261] [info][status][ui settings] Status changed from yellow to green - Ready
Warning: Error: remote failed to start within 2 minutes
    at /var/lib/jenkins/workspace/elastic+kibana+pull-request+multijob-selenium/test/functional/services/remote/leadfoot_command.js:13:13
    at undefined.next (native)
    at step (/var/lib/jenkins/workspace/elastic+kibana+pull-request+multijob-selenium/test/functional/services/remote/leadfoot_command.js:5:1)
    at /var/lib/jenkins/workspace/elastic+kibana+pull-request+multijob-selenium/test/functional/services/remote/leadfoot_command.js:5:1 Use --force to continue.

A few times now

szydan commented 7 years ago

OK, after some digging I've found the cause of this error in my case. I was running the UI tests by executing

npm run test:ui:server

and in another shell

node scripts/functional_test_runner --verbose

I kept starting and killing the functional_test_runner over and over, and after some time I hit the error. The reason was a bunch of hanging selenium-server processes:

ps aux | grep selenium
szydan           26700   0.0  0.4  8274924  66168 s002  S     4:56pm   0:00.90 /usr/bin/java -Dwebdriver.chrome.driver=/PATH/node_modules/digdug/selenium-standalone/chromedriver -jar /PATH/node_modules/digdug/selenium-standalone/selenium-server-standalone-3.3.1.jar -port 4444
szydan           17963   0.0  0.3  8275380  45520 s002  S     1:22pm   0:10.55 /usr/bin/java -Dwebdriver.chrome.driver=/PATH/node_modules/digdug/selenium-standalone/chromedriver -jar /PATH/node_modules/digdug/selenium-standalone/selenium-server-standalone-3.3.1.jar -port 4444
szydan            1360   0.0  0.2  8280080  35316   ??  S     5:37pm   0:27.58 /usr/bin/java -Dwebdriver.chrome.driver=/PATH/node_modules/digdug/selenium-standalone/chromedriver -jar /PATH/node_modules/digdug/selenium-standalone/selenium-server-standalone-3.3.1.jar -port 4444

They were all started on port 4444, so when I tried to start the functional_test_runner the port was already taken, and the _start method in node_modules/digdug/SeleniumTunnel.js hung and did not create the tunnel.
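For anyone who would rather check for this condition from Node than read ps output, here is a minimal sketch that probes whether a port is already bound. The helper name isPortInUse is made up for illustration and is not part of digdug or Kibana; 4444 is simply the port from the ps output above.

```javascript
const net = require('net');

// Hypothetical helper (not part of digdug or Kibana): resolves to true when
// something is already listening on the given port, false when it is free.
function isPortInUse(port, host) {
  return new Promise(function (resolve, reject) {
    const probe = net.createServer();

    probe.once('error', function (err) {
      if (err.code === 'EADDRINUSE') {
        resolve(true); // another process already holds the port
      } else {
        reject(err); // some other failure, e.g. a permissions problem
      }
    });

    probe.once('listening', function () {
      // The bind succeeded, so the port was free; release it again.
      probe.close(function () {
        resolve(false);
      });
    });

    probe.listen({ port: port, host: host });
  });
}
```

Something like isPortInUse(4444).then(console.log) before starting the functional_test_runner should print true if a stale selenium-server is still around. Passing the host explicitly is probably safer, since bind conflicts between wildcard and specific addresses behave differently across platforms.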

Once I killed the old hanging processes, the error went away. To quickly verify whether you've hit the same error, just stick a console.log(data) inside the _start method:

_start: function () {
    var self = this;
    var childHandle = this._makeChild();
    var child = childHandle.process;
    var dfd = childHandle.deferred;
    var handle = util.on(child.stderr, 'data', function (data) {
        console.log(data); // !!!! <- HERE !!!
        .....

and the data, instead of containing the string "Selenium Server is up and running", contains the exception

java.lang.RuntimeException: java.net.BindException: Address already in use

Unfortunately, the exception is not shown to the user. I hope this description helps someone get unstuck ;-)
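As a rough idea of how the swallowed exception could be surfaced, the sketch below classifies a chunk of selenium-server stderr output using only the two strings quoted above. The function name and return values are hypothetical; this is not how digdug actually handles the output, just an illustration of the check.

```javascript
// Hypothetical helper (not digdug's actual code): classify a chunk of
// selenium-server stderr output using the two strings quoted in this thread.
function classifySeleniumOutput(data) {
  const text = String(data); // stderr chunks may arrive as Buffers

  if (text.indexOf('java.net.BindException: Address already in use') !== -1) {
    return 'port-in-use';
  }
  if (text.indexOf('Selenium Server is up and running') !== -1) {
    return 'ready';
  }
  return 'unknown';
}
```

A stderr handler could call this on each chunk and fail immediately with a readable message on 'port-in-use', rather than waiting for the 2 minute timeout.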

spalger commented 6 years ago

We seem to have overcome this.