borisdiakur opened this issue 9 years ago
I opened that other issue. In my case, I was running Protractor inside Vagrant and had port 4444 forwarded in my Vagrantfile. Protractor silently failed/hung in this scenario.
I had the same issue, but it got fixed when I ran webdriver-manager start in the foreground, i.e.:
webdriver-manager start &
protractor conf.js  # hangs
but
webdriver-manager start
protractor conf.js  # works
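If you do start the server in the background, it may help to wait until it actually answers before launching protractor. That this is the cause of the hang is an assumption, not something confirmed in this thread. A minimal bash sketch: wait_until is a hypothetical helper, and the example in the comments assumes Selenium standalone on its default port 4444 with the /wd/hub/status endpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    # Success on any attempt ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example (assumes Selenium standalone on the default port 4444):
#   webdriver-manager start &
#   wait_until 60 0.5 curl -sf http://localhost:4444/wd/hub/status || exit 1
#   protractor conf.js
```

Polling a status URL rather than sleeping a fixed time means a fast startup is not slowed down, while a server that never comes up fails the step after about 30 seconds instead of hanging.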
Oddly, both styles seem to work now. I'm not sure, but I think the nature of the tests may also determine whether Protractor hangs. I had a test that was endlessly trying to reach a nonexistent location because I hadn't set the baseUrl, and Protractor showed the same symptoms.
I'm getting hanging behaviour, running tests with sharding on Codeship CI.
I run the same tests with Firefox and Chrome - I've split these into 2 separate config files but other than the browser setting they're the same.
The process sometimes hangs at the end of the Chrome file, and sometimes at the end of the Firefox one (and, if I'm very lucky, sometimes doesn't hang :))
Consistently seems to be "1 instances" left running (rather than 2 or more).
It's run as a foreground process, so @caninemwenja's comment doesn't help...
I've got a similar issue, where it hangs at
Using ChromeDriver directly...
[launcher] Running 1 instances of WebDriver
about 50% of the time. It hangs there indefinitely. However, even while it hangs, you can start other instances, which in turn may or may not succeed.
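Whatever the root cause, a hang like this can at least be turned into a failing build instead of a stuck one. A minimal bash sketch, assuming GNU coreutils timeout is available; run_with_deadline is a hypothetical name, and the 600-second limit in the example is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a wall-clock limit (GNU coreutils `timeout`).
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
  local limit=$1 status=0
  shift
  # SIGTERM at the limit; SIGKILL if the command is still alive 10 s later.
  timeout -k 10 "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "run_with_deadline: '$*' timed out after ${limit}s" >&2
  fi
  return "$status"
}

# Example: fail the CI step if protractor has not finished within 10 minutes.
#   run_with_deadline 600 protractor conf.js
```

timeout exits with status 124 when the limit is hit, so a CI script can distinguish a hang from an ordinary test failure.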
I've experienced the issue both with and without directConnect, so that's not part of the issue.
Yeah, I've just confirmed that the same things happens with directConnect and standalone selenium.
Also seeing the problem, but only on our Ubuntu build servers. Haven't seen the issue yet on the Windows servers. Using Google Chrome 44.0.2403.130 and ChromeDriver 2.15.322448.
I don't know how significant it is, but it's also our Ubuntu build servers that are affected. We have Fedora, Ubuntu, and OS X systems, but only the Ubuntu servers hang at ChromeDriver startup.
Ubuntu for me too. Using Chromium 43.0.2357.130 and ChromeDriver 2.16. It works now though (with no apparent change).
We're on Ubuntu too (we use Codeship, whose servers are based on Ubuntu Trusty).
We're on Ubuntu as well (updated issue description with system information "Ubuntu 12.04.5 LTS (GNU/Linux 3.13.0-37-generic x86_64)").
The ubuntu servers also use Xvfb (X Virtual Frame Buffer) to use the browser without a window manager.
Interesting; I've experienced some weird hangs on Debian (wheezy) when using Xvfb. They seemed to be due to the browser not being able to attach to the headless display. Is everyone here running things via Xvfb?
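If you're having problems with a race against Xvfb startup, try using xdpyinfo to wait for X to be ready:
#!/bin/bash
# Minimal stand-in for a LOG helper: write messages to stderr.
LOG() { echo "$*" >&2; }
MAX=120   # 120 polls x 0.5 s = about 60 seconds
CT=0
# xdpyinfo exits non-zero until it can connect to the X server named by $DISPLAY.
while ! xdpyinfo >/dev/null 2>&1; do
  sleep 0.5
  CT=$(( CT + 1 ))
  if [ "$CT" -ge "$MAX" ]; then
    LOG "FATAL: $0: Gave up waiting for X server $DISPLAY"
    exit 11
  fi
done
LOG "X is available"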
Also, you can get more logs from chromedriver itself. They often have more details about what went wrong (though often still not enough). Often we have to look into the Chrome debug logs, too. See http://stackoverflow.com/questions/31662828/how-to-access-chromedriver-logs-for-protractor-test for suggestions on how to get additional chromedriver logs.
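One way to capture those logs when Protractor launches chromedriver itself is to point Protractor's chromeDriver config option (assuming your version supports it with directConnect) at a small wrapper that adds chromedriver's --verbose and --log-path flags. This is an assumption, not necessarily what the linked answer recommends. In this bash sketch, CHROMEDRIVER_REAL and CHROMEDRIVER_LOG are hypothetical environment variables, not Protractor settings.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: forward all arguments to the real chromedriver and
# add logging flags. CHROMEDRIVER_REAL must point at the real binary.
chromedriver_with_logs() {
  local real=${CHROMEDRIVER_REAL:?set CHROMEDRIVER_REAL to the real chromedriver path}
  local log=${CHROMEDRIVER_LOG:-/tmp/chromedriver.$$.log}
  # --verbose and --log-path are chromedriver's own logging flags.
  "$real" --verbose --log-path="$log" "$@"
}

# Saved as an executable script, the last line would be:
#   chromedriver_with_logs "$@"
```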
Yup, we use xvfb-run on Ubuntu and (obviously) not on the Windows build servers, so that sounds like a definite possibility. @tullmann, do you use your script in conjunction with xvfb-run or Xvfb?
We have put xvfb in our init scripts, so as far as I know it is running soon after startup. Unless something else is needed to make sure it's "ready" to accept connections, I think xvfb is ready in my case.
@tullmann We also have xvfb running from the beginning, and some of the tests run fine before the hang. But thanks for the chromedriver logs hint; I'll try that as soon as possible.
I'm using Windows Server 2012. If I run webdriver-manager start in the foreground, there is no problem. But if I run it in the background, similar hanging behavior happens: about 1.5 minutes for each browser.
I also tried a standalone Selenium server so that I can skip the webdriver-manager start step. The test runs without problems if I start it locally. But if I use psexec to run the test remotely on another machine, it runs in the background and the hanging behavior happens again.
I managed to enable chrome driver logs (using this method).
Here is the result:
[0,014][INFO]: COMMAND InitSession {
"desiredCapabilities": {
"browserName": "chrome",
"count": 1,
"platform": "ANY",
"version": ""
}
}
[0,014][INFO]: Populating Preferences file: {
"alternate_error_pages": {
"enabled": false
},
"autofill": {
"enabled": false
},
"browser": {
"check_default_browser": false
},
"distribution": {
"import_bookmarks": false,
"import_history": false,
"import_search_engine": false,
"make_chrome_default_for_user": false,
"show_welcome_page": false,
"skip_first_run_ui": true
},
"dns_prefetching": {
"enabled": false
},
"profile": {
"content_settings": {
"pattern_pairs": {
"https://*,*": {
"media-stream": {
"audio": "Default",
"video": "Default"
}
}
}
},
"default_content_settings": {
"geolocation": 1,
"mouselock": 1,
"notifications": 1,
"popups": 1,
"ppapi-broker": 1
},
"password_manager_enabled": false
},
"safebrowsing": {
"enabled": false
},
"search": {
"suggest_enabled": false
},
"translate": {
"enabled": false
}
}
[0,014][INFO]: Populating Local State file: {
"background_mode": {
"enabled": false
},
"ssl": {
"rev_checking": {
"enabled": false
}
}
}
[0,015][INFO]: Launching chrome: /opt/google/chrome/google-chrome --disable-background-networking --disable-client-side-phishing-detection --disable-component-update --disable-default-apps --disable-hang-monitor --disable-prompt-on-repost --disable-sync --disable-web-resources --enable-logging --ignore-certificate-errors --load-extension=/tmp/.com.google.Chrome.NqAJ9w/internal --log-level=0 --metrics-recording-only --no-first-run --password-store=basic --remote-debugging-port=12199 --safebrowsing-disable-auto-update --safebrowsing-disable-download-protection --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/.com.google.Chrome.LqJPTu data:,
[0,016][DEBUG]: DevTools request: http://127.0.0.1:12199/json/version
[0,077][DEBUG]: DevTools request failed
[1:1:0909/154445:ERROR:image_metadata_extractor.cc(111)] Couldn't load libexif.
Xlib: extension "RANDR" missing on display ":2".
[0,127][DEBUG]: DevTools request: http://127.0.0.1:12199/json/version
[0,128][DEBUG]: DevTools request failed
[0,178][DEBUG]: DevTools request: http://127.0.0.1:12199/json/version
[0,179][DEBUG]: DevTools request failed
Xlib: extension "RANDR" missing on display ":2".
[0,229][DEBUG]: DevTools request: http://127.0.0.1:12199/json/version
[0,230][DEBUG]: DevTools request failed
[58787:58787:0909/154445:ERROR:sandbox_linux.cc(345)] InitializeSandbox() called with multiple threads in process gpu-process
[0,280][DEBUG]: DevTools request: http://127.0.0.1:12199/json/version
I counted the occurrences across all log files in my logs folder:
$ grep -nr "ERROR:sandbox_linux.cc(345)] InitializeSandbox() called with multiple threads in process gpu-process" . | wc -l
161
$ grep -nr "ERROR:image_metadata_extractor.cc(111)] Couldn't load libexif." . | wc -l
161
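The same counting can be done for several known messages at once. A minimal bash sketch; count_markers is a hypothetical helper, and its marker strings are copied from the log excerpt above, so extend the list for your own logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each known Chrome/ChromeDriver message, print how
# many log lines under DIR contain it, as "COUNT<TAB>MESSAGE".
count_markers() {
  local dir=$1 marker total
  local markers=(
    "InitializeSandbox() called with multiple threads in process gpu-process"
    "Couldn't load libexif."
    'extension "RANDR" missing on display'
    "DevTools request failed"
  )
  for marker in "${markers[@]}"; do
    # -r recurse, -h omit file names, -c count matching lines per file,
    # -F fixed-string match; awk sums the per-file counts.
    total=$(grep -rhcF -- "$marker" "$dir" | awk '{ s += $1 } END { print s + 0 }')
    printf '%s\t%s\n' "$total" "$marker"
  done
}

# Example:
#   count_markers ./logs
```

Comparing these counts between a hanging run and a normal run shows quickly whether a message is specific to the hang or just background noise.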
You can hide/avoid the "multiple threads in process gpu-process" message by adding --disable-gpu to the Chrome command line. Generally, when running under XVFB you're not leveraging a GPU anyway, so disabling it shouldn't hurt. In any case, I believe this message (along with the RANDR and libexif ones) is harmless and just distracting.
It's odd that chromedriver isn't timing out (it looks like you're waiting an hour or more?). You might get more useful information about what's going on from the Chrome logs. The "DevTools request" lines come from chromedriver trying to establish a basic connection to Chrome; it just polls repeatedly until it gets a connection. It looks like chromedriver receives no reply but doesn't time out either. It might be worth comparing this log to a "normal" one to see what differs in your setup.
To get more Chrome debug logging, add the following to Chrome's arguments (in your Protractor config): enable-logging, v=1 and user-data-dir=<somedir>, where <somedir> is a new directory private to this run. (You can leave off user-data-dir and Chrome will pick a random directory in /tmp, but it can be annoying to figure out which one...)
Enabling Chrome logging seems to make the race condition go away: the build job with Chrome logging enabled refuses to hang, while the build job without logging still hangs regularly.
I also enabled logging and it still hung, but the end of the log may contain something useful:
(google-chrome:13094): GConf-WARNING **: Client failed to connect to the D-BUS daemon:
//bin/dbus-launch terminated abnormally without any error message
[13132:13132:0929/003546:ERROR:sandbox_linux.cc(345)] InitializeSandbox() called with multiple threads in process gpu-process
The gpu-process issue has been discussed earlier, but could the D-Bus message be related?
I'm hoping that this is fixed with the new version of chromedriver in Protractor 2.3.0 and higher. Can anyone confirm?
@juliemr we've had it running for a few days now and a few hundred builds on 2.4 and haven't seen it hang, so yes, that seems to have fixed it.
Whee! Closing - please open up a new issue if this crops up again.
I'm sorry for the late reply, and sorry to report that the hanging persists with Protractor 2.4. Should I really open a new issue even though we still don't know the root cause of the problem?
I've also had hanging since 2.4. Oddly, I had had an extended period without any hanging before it happened a few times last week.
This issue still occurs on Protractor 2.5.1.
I can confirm that this issue still occurs on the most recent versions of Protractor (with the corresponding chromedriver and webdriver updates).
I've been having this problem on an Ubuntu CI environment with parallel machines ("containers") for each build. I recently ran with --troubleshoot and observed the following.
Output for a successful container:
DEBUG - Running with --troubleshoot
DEBUG - Protractor version: 1.8.0
DEBUG - Your base url for tests is undefined
Using ChromeDriver directly...
[launcher] Running 1 instances of WebDriver
DEBUG - WebDriver session successfully started with capabilities { caps_:
...
Output for a hanging container:
DEBUG - Running with --troubleshoot
DEBUG - Protractor version: 1.8.0
DEBUG - Your base url for tests is undefined
Using ChromeDriver directly...
[launcher] Running 1 instances of WebDriver
command protractor protractor.conf.js --troubleshoot --suite=container_suite took more than 10 minutes since last output
It appears that a WebDriver session never gets started. @juliemr, can this issue be reopened?
So we were able to ssh into both a successful container and a hanging container simultaneously and view the running processes. This is the output of running ps auxwf.
Successful container:
ubuntu 17473 2.0 0.0 85024 5988 ? S 15:51 0:26 | \_ sshd: ubuntu@pts/0
ubuntu 33618 0.0 0.0 14820 1508 pts/0 Ss+ 16:03 0:00 | \_ /bin/bash ./circle_scripts/test_override.sh
ubuntu 33934 6.7 0.0 728856 85596 pts/0 Rl+ 16:03 0:39 | \_ node /home/ubuntu/nvm/v0.10.33/bin/protractor protractor.conf.js --suite=container_suite
ubuntu 33938 1.8 0.0 378332 10212 pts/0 Sl+ 16:03 0:10 | \_ /home/ubuntu/nvm/v0.10.33/lib/node_modules/protractor/selenium/chromedriver --port=48603
ubuntu 33941 8.1 0.0 710400 90864 pts/0 Sl+ 16:03 0:47 | \_ /opt/google/chrome/chrome --disable-setuid-sandbox --disable-background-networking --disable-client-side-phishing-detection --disable-component-update --disable-default-apps --disable-hang-mo
ubuntu 33949 0.0 0.0 9664 620 pts/0 S+ 16:03 0:00 | \_ cat
ubuntu 33950 0.0 0.0 9664 616 pts/0 S+ 16:03 0:00 | \_ cat
ubuntu 33953 0.0 0.0 341532 28224 pts/0 S+ 16:03 0:00 | \_ /opt/google/chrome/chrome --type=zygote --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.1GMsD3
ubuntu 33954 0.0 0.0 28012 1972 pts/0 S+ 16:03 0:00 | | \_ /opt/google/chrome/nacl_helper
ubuntu 33958 0.0 0.0 341532 8064 pts/0 S+ 16:03 0:00 | | \_ /opt/google/chrome/chrome --type=zygote --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.1GMsD3
ubuntu 34001 36.0 0.1 938400 204744 pts/0 Rl+ 16:03 3:29 | | \_ /opt/google/chrome/chrome --type=renderer --enable-logging --log-level=0 --test-type=webdriver --lang=en-US --user-data-dir=/tmp/.com.google.Chrome.1GMsD3 --disable-client-side-ph
ubuntu 34025 0.0 0.0 761540 35692 pts/0 Sl+ 16:03 0:00 | | \_ /opt/google/chrome/chrome --type=renderer --enable-logging --log-level=0 --test-type=webdriver --lang=en-US --user-data-dir=/tmp/.com.google.Chrome.1GMsD3 --extension-process --en
ubuntu 33994 0.0 0.0 434308 36752 pts/0 Sl+ 16:03 0:00 | \_ /opt/google/chrome/chrome --type=gpu-process --channel=33941.0.931696333 --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.1GMsD3 --supports-dual-gpus=false --gpu-dri
Hanging container:
ubuntu 4408 2.2 0.0 85472 7140 ? S 15:48 0:27 | \_ sshd: ubuntu@pts/0
ubuntu 20741 0.0 0.0 14820 1508 pts/0 Ss+ 15:59 0:00 | \_ /bin/bash ./circle_scripts/test_override.sh
ubuntu 21057 1.1 0.0 670188 36728 pts/0 Sl+ 15:59 0:06 | \_ node /home/ubuntu/nvm/v0.10.33/bin/protractor protractor.conf.js --suite=container_suite
ubuntu 21061 0.0 0.0 378208 6560 pts/0 Sl+ 15:59 0:00 | \_ /home/ubuntu/nvm/v0.10.33/lib/node_modules/protractor/selenium/chromedriver --port=56618
ubuntu 21064 0.0 0.0 556412 47968 pts/0 Sl+ 15:59 0:00 | \_ /opt/google/chrome/chrome --disable-setuid-sandbox --disable-background-networking --disable-client-side-phishing-detection --disable-component-update --disable-default-apps --disable-hang-mo
ubuntu 21072 0.0 0.0 9664 616 pts/0 S+ 15:59 0:00 | \_ cat
ubuntu 21073 0.0 0.0 9664 620 pts/0 S+ 15:59 0:00 | \_ cat
ubuntu 21076 0.0 0.0 341532 28224 pts/0 S+ 15:59 0:00 | \_ /opt/google/chrome/chrome --type=zygote --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.1bhwCh
ubuntu 21077 0.0 0.0 28012 1964 pts/0 S+ 15:59 0:00 | | \_ /opt/google/chrome/nacl_helper
ubuntu 21080 0.0 0.0 341532 7764 pts/0 S+ 15:59 0:00 | | \_ /opt/google/chrome/chrome --type=zygote --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.1bhwCh
ubuntu 21117 0.0 0.0 434308 36912 pts/0 Sl+ 15:59 0:00 | \_ /opt/google/chrome/chrome --type=gpu-process --channel=21064.0.776113428 --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.1bhwCh --supports-dual-gpus=false --gpu-dri
ubuntu 21118 0.2 0.0 556412 13244 pts/0 S+ 15:59 0:01 | \_ /opt/google/chrome/chrome --disable-setuid-sandbox --disable-background-networking --disable-client-side-phishing-detection --disable-component-update --disable-default-apps --disable-han
The only discernible difference is the presence of Chrome processes with --type=renderer in the successful container.
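Based only on that single observation, a crude check for a stuck container is whether any Chrome renderer process exists yet. A minimal bash sketch; count_renderers is a hypothetical helper that reads ps-style lines on stdin so it can be exercised without a running browser. It is a heuristic, not a proven health check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ps lines that mention a Chrome renderer process.
# The [r] bracket stops grep from matching its own command line in ps output.
# grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps status 0.
count_renderers() {
  grep -c -- '--type=[r]enderer' || true
}

# Example on a live machine (ww avoids truncated command lines):
#   if [ "$(ps auxww | count_renderers)" -eq 0 ]; then echo "no renderer yet"; fi
```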
If you're running into problems with Chrome occasionally hanging at startup, it's probably a Chrome bug. See https://code.google.com/p/chromium/issues/detail?id=309093, where Google ran into this problem in their Chrome testing setup and worked around it in their test infrastructure.
Basically, Chrome is multi-threaded and relies on some standard gconf libraries. One of those libraries does a fork+exec to start up "dbus" if it is not already running. Doing a fork+exec in a multi-threaded application is bad because you will occasionally fork while a different thread holds the malloc lock (or another critical lock); the child then deadlocks when it tries to acquire that lock, and everything grinds to a halt. On a desktop, dbus is generally already running, but in many stripped-down test environments it does not get started.
Our workaround is to make sure dbus is running by having the script that launches XVFB run its child processes via dbus-launch --exit-with-session. (We also have a script that polls for X to be ready before proceeding; that seems to have helped, but we're less confident it's strictly necessary.)
Here's a lightly modified version of the xvfb wrapper script we use that starts xvfb, dbus, and waits for X to be ready: https://gist.github.com/tullmann/2d8d38444c5e81a41b6d
And here's the waitForX script that depends on: https://gist.github.com/tullmann/476cc71169295d5c3fe6
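The core of that workaround can be sketched in a few lines of bash. This is not tullmann's script (see the gists for that) and it omits the wait-for-X step; run_headless is a hypothetical name, and it assumes xvfb-run and dbus-launch (Debian/Ubuntu packages xvfb and dbus-x11) are installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command on a private Xvfb display with its own
# D-Bus session bus, so Chrome finds a running bus instead of spawning one.
run_headless() {
  local tool
  for tool in xvfb-run dbus-launch; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "run_headless: $tool not found" >&2
      return 127
    fi
  done
  # xvfb-run -a picks a free display number; dbus-launch starts a session bus
  # and runs the command with DBUS_SESSION_BUS_ADDRESS set, and
  # --exit-with-session ties the bus's lifetime to the session.
  xvfb-run -a dbus-launch --exit-with-session "$@"
}

# Example:
#   run_headless protractor foo.js
```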
@tullmann How do you use those scripts? Do you start up xvfb before every run? Or do you have those as part of startup scripts? Are they just wrappers around Xvfb?
@Callmenorm We use the bb-xvfb script to wrap each call to the protractor script in a private XVFB instance. So if you normally run protractor foo.js, you can run bb-xvfb protractor foo.js to run it under an XVFB instance. (The script is just a wrapper around xvfb-run.)
If you're starting XVFB (or a real X server) some other way (well before you get around to starting protractor), you will want to use "dbus-launch --exit-with-session" and/or the waitForX script as necessary in your environment.
Thanks @tullmann! We will give it a try in the next sprint. Update: yep, the scripts seem to do the trick. Still, Protractor should time out in case of a deadlock during dbus startup.
Thank you @tullmann !
I have just integrated your scripts as part of our testing on Codeship. It's now run once, and for the first time in ages all of the tests finished :)
I'll monitor it over the coming week and will shout if any issues, but in the meantime: THANK YOU :)
Likewise @tullmann's scripts have solved our CI builds timing out randomly. Thanks!
@tullmann, you're a hero.
We also have problems with hanging Chrome browsers running in Docker containers that are used as CI agents (TeamCity). When the container starts, it also starts xvfb as a service, and it runs for several days. Protractor starts Chrome itself with the "--directConnect=true" option and also starts several browsers in a single test. Some builds run smoothly, some hang indefinitely. @tullmann, any idea how I can integrate your scripts?
@Sabartius Make sure that when you start xvfb, you also start dbus. I don't think my scripts are specifically useful in your scenario, so you'll need a different way to make sure dbus is running. It's probably as simple as having your container run dbus-launch in the right place (see the man page for more details).
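For a long-running container like the one described, one option is to start a session bus once and export its address in the same shell (or entrypoint) that later launches the CI agent. A minimal bash sketch, assuming dbus-launch from the dbus-x11 package is installed; ensure_dbus_session is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure DBUS_SESSION_BUS_ADDRESS is set, starting a
# session bus with dbus-launch if needed. Call it in the shell that later
# starts the test runner; in a subshell the exported variables are lost.
ensure_dbus_session() {
  if [ -n "${DBUS_SESSION_BUS_ADDRESS:-}" ]; then
    return 0   # a bus address is already set for this environment
  fi
  if ! command -v dbus-launch >/dev/null 2>&1; then
    echo "ensure_dbus_session: dbus-launch not found" >&2
    return 127
  fi
  # --sh-syntax prints assignments for DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID.
  eval "$(dbus-launch --sh-syntax)"
  export DBUS_SESSION_BUS_ADDRESS
}

# Example (container entrypoint, before starting Xvfb and the CI agent):
#   ensure_dbus_session || exit 1
```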
Aww - after 2 weeks without an issue, my last 3 builds have hung.
Haven't yet had time to investigate why. Anyone else seen this @tullmann @jas13 @Callmenorm @borisdiakur ?
Seems to have fixed itself at some point this morning... very strange
@tullmann How do you get (install) dbus-launch? I have installed dbus but there isn't a dbus-launch command there. I'm using debian:jessie based docker container.
Looks like it's part of the dbus-x11 package:
# apt-cache search dbus-launch
dbus-x11 - simple interprocess messaging system (X11 deps)
Not sure this is chrome specific. I have the same issue when running against firefox.
It probably isn't, because sometimes it hangs even when using a remote Selenium Grid.
I'm currently running Protractor 3.0.0 and have been seeing hanging builds specifically when running Chrome (with directConnect) on Xvfb. I enabled chromedriver logs and Chrome logs using the methods mentioned earlier in this thread.
I see this at the end of chrome's logs:
[9504:9504:0419/185420:ERROR:sandbox_linux.cc(338)] InitializeSandbox() called with multiple threads in process gpu-process
[9504:9504:0419/213227:ERROR:x11_util.cc(82)] X IO error received (X server probably went away)
[9407:9461:0419/213227:WARNING:channel.cc(358)] RawChannel write error
And this at the end of chromedriver's logs:
[9504:9504:0419/185420:ERROR:sandbox_linux.cc(338)] InitializeSandbox() called with multiple threads in process gpu-process
[9504:9504:0419/213227:ERROR:x11_util.cc(82)] X IO error received (X server probably went away)
We start Xvfb at the beginning of our test suite, before running four end-to-end test suites: two with Firefox and two with Chrome. Often a build will have already run a few of these suites (maybe even a Chrome suite) before hanging. The hanging is not as frequent as others report, though: in my last test, one out of ten repeated builds hung.
I'll report back on this thread after trying to run the test suites with --disable-gpu. I'll also try to locate the Xvfb logs to see if something went wrong with it.
So.. reporting back:
Running with --disable-gpu does not fix the hanging. Also, we're running Xvfb through a simple wrapper, pyvirtualdisplay, and getting hold of Xvfb's output would have meant not using that wrapper, so I abandoned that approach.
I then ran the tests inside a dbus-launch --exit-with-session wrapper, and it worked out great.
I also came across this bug report: SeleniumHQ/docker-selenium#87. It seems to indicate that simply setting DBUS_SESSION_BUS_ADDRESS to /dev/null should prevent Chrome from hanging. I'll test that approach too and report back.
It looks like setting DBUS_SESSION_BUS_ADDRESS to /dev/null alone is sufficient to prevent Chrome from deadlocking. I ran a whole bunch of iterations of my test suite, and not a single one hung.
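A minimal bash sketch of the DBUS_SESSION_BUS_ADDRESS=/dev/null workaround, scoped to a single command rather than exported for the whole shell; run_without_dbus is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with DBUS_SESSION_BUS_ADDRESS pointed at
# /dev/null, per SeleniumHQ/docker-selenium#87. The prefix assignment applies
# only to the command being run, not to the calling shell.
run_without_dbus() {
  DBUS_SESSION_BUS_ADDRESS=/dev/null "$@"
}

# Example:
#   run_without_dbus protractor conf.js
# or, for everything started from this shell:
#   export DBUS_SESSION_BUS_ADDRESS=/dev/null
```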
The solution of @jrharshath worked for me. Thank you very much!
At first glance this issue seems to be related to #1764, but I don't see how a network issue can cause the hanging here (directConnect is set to true). So here is the setup: protractor-config.js
Here is an extract of the essential parts of the Jenkins Job console log:
Note that it is not always the same tests scenario which leads to the hanging. Same problems with node 0.10.33 as with 0.12.7. Using Ubuntu 12.04.5 LTS (GNU/Linux 3.13.0-37-generic x86_64).