SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0

[🐛 Bug]: Race condition in ruby library for capybara system tests #14454

Open krschacht opened 2 months ago

krschacht commented 2 months ago

What happened?

I've been successfully using Capybara in Rails for many months. But about a month ago, my system tests started sporadically failing in my GitHub Actions CI with "Net::ReadTimeout with #<TCPSocket:(closed)>". If I re-run the test suite a few times, it eventually passes. I've tried many different workarounds, but none of them fix the issue. I've also tried rolling back all changes in my repo to a point months ago when tests were consistently passing, and that doesn't fix it either.

We've spent many hours investigating the cause and we currently think there is a race condition somewhere between chromedriver and selenium. My project is an open source project so here is a direct link to one of the failed CI runs where you can see the full stack trace: https://github.com/AllYourBot/hostedgpt/actions/runs/10533347868/job/29189182499?pr=498

The Net::ReadTimeout is coming from capybara (aka selenium) failing to hit chromedriver when attempting to set up the server. One of my engineers has outlined his read of that stack trace:

Also, another thing that suggests a race condition is that when we SSH into the job mid-run, it sometimes fails or hangs for a bit. But if I interrupt the process (^c) and then re-run it, it goes fine.
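
Since re-running eventually succeeds, one stopgap is to retry only the step that raises Net::ReadTimeout rather than the whole suite. The following is a minimal sketch, not a fix for the underlying problem: `with_read_timeout_retry` and its `attempts` and `wait` parameters are hypothetical names, and nothing in this thread confirms that retrying is sufficient.

```ruby
require "net/protocol" # defines Net::ReadTimeout

# Hypothetical helper: runs the block and retries it when Net::ReadTimeout
# is raised, up to `attempts` times in total. The attempt number is yielded
# so the caller can log it.
def with_read_timeout_retry(attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue Net::ReadTimeout
    raise if tries >= attempts # give up and surface the original error
    sleep(wait)
    retry
  end
end
```

It could wrap the first call that talks to the driver, for example `with_read_timeout_retry { visit root_path }` in a Capybara test, so that a hung session start is retried while other failures still surface immediately.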

Capybara Version: 3.39.2

Driver Information (and browser if relevant): selenium-webdriver (4.23.0) using headless chrome

How can we reproduce the issue?

1. On GitHub, [fork this repo](https://github.com/AllYourBot/hostedgpt)
2. I've configured the GitHub Actions CI to **not** run system tests on forks, so (a) [delete this line](https://github.com/AllYourBot/hostedgpt/blob/main/.github/workflows/rubyonrails.yml#L49) to remove the short circuit, and (b) change the very next "runs-on" line back to `ubuntu-latest`, which is the default GitHub Actions runner.
3. Push a change to the repo to trigger the GitHub Actions CI run

Relevant log output

You can see the full stack trace: https://github.com/AllYourBot/hostedgpt/actions/runs/10533347868/job/29189182499?pr=498

Operating System

Alpine Linux

Selenium version

4.23.0 of selenium-webdriver gem

What are the browser(s) and version(s) where you see this issue?

Chrome

What are the browser driver(s) and version(s) where you see this issue?

ChromeDriver, but I'm not sure how to get the version. The latest, I think.
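
For anyone who needs to report exact versions, both binaries print theirs when run with `--version`. A small sketch of reading that from Ruby follows. `extract_version` and `tool_version` are hypothetical helper names, and the binary names `chromedriver` and `google-chrome` are assumptions that depend on how the tools are installed (on some distributions the browser binary is named differently).

```ruby
require "open3"

# Extracts a dotted version such as "126.0.6478.61" from `--version` output.
# Returns nil when no version-like token is found.
def extract_version(output)
  output[/\d+(?:\.\d+){1,3}/]
end

# Runs `<binary> --version` and returns the parsed version string,
# or nil if the binary is not on the PATH.
def tool_version(binary)
  stdout, _status = Open3.capture2(binary, "--version")
  extract_version(stdout)
rescue Errno::ENOENT
  nil
end

# Example usage (binary names are assumptions):
#   puts "chromedriver: #{tool_version('chromedriver').inspect}"
#   puts "chrome:       #{tool_version('google-chrome').inspect}"
```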

Are you using Selenium Grid?

No

github-actions[bot] commented 2 months ago

@krschacht, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

AnrichVS commented 2 months ago

Hi,

I recently started experiencing exactly what @krschacht describes.

I believe it might be related to the Chrome version. On my host OS (Arch) the issue occurs, and I'm running:

Within a Docker container, with exactly the same code base (mounted from host OS), the issue doesn't occur. It is running:

I suspect this is related to the Chrome version since it only started happening on my host OS recently after having done a full system upgrade (which also upgraded Chrome).

sickdyd commented 1 month ago

We are facing the same problem. Our specs worked fine for years, and about a month ago they started failing with Net::ReadTimeout errors. I tried literally everything I could think of and searched everywhere online, but nothing seems to fix the problem.

sickdyd commented 1 month ago

I can confirm that the most recent versions of Chrome seem to be the root cause.

I could solve the problem by using version 126.0.6478.61 for both Chrome and chromedriver.

Not a permanent solution, but for the time being it is better than having specs constantly failing.

Note that the Chrome installer download link requires appending -1 to the version.

# CHROME_DRIVER_VERSION=126.0.6478.61

- name: Install Chrome
  run: |
    # Download specific Chrome version
    wget https://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable/google-chrome-stable_${CHROME_DRIVER_VERSION}-1_amd64.deb
    # Install Chrome
    sudo apt-get install -y --allow-downgrades ./google-chrome-stable_${CHROME_DRIVER_VERSION}-1_amd64.deb

- name: Install ChromeDriver
  run: |
    wget "https://storage.googleapis.com/chrome-for-testing-public/${CHROME_DRIVER_VERSION}/linux64/chromedriver-linux64.zip"
    unzip chromedriver-linux64.zip
    sudo mv chromedriver-linux64/chromedriver /usr/local/bin/
    rm chromedriver-linux64.zip
    rm -rf chromedriver-linux64
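
To make the `-1` suffix rule explicit, the two download URLs from the workflow above can be built from a single version string. This is only an illustrative sketch: the URL patterns are copied from the workflow steps, while `chrome_deb_url` and `chromedriver_zip_url` are hypothetical helper names, and whether a given version is still hosted at these locations is not something the sketch checks.

```ruby
# Builds the Chrome .deb URL used in the workflow. Note the "-1" that the
# package file name appends after the version.
def chrome_deb_url(version)
  "https://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable/" \
    "google-chrome-stable_#{version}-1_amd64.deb"
end

# Builds the matching ChromeDriver zip URL. No "-1" suffix here.
def chromedriver_zip_url(version)
  "https://storage.googleapis.com/chrome-for-testing-public/" \
    "#{version}/linux64/chromedriver-linux64.zip"
end

# Example usage:
#   puts chrome_deb_url("126.0.6478.61")
#   puts chromedriver_zip_url("126.0.6478.61")
```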

tvdeyen commented 1 month ago

We experience the same issues. We pinned the Chrome version to 127 with this setup

 Capybara.register_driver :selenium_chrome_headless do |app|
   options = ::Selenium::WebDriver::Chrome::Options.new.tap do |opts|
     opts.add_argument("--headless")
     opts.add_argument("--disable-gpu") if Gem.win_platform?
     # Workaround https://bugs.chromium.org/p/chromedriver/issues/detail?id=2650&q=load&sort=-id&colspec=ID%20Status%20Pri%20Owner%20Summary
     opts.add_argument("--disable-site-isolation-trials")
     opts.add_argument("--window-size=1920,1080")
     opts.add_argument("--disable-search-engine-choice-screen")
+    opts.browser_version = "127"
   end

   Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
 end

and all tests run fine.

But it fails with Chrome 128

 Capybara.register_driver :selenium_chrome_headless do |app|
   options = ::Selenium::WebDriver::Chrome::Options.new.tap do |opts|
     opts.add_argument("--headless")
     opts.add_argument("--disable-gpu") if Gem.win_platform?
     # Workaround https://bugs.chromium.org/p/chromedriver/issues/detail?id=2650&q=load&sort=-id&colspec=ID%20Status%20Pri%20Owner%20Summary
     opts.add_argument("--disable-site-isolation-trials")
     opts.add_argument("--window-size=1920,1080")
     opts.add_argument("--disable-search-engine-choice-screen")
-    opts.browser_version = "127"
+    opts.browser_version = "128"
   end

   Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
 end
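
Given the reports so far (126 and 127 pass, 128 fails), a test suite could warn at startup when it detects a Chrome version in the range reported as failing. The sketch below assumes a cutoff of major version 128, which is taken only from the reports in this thread and is not a confirmed root cause. `FIRST_REPORTED_FAILING_MAJOR` and `reported_failing?` are hypothetical names.

```ruby
require "rubygems" # Gem::Version; normally loaded already

# Assumption: first major version reported as failing in this thread.
FIRST_REPORTED_FAILING_MAJOR = 128

# Returns true when the given Chrome version string has a major version
# at or above the first one reported as failing.
def reported_failing?(version)
  Gem::Version.new(version).segments.first >= FIRST_REPORTED_FAILING_MAJOR
end

# Example usage:
#   warn "Chrome #{v} is in the range reported as flaky" if reported_failing?(v)
```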

glaszig commented 1 month ago

Experiencing the same for the past 1 or 2 months, but I'm using Firefox.

ehutzelman commented 1 month ago

Been seeing issues in system tests getting locked up since Chrome 128. Just updated to Chrome 129 and unfortunately still see the same issues. Looks like turning off headless allows the tests to run as expected, but not a great fix.
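
To compare headless and non-headless runs without editing the driver setup each time, the headless flag can be made switchable through an environment variable. A minimal sketch, assuming a hypothetical `HEADLESS` variable and a hypothetical `chrome_arguments` helper; the `--headless` and `--window-size` arguments are the ones from the driver setup posted earlier in this thread.

```ruby
# Hypothetical helper: returns the Chrome arguments to use, adding
# --headless unless the HEADLESS variable is set to "0" or "false".
# `env` defaults to ENV but accepts any Hash-like object for testing.
def chrome_arguments(env = ENV)
  args = ["--window-size=1920,1080"]
  headless = !%w[0 false].include?(env.fetch("HEADLESS", "true").downcase)
  args << "--headless" if headless
  args
end

# Example usage inside a Capybara.register_driver block:
#   chrome_arguments.each { |arg| opts.add_argument(arg) }
```

Running the suite with `HEADLESS=false` would then open a visible browser, which only works where a display is available.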