New headless chrome randomly hangs at the end

maciejmrozinski commented 1 year ago

Current behavior

Using the new headless mode in Chrome, the run sometimes hangs after a test file finishes (also mentioned in this comment: https://github.com/cypress-io/cypress/issues/25972#issuecomment-1487021361)

Desired behavior

Consistent, correct behaviour just like with headless=old

Test code to reproduce

There is no simple repo to reproduce this. From my observation, it hangs mostly when the test flow includes a page with externally loaded (iframe) content (in my case, Google reCAPTCHA). CI runs are more vulnerable to this issue, but I was able to reproduce it a few times locally inside a Docker container.

Simple test code that sometimes fails:

describe('test_1', () => {
    it('Test test', () => {
        cy.visit('/login');// visit page that has reCaptcha loaded
        cy.get('img.deskop__logo').click();// click on main logo, get back to homepage
    });
});

Cypress Version

12.17.0

Node version

18.16.0

Operating System

Docker image: cypress/browsers:node-18.16.0-chrome-113.0.5672.92-1-ff-113.0-edge-113.0.1774.35-1

Debug Logs

Last few lines of good run (exited correctly):
  cypress:launcher:browsers chrome stderr: [0711/114213.126568:ERROR:nacl_helper_linux.cc(355)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox correctly +831ms
  cypress:launcher:browsers chrome exited: { code: 0, signal: null } +1ms
  cypress:server:preprocessor removeFile /tests/files/test.ts +13s
  cypress:server:preprocessor base emitter plugin close event +0ms
  cypress:server:preprocessor base emitter native close event +2ms
  cypress:server:preprocessor base emitter native close event +0ms
  cypress:server:browsers:chrome closing remote interface client +416ms
  cypress:server:cypress about to exit with code 0 +19s
  cypress:webpack close /tests/files/test.ts +8s
  cypress:server:browsers browsers.kill called with no active instance +15s
  cypress:proxy:http:util:prerequests metrics: { browserPreRequestsReceived: 108, proxyRequestsReceived: 84, immediatelyMatchedRequests: 29, unmatchedRequests: 6, unmatchedPreRequests: 0 } +0ms
  cypress:cli child event fired { event: 'exit', code: 0, signal: null } +20s
  cypress:cli Stopping Xvfb +21s
  cypress:cli child event fired { event: 'close', code: 0, signal: null } +5ms

Last few lines of bad run (hangs):
  cypress:launcher:browsers chrome exited: { code: 0, signal: null } +1s
  cypress:server:preprocessor removeFile /tests/files/test.ts +18s
  cypress:server:preprocessor base emitter plugin close event +0ms
  cypress:server:preprocessor base emitter native close event +1ms
  cypress:server:preprocessor base emitter native close event +0ms
  cypress:server:browsers:chrome closing remote interface client +10ms
  cypress:webpack close /tests/files/test.ts +11s
  cypress:launcher:browsers chrome stderr: [0711/114246.895419:ERROR:nacl_helper_linux.cc(355)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox correctly +4ms

Other

No response

jennifer-shehane commented 1 year ago

@maciejmrozinski headless=new is what Cypress runs by default since 12.15.0, so you don't need to pass the workaround to get the new headless mode. Does turning off headless=new resolve your issue? Pass the code below and let us know.

setupNodeEvents: function setupNodeEvents(on, config) {
  on('before:browser:launch', (browser = {}, launchOptions) => {
    // Only touch headless Chrome runs
    if (browser.name === 'chrome' && browser.isHeadless) {
      // Per this thread, plain --headless is meant to be equivalent to --headless=old
      launchOptions.args.push('--headless');
    }
    return launchOptions;
  });
}

maciejmrozinski commented 1 year ago

Yes, I've already tested this, and using the old headless mode resolves the issue.

jennifer-shehane commented 1 year ago

@maciejmrozinski We'll need a test to run to reproduce this behavior fully. We haven't observed this behavior on our test suite, so there is something particular about your run that's causing the hang that we'd need to be able to run so we can track down the issue.

mikejav commented 1 year ago

It's also applicable to my project. Thx guys for mentioning --headless=old. It works like a charm.

tiehfood commented 1 year ago

We have the same issue, will there be a fix in the future?

vire commented 1 year ago

I have this issue as well on CI (CircleCI); I get yarn exited with code 1 🤷

When I try cypress run --record false --browser chrome --headless=old with config:

> yarn cypress --version
Cypress package version: 12.17.3
Cypress binary version: 12.17.3
Electron version: 21.0.0
Bundled Node version: 16.16.0

I get

cheuk0324 commented 1 year ago

Is there an update for this issue?

MikeMcC399 commented 1 year ago

@vire

I have this issue as well on CI (CircleCI) I get yarn exited with code 1 🤷

When I try cypress run --record false --browser chrome --headless=old with config:

--headless=old is not to be used as a CLI parameter for Cypress. It is a command line flag for Google Chrome. The way to pass this argument is described in https://github.com/cypress-io/cypress/issues/27264#issuecomment-1632768735. The CLI argument for Cypress is --headless. (See Cypress Guides > Command Line > Options).
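As a rough illustration of passing the flag to Chrome rather than to the Cypress CLI, here is a minimal sketch. `useOldHeadless` is a hypothetical helper, not part of Cypress, and it assumes the new mode appears in `launchOptions.args` as the literal string `--headless=new`; it is worth printing the args in your own run before relying on that.

```javascript
// Hypothetical helper (not a Cypress API): returns a copy of the Chrome
// launch args with the new headless flag swapped for the old one.
// Assumption: the new mode shows up as the literal string '--headless=new'.
function useOldHeadless(args) {
  const swapped = args.map((arg) => (arg === '--headless=new' ? '--headless=old' : arg));
  // If nothing was swapped, append the old-mode flag explicitly.
  return swapped.includes('--headless=old') ? swapped : [...swapped, '--headless=old'];
}

console.log(useOldHeadless(['--disable-gpu', '--headless=new']));
```

Inside a `before:browser:launch` handler this would be applied only for headless Chrome, for example `launchOptions.args = useOldHeadless(launchOptions.args)`, before returning `launchOptions`.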

Be aware however that there is a new issue with Google Chrome 117 which stops this working. See https://github.com/cypress-io/cypress-documentation/issues/5483 for details. The Chrome arguments --headless and --headless=old are supposed to be equivalent, and this worked from Google Chrome 112 through 116. In Chrome 117 there is a bug which crashes Chrome when these arguments are passed. Edit: Fixed in Google Chrome 117.0.5938.132.
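The version ranges above could be encoded as a small guard, sketched here under assumptions. `oldHeadlessFlagIsSafe` is a hypothetical name; the thresholds come only from this comment (works from 112 through 116, crashes in 117 before 117.0.5938.132), and treating every later version as safe is an assumption, not something verified here.

```javascript
// Split a dotted Chrome version such as '117.0.5938.132' into numbers.
function parseVersion(version) {
  return String(version).split('.').map((part) => parseInt(part, 10) || 0);
}

// Returns -1, 0 or 1 depending on how version a sorts relative to b.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Hypothetical guard: is it reasonable to pass --headless=old to this Chrome?
function oldHeadlessFlagIsSafe(chromeVersion) {
  const major = parseVersion(chromeVersion)[0];
  if (major < 112) return false; // outside the range reported as working
  if (major === 117) {
    // 117 builds before the fixed release crash when the flag is passed
    return compareVersions(chromeVersion, '117.0.5938.132') >= 0;
  }
  return true; // 112-116 per the thread; later versions are an assumption
}

console.log(oldHeadlessFlagIsSafe('116.0.5845.187'));
```

In a launch handler, the value to check would typically be the `version` property of the `browser` argument.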

bericp1 commented 1 year ago

We're also encountering this issue with our cypress runs in CI.

We were using the --headless=old workaround but as mentioned that's broken in latest chrome. Removing that argument from browser launch options brought back this hanging issue for us.

We use parallelization via DeploySentinel. You can see that one of the parallel runs in this example (#5) drops to 0% CPU and hangs, producing no output, eventually being cancelled by CircleCI due to inactivity.

I'll enable verbose debug logging on the process handler and the chrome browser launch to get chrome stderr and report back if that reveals anything.

jennifer-shehane commented 1 year ago

We need a reproducible example provided so that we can narrow down the cause of hanging for some users with the new headless behavior. Please can someone provide one.

bericp1 commented 1 year ago

The problem is that it's not reproducible, it's seemingly random. We're not sure what triggers it.

tiehfood commented 1 year ago

We also still have this issue. Currently no clue on how to reproduce it, as it occurs randomly

cheuk0324 commented 1 year ago

This issue is random; it seems to hang on one of the longest tests or on the last instance. But it seems to have been resolved after upgrading to the latest version.

bericp1 commented 1 year ago

@jennifer-shehane we don't have a reproducible example but we did manage to capture debug logs for a run that this happened to. See attached file.

circleci.com_api_v1.1_project_github_nursefly_nursefly-web_482932_output_120_4_file=true&allocation-id=650cecab5769582ef4129e07-4-build%2FABCDEFGH.txt

This is with the following:

DEBUG=cypress:server:util:process_profiler,cypress:launcher:browsers

The logs that look promising to me appear shortly before the tests hang:

  cypress:launcher:browsers chrome exited: { code: 0, signal: null } +3s
  cypress:launcher:browsers chrome stderr: [0922/014838.655632:ERROR:nacl_helper_linux.cc(354)] NaCl helper process running without a sandbox!

And then the chrome process is missing from the process list that immediately follows. It pops up again later but maybe cypress isn't reconnecting to the new chrome process and hence the hanging?

jsotelo commented 1 year ago

--headless=old also worked for us.

We did notice that peak memory usage was around 3.5 GB with new, whereas old uses about 2.4 GB. Our github actions runner has a 4GB resource limit. Perhaps chrome is running out of memory and is making cypress hang (pure guess).

We are using the following github actions config:

  test:
    runs-on: [self-hosted, prod]
    container:
      image: cypress/included:cypress-13.2.0-node-20.6.1-chrome-116.0.5845.187-1-ff-117.0-edge-116.0.1938.76-1
      options: --ipc=host

and the following cypress.config.ts:

  e2e: {
    setupNodeEvents(_on, _config) {
      _on('before:browser:launch', (browser, launchOptions) => {
        if (browser.name === 'chrome') {
          launchOptions.args.push('--disable-dev-shm-usage');
          launchOptions.args.push('--headless=old');
        }
        console.log(launchOptions.args);
        return launchOptions;
      });
    },

PavanGurram-DevOps commented 1 year ago

Hi there, I'm also facing the same issue with Cypress version 12.7.4 and Chrome version 112 when trying to execute the tests in parallel. I have tried using '--headless=old' as described above, but no luck.

Please can someone help? Thanks

cheuk0324 commented 1 year ago

Our issue resolved itself after upgrading to v13

PavanGurram-DevOps commented 1 year ago

Unfortunately, I can't upgrade to v13 so need some workaround please

jennifer-shehane commented 11 months ago

We have observed a slowdown in performance for one project when using headless=new. We're still interested in having examples that show this behavior so that we can narrow down the issue. We suspect there's likely a bug in Chrome headless, but it's specific to some situation.

joergschiller commented 11 months ago

Not sure if it's really helpful but it could support the hypothesis that it's a bug in Chrome headless.

We're having the same issue: Chrome with the new headless mode just hangs randomly after running all tests. But with a whole different stack: we're on Ruby and using RSpec/Capybara.

Javediqbal2 commented 10 months ago

@jennifer-shehane I'm facing the same issue with the Electron browser. Cypress tests sometimes hang at the end and sometimes before starting. In Cypress Cloud it shows "This spec does not have any test results because it timed out". I've faced the same issue from Cypress 12.13.0 to 12.17.0, and for some people it hangs in Firefox too. I'm mentioning this issue for reference:

https://github.com/cypress-io/github-action/issues/620

jennifer-shehane commented 8 months ago

Is this still occurring for people? We haven't had comments for a couple of months.

pirate commented 8 months ago

Yes, myself and other users of my project are still seeing headless chrome randomly hang before exit, even when it's run directly via CLI outside of cypress. I'm almost positive it's an upstream chrome bug. Rebooting often fixes it, waiting an hour and trying again sometimes fixes it, force-reinstalling chrome also often fixes it, which puts it squarely into heisenbug territory.

jennifer-shehane commented 8 months ago

@pirate What version of Chrome are you using? Have you tried updating?

pirate commented 8 months ago

This issue has been present as far back as v60 but got much worse in v112 (when we switched to the new headless=new), and has persisted all the way up to v121.0.6167.57 and beyond with some versions worse than others.

It's intermittent and hard to verify sometimes, so many issues I've found about it on related projects have gotten closed as "cannot reproduce". I've just confirmed it's happening particularly consistently with v121 though, but I still can't figure out why or when, as sometimes weird things like rebooting make it go away. I can post back here as I collect more reports on the latest versions.

There are also widespread reports of similar issues with the two telltale symptoms:

chrome headless seems to hang indefinitely on exit sometimes
my [insert headless driver here] appears to have a memory leak (caused by chrome child processes hanging on exit and not releasing their memory)

Possibly related reports:

https://support.google.com/chrome/thread/179100154/mac-os-chrome-hangs-on-exit?hl=en
https://issues.chromium.org/issues/327458826
https://github.com/puppeteer/puppeteer/issues/7922
https://github.com/puppeteer/puppeteer/issues/1825
https://forum.uipath.com/t/google-chrome-hangs-freezes-after-few-100-records/581107
https://github.com/microsoft/playwright/issues/5327
https://github.com/microsoft/playwright/issues/4218
https://github.com/microsoft/playwright-python/issues/1074
https://github.com/microsoft/playwright/issues/6319
https://github.com/microsoft/playwright/issues/15400 (longstanding playwright issue caused by old chrome contexts not exiting properly and eating up memory, closed prematurely as you can see there are still recent comments about it happening)

It's possible some of these issues ^ are unrelated, but it's also possible they all stem from the same underlying issue of child chromium processes not exiting correctly.

The problem is widespread enough that many of the tools that use chrome headless have implemented hacky workarounds like this: https://devforth.io/blog/how-to-simply-workaround-ram-leaking-libraries-like-puppeteer-universal-way-to-fix-ram-leaks-once-and-forever/ (spawning chrome under a child process then doing killasgroup -9 after every run)
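For illustration only, a rough Node.js sketch of that kind of workaround: start the browser in its own process group and kill the whole group if it has not exited in time. `runWithGroupKill` is a hypothetical helper, not taken from any of the tools above; it relies on POSIX process groups (a negative PID passed to `process.kill`), so it is not expected to work on Windows, and it only reaches descendants that stay in the same group.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: run a command as the leader of a new process group
// and SIGKILL the whole group if it is still alive after timeoutMs.
function runWithGroupKill(command, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    // detached: true makes the child a process group leader on POSIX systems
    const child = spawn(command, args, { detached: true, stdio: 'ignore' });
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      try {
        // A negative PID targets every process in the child's group
        process.kill(-child.pid, 'SIGKILL');
      } catch {
        // The group may already be gone; nothing left to clean up
      }
    }, timeoutMs);

    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });

    child.on('exit', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });
}

// Demo with a stand-in command that outlives the timeout
runWithGroupKill('sleep', ['30'], 200).then((result) => console.log(result));
```

For a real browser, the command would be the path to the Chrome binary and the args something like `['--headless=new', '--screenshot', url]`, with a timeout long enough for a normal run.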

I tried again just for fun and managed to reproduce this on the first try!

I didn't even add any of the extra args we usually use (--disable-gpu, --no-sandbox, --disable-features=dbus, etc.), it hung immediately on the first try with only --headless=new and --screenshot!

This dispelled the last of my doubts, I think this is 100% an upstream Chromium bug and has nothing to do with Cypress/Playwright/Puppeteer/ArchiveBox/any driver.

I just opened an upstream bug report on the Chromium bug tracker, follow over there for progress: https://issues.chromium.org/issues/327583144 👾

jennifer-shehane commented 8 months ago

@pirate Thanks for the detailed writeup and opening an issue with chromium. We'll take a look. It is extremely difficult to track down with all the variables involved as you explained. Is there a way to provide the project you're running where you got it to hang immediately?

pirate commented 8 months ago

It's not in any project, it's just raw chromium headless from the command line, no extra env vars, hidden CLI flags, or profile directory provided:

# using chromium downloaded via puppeteer
# recommended by: https://www.chromium.org/getting-involved/download-chromium/
$ npx @puppeteer/browsers install chrome@121.0.6167.57
$ ~/chrome/mac_arm-121.0.6167.57/chrome-mac-arm64/Google\ Chrome\ for\ Testing.app/Contents/MacOS/Google\ Chrome\ for\ Testing --headless=new --screenshot 'https://example.com'
[63086:259:0301/184829.306692:ERROR:policy_logger.cc(156)] :components/enterprise/browser/controller/chrome_browser_cloud_management_controller.cc(161) Cloud management controller initialization aborted as CBCM is not enabled. Please use the `--enable-chrome-browser-cloud-management` command line flag to enable it if you are not using the official Google Chrome build.
72602 bytes written to file screenshot.png
# ... hangs indefinitely ...
^C⏎
[138.101s]

# OR equivalent using playwright's chromium
$ pip install --upgrade playwright
$ playwright install --with-deps chromium
$ ~/Library/Caches/ms-playwright/chromium-1097/chrome-mac/Chromium.app/Contents/MacOS/Chromium --headless=new --screenshot 'https://example.com'
[63478:259:0301/185212.347544:ERROR:policy_logger.cc(156)] :components/enterprise/browser/controller/chrome_browser_cloud_management_controller.cc(161) Cloud management controller initialization aborted as CBCM is not enabled. Please use the `--enable-chrome-browser-cloud-management` command line flag to enable it if you are not using the official Google Chrome build.
72602 bytes written to file screenshot.png
# ... hangs indefinitely ...
^C⏎
[241.309s]

Hung on the first try for both methods. Other versions besides 121.0.6167.57 do it too, but this one appears to do it particularly consistently on my machine. I can also reproduce this on freshly installed Ubuntu 22.04, and on both x86 and arm64 machines with both macOS and Linux.

cypress-app-bot commented 1 month ago

This issue has not had any activity in 180 days. Cypress evolves quickly and the reported behavior should be tested on the latest version of Cypress to verify the behavior is still occurring. It will be closed in 14 days if no updates are provided.

cypress-app-bot commented 1 month ago

This issue has been closed due to inactivity.
