cypress-io / cypress

Fast, easy and reliable testing for anything that runs in a browser.
https://cypress.io
MIT License

Flakiness connecting to Chrome in github actions #29860

Open airhorns opened 1 month ago

airhorns commented 1 month ago

Current behavior

Most of the time, Cypress is able to boot Chrome within a GitHub Actions runner and run tests. Sometimes, however, the run fails to boot Chrome and errors out. This can happen with the exact same code running the exact same GitHub Actions YAML. I just have to retry a few times to trigger the failure.

A failure looks like this:

Opening Cypress...

DevTools listening on ws://127.0.0.1:42629/devtools/browser/c01237cc-33e3-4063-88ea-f454fc516bf7
(node:5089) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("file%3A///home/runner/.cache/Cypress/13.13.0/Cypress/resources/app/node_modules/ts-node/esm/transpile-only.mjs", pathToFileURL("./"));'
(Use `node --trace-warnings ...` to show where the warning was created)
(node:5089) ExperimentalWarning: The Node.js specifier resolution flag is experimental. It could change or be removed at any time.
(node:5089) ExperimentalWarning: The Node.js specifier resolution flag is experimental. It could change or be removed at any time.
(Use `node --trace-warnings ...` to show where the warning was created)

tput: No value for $TERM and no -T specified
====================================================================================================

  (Run Starting)

  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ Cypress:        13.13.0                                                                        │
  │ Browser:        Chrome 126 (headless)                                                          │
  │ Node Version:   v18.19.1 (/nix/store/c8phnfr1s43123qm3fmyiq5n1hs5csdv-nodejs-18.19.1/bin/      │
  │                 node)                                                                          │
  │ Specs:          10 found (AutoForm.cy.tsx, AutoFormDefaultValues.cy.tsx, AutoFormGlobalActions │
  │                 .cy.tsx, PolarisAutoBelongsToInput.cy.tsx, PolarisAutoDateTimePicker.cy.tsx, P │
  │                 olarisAutoEnumInput.cy.tsx, PolarisAutoFileInput.cy.tsx, PolarisAutoHasManyInp │
  │                 ut.cy.tsx, PolarisAutoJ...)                                                    │
  │ Searched:       **/*.cy.{js,jsx,ts,tsx}                                                        │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘

────────────────────────────────────────────────────────────────────────────────────────────────────

  Running:  AutoForm.cy.tsx                                                                (1 of 10)
Still waiting to connect to Chrome, retrying in 1 second (attempt 18/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 19/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 20/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 21/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 22/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 23/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 24/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 25/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 26/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 27/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 28/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 29/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 30/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 31/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 32/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 33/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 34/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 35/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 36/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 37/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 38/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 39/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 40/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 41/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 42/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 43/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 44/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 45/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 46/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 47/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 48/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 49/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 50/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 51/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 52/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 53/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 54/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 55/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 56/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 57/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 58/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 59/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 60/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 61/62)
Still waiting to connect to Chrome, retrying in 1 second (attempt 62/62)
Cypress failed to make a connection to the Chrome DevTools Protocol after retrying for 50 seconds.

This usually indicates there was a problem opening the Chrome browser.

The CDP port requested was 39171.

Error: connect ECONNREFUSED 127.0.0.1:39171
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
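
For readers unfamiliar with this failure mode: ECONNREFUSED means nothing was listening on the requested CDP port at the time of each attempt. The retry pattern visible in the log can be sketched roughly as below. This is an illustrative Python sketch only, not Cypress's actual implementation; the helper name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 62,
                  delay: float = 1.0, timeout: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if every attempt fails. Hypothetical helper for
    illustration; it only checks that something accepts connections.
    """
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError as exc:
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)
    return False
```

Running something like `wait_for_port("127.0.0.1", 39171)` on the runner while the job is stuck would show whether any process ever opens the port, which helps separate "Chrome never started listening" from "Cypress could not reach a listening Chrome".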

Desired behavior

Cypress should always boot Chrome, or if it can't for some reason, indicate why.

Test code to reproduce

I can't find a way to reproduce this reliably, as it is very flaky. But I was able to capture an instance where it failed with DEBUG=cypress:* set. See the log for this OSS project here: https://github.com/gadget-inc/js-clients/actions/runs/9941229702/job/27473079273

Cypress Version

13.13.0

Node version

v18.19.1

Operating System

Ubuntu 22.04 within GitHub Actions

Debug Logs

Debug logs from run linked above: https://gist.github.com/airhorns/70757c597cd0f7fb540e06600a8fb4bc

Other

For the exact same project on the exact same config, there was no flakiness when starting Electron, but there is now when starting Chrome.

airhorns commented 1 month ago

A different comment suggested that using the cypress/included Docker image would improve things. I tried switching to it here but was still able to produce the same flake: https://github.com/gadget-inc/js-clients/actions/runs/9945955229/job/27475392300?pr=510 . I also think that's a bit of a red herring, as the browsers are successfully detected and tests pass on the base GitHub Actions runner for me as well. It just isn't reliable, which suggests some sort of non-deterministic issue.

axelyung commented 1 month ago

We're experiencing the same issue. At first we were running e2e tests against Chrome, Firefox and Edge. The Edge job failed to connect, and after we removed it from our matrix, Firefox started to fail.

  │ Cypress:        13.13.0
  │ Browser:        Firefox 128 (headless)
  │ Node Version:   v20.13.1 (/home/runner/runners/2.317.0/externals/node20/bin/node)

MikeMcC399 commented 1 month ago

@airhorns

Thanks for your repo and tests! There is a bug hiding in there somewhere!

It's good that you also tried out the Cypress Docker image and were able to reproduce. That means the bug is independent of Linux variant (Ubuntu 22.04 / Debian 12.6) and independent of Node.js version (v18.x / v20.x).

axelyung commented 1 month ago

As a data point, it's been relatively stable for us now that we're only using Chrome. I wonder if some flakiness is introduced by running multiple GHA jobs with different browsers concurrently?

@airhorns were you running multiple browsers or just Chrome?

Could it be a race condition related to the cache? 🤔

airhorns commented 1 month ago

No multi-browser stuff, just using Chrome for now. It also seems to happen if I retry the exact same job a few times, which I believe means the same cache would be used as input each time.

axelyung commented 1 month ago

Hmm, so maybe a caching issue...

airhorns commented 1 month ago

I'm not sure it's a caching issue. Browser discovery seems to have no problem at all finding the browsers in both the working and failing cases. The only thing that stands out in the logs as strange to me is this:

  cypress:launcher:browsers chrome stderr: [0715/191459.100182:ERROR:file_io_posix.cc(145)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2)
[0715/191459.100278:ERROR:file_io_posix.cc(145)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2) +36ms

which suggests Chrome is struggling to read those files. Maybe the flakiness is introduced by certain underlying hardware running the GHA Docker container, where some CPU platforms work and some don't? This Chromium bug tracker thread has some details: https://issues.chromium.org/issues/40189632
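
One way to test the hardware hypothesis above would be to record, at the start of each CI job, whether the cpufreq files named in the Chrome stderr output exist, and then compare passing and failing runs. A minimal Python sketch follows; the function name `missing_cpufreq_files` and its parameters are hypothetical, and whether the missing files are actually related to the connection failure is unverified.

```python
from pathlib import Path
from typing import List, Sequence


def missing_cpufreq_files(
    cpu_root: str = "/sys/devices/system/cpu",
    cpu: str = "cpu0",
    names: Sequence[str] = ("scaling_cur_freq", "scaling_max_freq"),
) -> List[str]:
    """Return the cpufreq paths (from the Chrome log) that do not exist.

    Hypothetical diagnostic helper: an empty list means every file
    was found on this machine.
    """
    base = Path(cpu_root) / cpu / "cpufreq"
    return [str(base / name) for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Print the result so it shows up in the CI job log.
    print("missing cpufreq files:", missing_cpufreq_files())
```

If the same files are missing on both passing and failing runners, the stderr lines are probably noise rather than the cause.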