cypress-io / cypress

Fast, easy and reliable testing for anything that runs in a browser.
https://cypress.io
MIT License
46.87k stars 3.17k forks source link

Build freezes intermittently when running in parallel via cypress dashboard #7918

Closed jamime closed 3 years ago

jamime commented 4 years ago

Current behavior:

Our builds are failing intermittently, restarting the build usually results in a successful build. We're using cypress dashboard to orchestrate 20 CI machines.

Here is an example of a build that ran for almost 2 hours without any progress. https://github.com/Sage/carbon/runs/848939252?check_suite_focus=true

The build typically takes ~10 mins.

Here is the log from the step that failed. Here is the log from a different PR that also failed.

Desired behavior:

Builds should consistently pass.

Test code to reproduce

Running the cypress build will intermittently fail on this commit.

https://github.com/Sage/carbon/tree/6116e86b19145bba6d702434afb06ab4621c784c

Versions

Cypress: 4.9.0 CI: GitHub Actions GitHub Action: cypress-io/github-action@v2.2.1

jennifer-shehane commented 4 years ago

Seems to be not moving on from this beforeEach / it definition in the tests in the case you linked to (shown below).

Does it always fail on this same specfile?

If there's something intermittent, then there's likely some asynchronous that gets into this loop sometimes if the timing of the run is slightly different sometimes. It's going to be hard to track down without the full code base.

You may want to try some of the recommendations on troubleshooting why tests fail in CI but pass locally: https://on.cypress.io/using-cypress-faq#Why-do-my-Cypress-tests-pass-locally-but-not-in-CI

beforeEach(() => {
  window.testState = testState;
  const failHandler = err => {
    Cypress.off("fail", failHandler);
    testState.onFail(err);
    throw err;
  };
  Cypress.on("fail", failHandler);
})

it('I enable showPageSizeSelection', () => {
  const state = window.testState;

  return cy
  .then(() => state.onStartScenario(scenario, indexedSteps)) 
  .then(() => 
    resolveAndRunBeforeHooks.call(this, scenario.tags, state.feature.name) 
  .then(() => 
    indexedSteps.forEach(step => stepTest.call(this, state, step, rowData)))
  .then(() => 
    resolveAndRunAfterHooks.call(this, scenario.tags, state.feature.name))
  .then(() => 
    state.onFinishScenario(scenario)
  );
})
jamime commented 4 years ago

Hi Jennifer, thanks for the reply.

Does it always fail on this same specfile?

yes

If there's something intermittent, then there's likely some asynchronous that gets into this loop sometimes if the timing of the run is slightly different sometimes. It's going to be hard to track down without the full code base.

You can access the full codebase here if it helps https://github.com/Sage/carbon/tree/6116e86b19145bba6d702434afb06ab4621c784c

Seems to be not moving on from this beforeEach / it

How did you work out that it was this test that it was hanging on? I saw a mocha start event for I disable showPageSizeSelection after that but maybe I'm reading the logs wrong?

2020-07-08T08:28:03.5335163Z   cypress:server:project onMocha test end +29ms
2020-07-08T08:28:03.5341199Z   cypress:server:reporter got mocha event 'test end' with args: [ { id: 'r3', order: 1, title: 'I enable showPageSizeSelection', state: 'passed', body: 'function() {\n' + '      const state = window.testState;\n' + '      return cy\n' + '        .then(() => state.onStartScenario(scenario, indexedSteps))\n' + '        .then(() =>\n' + '          resolveAndRunBeforeHooks.call(this, scenario.tags, state.feature.name)\n' + '        )\n' + '        .then(() =>\n' + '          indexedSteps.forEach(step =>\n' + '            stepTest.call(this, state, step, rowData)\n' + '          )\n' + '        )\n' + '        .then(() =>\n' + '          resolveAndRunAfterHooks.call(this, scenario.tags, state.feature.name)\n' + '        )\n' + '        .then(() => state.onFinishScenario(scenario));\n' + '    }', type: 'test', duration: 10929, wallClockStartedAt: '2020-07-08T08:27:52.537Z', timings: { lifecycle: 73, 'before all': [Array], 'before each': [Array], test: [Object] }, file: null } ] +29ms
2020-07-08T08:28:03.5372002Z   cypress:server:project onMocha test:after:run +3ms
2020-07-08T08:28:03.5373796Z   cypress:server:reporter got mocha event 'test:after:run' with args: [ { id: 'r3', order: 1, title: 'I enable showPageSizeSelection', state: 'passed', body: 'function() {\n' + '      const state = window.testState;\n' + '      return cy\n' + '        .then(() => state.onStartScenario(scenario, indexedSteps))\n' + '        .then(() =>\n' + '          resolveAndRunBeforeHooks.call(this, scenario.tags, state.feature.name)\n' + '        )\n' + '        .then(() =>\n' + '          indexedSteps.forEach(step =>\n' + '            stepTest.call(this, state, step, rowData)\n' + '          )\n' + '        )\n' + '        .then(() =>\n' + '          resolveAndRunAfterHooks.call(this, scenario.tags, state.feature.name)\n' + '        )\n' + '        .then(() => state.onFinishScenario(scenario));\n' + '    }', type: 'test', duration: 10929, wallClockStartedAt: '2020-07-08T08:27:52.537Z', wallClockDuration: 10939, timings: { lifecycle: 73, 'before all': [Array], 'before each': [Array], test: [Object] }, file: null } ] +3ms
2020-07-08T08:28:03.5448662Z   cypress:server:project onMocha test +3ms
2020-07-08T08:28:03.5449987Z   cypress:server:reporter got mocha event 'test' with args: [ { id: 'r4', order: 2, title: 'I disable showPageSizeSelection', body: 'function() {\n' + '      const state = window.testState;\n' + '      return cy\n' + '        .then(() => state.onStartScenario(scenario, indexedSteps))\n' + '        .then(() =>\n' + '          resolveAndRunBeforeHooks.call(this, scenario.tags, state.feature.name)\n' + '        )\n' + '        .then(() =>\n' + '          indexedSteps.forEach(step =>\n' + '            stepTest.call(this, state, step, rowData)\n' + '          )\n' + '        )\n' + '        .then(() =>\n' + '          resolveAndRunAfterHooks.call(this, scenario.tags, state.feature.name)\n' + '        )\n' + '        .then(() => state.onFinishScenario(scenario));\n' + '    }', type: 'test', file: null } ] +4ms

I thought this looked suspect

2020-07-08T08:28:08.8114061Z   cypress:server:browsers:electron console message: { column: 26, level: 'debug', line: 18393, source: 'console-api', text: 'message arrived at manager storybookjs/knobs/set [object Object]', url: 'http://127.0.0.1:9001/vendors~main.acb06a633f95331e0a72.bundle.js' } +264ms
2020-07-08T08:28:09.2812071Z   cypress:server:util:socket_whitelist whitelisted socket closed, removing { localPort: 50336 } +5s

followed by

2020-07-08T08:28:25.3187687Z   cypress:https-proxy received error on client browserSocket { err: Error: read ECONNRESET     at TCP.onStreamRead (internal/stream_base_commons.js:201:27) { errno: 'ECONNRESET', code: 'ECONNRESET', syscall: 'read' }, url: '127.0.0.1:9001' } +21s

in full

2020-07-08T08:28:08.8114061Z   cypress:server:browsers:electron console message: { column: 26, level: 'debug', line: 18393, source: 'console-api', text: 'message arrived at manager storybookjs/knobs/set [object Object]', url: 'http://127.0.0.1:9001/vendors~main.acb06a633f95331e0a72.bundle.js' } +264ms
2020-07-08T08:28:09.2812071Z   cypress:server:util:socket_whitelist whitelisted socket closed, removing { localPort: 50336 } +5s
2020-07-08T08:28:14.8397120Z   cypress:server:util:process_profiler current & mean memory and CPU usage by process group:
2020-07-08T08:28:14.8398254Z   cypress:server:util:process_profiler ┌─────────┬───────────────────┬──────────────┬────────────────────┬────────────┬────────────────┬──────────┬──────────────┬─────────────┐
2020-07-08T08:28:14.8398964Z   cypress:server:util:process_profiler │ (index) │       group       │ processCount │        pids        │ cpuPercent │ meanCpuPercent │ memRssMb │ meanMemRssMb │ maxMemRssMb │
2020-07-08T08:28:14.8399674Z   cypress:server:util:process_profiler ├─────────┼───────────────────┼──────────────┼────────────────────┼────────────┼────────────────┼──────────┼──────────────┼─────────────┤
2020-07-08T08:28:14.8400292Z   cypress:server:util:process_profiler │    0    │    'Electron'     │      1       │       '4016'       │   54.19    │     28.28      │  550.83  │    406.94    │   550.83    │
2020-07-08T08:28:14.8400921Z   cypress:server:util:process_profiler │    1    │     'cypress'     │      1       │       '3601'       │   21.99    │      18.6      │  330.51  │    291.68    │   330.51    │
2020-07-08T08:28:14.8401530Z   cypress:server:util:process_profiler │    2    │     'plugin'      │      1       │       '3976'       │     0      │      9.31      │  281.42  │    255.36    │   281.42    │
2020-07-08T08:28:14.8402470Z   cypress:server:util:process_profiler │    3    │ 'electron-shared' │      3       │ '3607, 3760, 3928' │    3.34    │      1.89      │  180.24  │    149.23    │   182.36    │
2020-07-08T08:28:14.8403085Z   cypress:server:util:process_profiler │    4    │      'other'      │      2       │    '4697, 4698'    │     0      │       0        │   3.87   │     3.88     │    3.99     │
2020-07-08T08:28:14.8403698Z   cypress:server:util:process_profiler │    5    │      'TOTAL'      │      8       │        '-'         │   79.52    │     50.57      │ 1346.88  │    974.63    │   1346.88   │
2020-07-08T08:28:14.8404403Z   cypress:server:util:process_profiler └─────────┴───────────────────┴──────────────┴────────────────────┴────────────┴────────────────┴──────────┴──────────────┴─────────────┘ +10s
2020-07-08T08:28:25.3187687Z   cypress:https-proxy received error on client browserSocket { err: Error: read ECONNRESET     at TCP.onStreamRead (internal/stream_base_commons.js:201:27) { errno: 'ECONNRESET', code: 'ECONNRESET', syscall: 'read' }, url: '127.0.0.1:9001' } +21s
2020-07-08T08:28:25.3220675Z   cypress:server:util:socket_whitelist whitelisted socket closed, removing { localPort: 39586 } +16s
2020-07-08T08:28:25.4247448Z   cypress:server:util:process_profiler current & mean memory and CPU usage by process group:
2020-07-08T08:28:25.4248559Z   cypress:server:util:process_profiler ┌─────────┬───────────────────┬──────────────┬────────────────────┬────────────┬────────────────┬──────────┬──────────────┬─────────────┐
2020-07-08T08:28:25.4249193Z   cypress:server:util:process_profiler │ (index) │       group       │ processCount │        pids        │ cpuPercent │ meanCpuPercent │ memRssMb │ meanMemRssMb │ maxMemRssMb │
2020-07-08T08:28:25.4250217Z   cypress:server:util:process_profiler ├─────────┼───────────────────┼──────────────┼────────────────────┼────────────┼────────────────┼──────────┼──────────────┼─────────────┤
2020-07-08T08:28:25.4250843Z   cypress:server:util:process_profiler │    0    │    'Electron'     │      1       │       '4016'       │    43.3    │     31.29      │ 1453.55  │    616.26    │   1453.55   │
2020-07-08T08:28:25.4251464Z   cypress:server:util:process_profiler │    1    │     'cypress'     │      1       │       '3601'       │   24.77    │     19.63      │  369.84  │    304.7     │   369.84    │
2020-07-08T08:28:25.4252081Z   cypress:server:util:process_profiler │    2    │     'plugin'      │      1       │       '3976'       │    0.92    │      7.63      │  191.23  │    242.53    │   281.42    │
2020-07-08T08:28:25.4252706Z   cypress:server:util:process_profiler │    3    │ 'electron-shared' │      3       │ '3607, 3760, 3928' │    6.05    │      2.58      │  175.05  │    153.53    │   182.36    │
2020-07-08T08:28:25.4253887Z   cypress:server:util:process_profiler │    4    │      'other'      │      2       │    '4905, 4906'    │     0      │       0        │   3.79   │     3.87     │    3.99     │
2020-07-08T08:28:25.4254343Z   cypress:server:util:process_profiler │    5    │      'TOTAL'      │      8       │        '-'         │   75.04    │     54.65      │ 2193.45  │   1177.77    │   2193.45   │
2020-07-08T08:28:25.4254885Z   cypress:server:util:process_profiler └─────────┴───────────────────┴──────────────┴────────────────────┴────────────┴────────────────┴──────────┴──────────────┴─────────────┘ +11s
2020-07-08T08:28:26.7905287Z   cypress:server:server Got CONNECT request from 127.0.0.1:9001 +18s
2020-07-08T08:28:26.7922636Z   cypress:https-proxy Writing browserSocket connection headers { url: '127.0.0.1:9001', headLength: 0, headers: { host: '127.0.0.1:9001', 'proxy-connection': 'keep-alive', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Cypress/4.9.0 Chrome/80.0.3987.165 Electron/8.3.1 Safari/537.36' } } +1s
2020-07-08T08:28:26.8104154Z   cypress:https-proxy Got first head bytes { url: '127.0.0.1:9001', head: 'GET /__socket.io/?EIO=3&transport=websocket HTTP/1.1\r\nHost: 127.' } +16ms
2020-07-08T08:28:26.8105604Z   cypress:network:cors Parsed URL { port: '9001', tld: '127.0.0.1', domain: '' } +22s
2020-07-08T08:28:26.8106184Z   cypress:server:server HTTPS request does match URL: https://127.0.0.1:9001 with props: { port: '9001', tld: '127.0.0.1', domain: '' } +19ms
2020-07-08T08:28:26.8106592Z   cypress:https-proxy Not making direct connection { url: '127.0.0.1:9001' } +2ms
2020-07-08T08:28:26.8107204Z   cypress:https-proxy Making intercepted connection to 35187 +0ms
2020-07-08T08:28:26.8107784Z   cypress:https-proxy getting proxy URL { port: 35187, serverPort: 35187, sniPort: 37111, url: 'https://localhost:35187' } +1ms
2020-07-08T08:28:26.8204338Z   cypress:network:connect successfully connected { opts: { port: 35187, host: 'localhost', getDelayMsForRetry: [Function: getDelayForRetry] }, iteration: 0 } +23s
2020-07-08T08:28:26.8259715Z   cypress:https-proxy received upstreamSocket callback for request { port: 35187, hostname: 'localhost', err: undefined } +16ms
2020-07-08T08:28:26.8264929Z   cypress:server:util:socket_whitelist whitelisting socket { localPort: 39786 } +2s
2020-07-08T08:28:26.8274003Z   cypress:server:server Got UPGRADE request from /__socket.io/?EIO=3&transport=websocket +19ms
2020-07-08T08:28:26.8282029Z   cypress:server:util:socket_whitelist is incoming request whitelisted? { isWhitelisted: true, reqUrl: '/__socket.io/?EIO=3&transport=websocket', remotePort: 39786, remoteAddress: '127.0.0.1' } +1ms
2020-07-08T08:28:26.8394963Z   cypress:server:socket socket connected +23s

The same in the second log

2020-07-06T12:11:55.0223534Z   cypress:server:browsers:electron console message: { column: 26, level: 'debug', line: 18393, source: 'console-api', text: 'message arrived at manager storybookjs/knobs/set [object Object]', url: 'http://127.0.0.1:9001/vendors~main.acb06a633f95331e0a72.bundle.js' } +294ms
2020-07-06T12:11:55.0912139Z   cypress:server:util:socket_whitelist whitelisted socket closed, removing { localPort: 58092 } +5s

but this time no error.

We're using https://www.npmjs.com/package/cypress-cucumber-preprocessor

  @positive
  Scenario: I disable showPageSizeSelection
    Given I check paginate checkbox
      And I check showPageSizeSelection checkbox
    When I uncheck showPageSizeSelection checkbox
    Then pageSize is not visible

This log entry

2020-07-08T08:28:08.8114061Z   cypress:server:browsers:electron console message: { column: 26, level: 'debug', line: 18393, source: 'console-api', text: 'message arrived at manager storybookjs/knobs/set [object Object]', url: 'http://127.0.0.1:9001/vendors~main.acb06a633f95331e0a72.bundle.js' } +264ms

relates to

 Given I check paginate checkbox

then there is nothing logged after. Interestingly we use the same steps in a previous test and they succeed.

  @positive
  Scenario: I enable showPageSizeSelection
    Given I check paginate checkbox
    When I check showPageSizeSelection checkbox
    Then pageSize is visible

I'm running it locally in a loop until it fails, but so far haven't been able to reproduce it.

jennifer-shehane commented 4 years ago

You might want to turn off the ASCII characters to clean up the noise of the logs as explained here.

I just looked through the logs and saw that was the last hook/test definition logged, but yah I may have missed a later log.

I've seen some people mention setting --shm-size for GitHub actions maybe resolving some hanging issues. https://github.com/cypress-io/cypress/issues/6695#issuecomment-627376828

I would say this is a duplciate of https://github.com/cypress-io/cypress/issues/6695 overall, which is a bit of a black hole on causes also.

jamime commented 4 years ago

Thanks for the tip for ASCII.

Unfortunately --shm-size is a container option, we don't run our tests in a container, we run them on the host directly.

If you do not set a container, all steps will run directly on the host specified by runs-on unless a step refers to an action configured to run in a container.

Any other ideas we could investigate?

jennifer-shehane commented 3 years ago

Duplicate of https://github.com/cypress-io/cypress/issues/6695