cypress-io / cypress

Fast, easy and reliable testing for anything that runs in a browser.
https://cypress.io
MIT License

Docker crashes when renderer process eats up too much memory #350

Closed brian-mann closed 3 years ago

brian-mann commented 7 years ago

Related to #348 and #349.

When running headlessly against very long-running and memory-intensive applications, we are seeing renderer crashes in Docker.

brian-mann commented 7 years ago

This is actually not indicative of a memory leak inside of Cypress (nor your application). This has to do with the way that browsers work under the hood.

Luckily, we have found a simple one-line fix for this problem: add the flag --ipc=host.

This option is documented here.

The Slack app and Atom have also been documented to crash here, here, and here for the exact same reasons.

If you are using docker run

docker run --ipc=host

If you are using docker compose

version: '2'
services:
  foobarbazservice:
    ipc: host ## this line right here

We are working on a more permanent fix for the future, such as described in #349: either automatically recovering from these crashes, or mostly preventing them up front by nuking the renderer process between spec files.

jheijkoop commented 6 years ago

I seem to be getting this problem too, because Chrome is running out of shared memory (/dev/shm). By default Docker starts containers with a 64M /dev/shm (try running df -h in your instance). To change this you can pass Docker an extra argument: docker run --shm-size 512M my-image. Because we are working with Mesos/Marathon, I had to do this via "docker": { "parameters": [{"key": "shm-size", "value": "512M"}], ...} (https://mesosphere.github.io/marathon/docs/native-docker.html) in the JSON configuration when creating the app/instances.
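
For anyone wanting to check this automatically at the start of a CI job, here is a minimal sketch in Node. It is not from the original comment: the helper names shmSizeMiB and isShmTooSmall are made up for illustration, and the 512 MiB threshold is only an assumption borrowed from the --shm-size value above. It parses the text printed by df -k /dev/shm and flags mounts that are still at Docker's 64M default.

```javascript
// Hypothetical helpers (not part of Cypress or Docker).

// Parse the output of `df -k /dev/shm` and return the mount size in MiB.
function shmSizeMiB(dfOutput) {
  // The last non-empty line is the data row:
  // Filesystem  1K-blocks  Used  Available  Use%  Mounted on
  const lines = dfOutput.trim().split('\n')
  const fields = lines[lines.length - 1].trim().split(/\s+/)
  const totalKiB = Number(fields[1])
  if (!Number.isFinite(totalKiB)) {
    throw new Error('Unexpected df output: ' + dfOutput)
  }
  return totalKiB / 1024
}

// Assumed threshold of 512 MiB; adjust to whatever your tests need.
function isShmTooSmall(dfOutput, minMiB = 512) {
  return shmSizeMiB(dfOutput) < minMiB
}

// Sample output as printed inside a container with the default 64M /dev/shm.
const sample = [
  'Filesystem     1K-blocks  Used Available Use% Mounted on',
  'shm                65536     0     65536   0% /dev/shm',
].join('\n')

console.log(shmSizeMiB(sample))    // 64
console.log(isShmTooSmall(sample)) // true
```

In a real job the input could come from require('child_process').execSync('df -k /dev/shm', { encoding: 'utf8' }) instead of the hard-coded sample.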

einomi commented 6 years ago

I had the same issue with Codeship CI. Just needed to change config in my codeship-services.yml file, according to documentation https://documentation.codeship.com/pro/continuous-integration/browser-testing/#chrome-crashing

jennifer-shehane commented 6 years ago

I wonder if this is related to our other issue with a Codeship CI run failing: https://github.com/cypress-io/cypress/issues/328. Unfortunately, we don't have a pro account, and the codeship-services.yml is only available on pro.

egucciar commented 5 years ago

Any advice on how to solve this in CircleCI? I do not have a services section in my config.yml, only jobs.

mitar commented 5 years ago

I think that since this issue has been made there is now a better fix for the problem by asking Chrome not to use /dev/shm. I opened #3633 for more details about this.

wralitera commented 5 years ago

Hello @brian-mann. How do you set --ipc=host in gitlab-ci.yml?

ccorcos commented 5 years ago

Anyone know how to add --ipc=host to CircleCI? It looks like they call docker run somewhere out of our control...

mitar commented 5 years ago

You cannot. And this is why this workaround does not really work.

ccorcos commented 5 years ago

@maximilianschmitt seems to think you can: https://gitter.im/cypress-io/cypress/archives/2018/10/17

ccorcos commented 5 years ago

Actually, a build just finished and it crashed pretty quickly

ccorcos commented 5 years ago

Looks like there's plenty of shared memory.

df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
shm              30G  8.0K   30G   1% /dev/shm

adamlewkowicz commented 5 years ago

I had the same issue, but after changing my image from cypress/base to cypress/browsers:node11.13.0-chrome73 it now runs without crashes on gitlab-ci.

image: cypress/browsers:node11.13.0-chrome73

rwralitera commented 5 years ago

I will test this image: cypress/browsers:node11.13.0-chrome73 right now. Thanks @alk831

rwralitera commented 5 years ago

I tested and it failed:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
 2: 0x7fa824226887 [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
 3: 0x7fa823d95a57 [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
 4: 0x7fa823d959d5 [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
 5: v8::internal::Factory::NewStruct(v8::internal::InstanceType) [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
 6: 
<--- Last few GCs --->

[425:0x2659cbff8000]  1853805 ms: Mark-sweep 2057.7 (2176.7) -> 2057.7 (2154.2) MB, 2096.2 / 0.0 ms  (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 2096 ms) last resort 
[425:0x2659cbff8000]  1856036 ms: Mark-sweep 2057.7 (2154.2) -> 2057.7 (2154.2) MB, 2230.4 / 0.0 ms  last resort 

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x162eb08ad681 <JSObject>
    1: bindContainer(aka bindContainer) [/builds/wynd-qa/BO/node_modules/typescript/lib/typescript.js:~22229] [pc=0x2ef0ae593de3](this=0x2ae892502311 <undefined>,node=0x7235b368b9 <NodeObject map = 0x3516afdfd7d1>,containerFlags=45)
    2: bind(aka bind) [/builds/wynd-qa/BO/node_modules/typescript/lib/typescript.js:~23556] [pc=0x2ef0ae55cdac](this=0x2ae892502311 <undefined>,node=0x7235b368b9 <No...

v8::internal::Factory::NewTuple3(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
 7: 0x7fa823b407a1 [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
 8: 0x7fa823b4347b [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
 9: 0x7fa823b431bc [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
10: 0x7fa823b47b5f [/root/.cache/Cypress/3.2.0/Cypress/libnode.so]
11: 0x2ef0add043fd

Timed out waiting for the browser to connect. Retrying...

The following error was thrown by a plugin. We've stopped running your tests because a plugin crashed.

adamlewkowicz commented 5 years ago

@bentrole It crashed only once for me with this image and now it works properly. I would try splitting large tests into smaller ones and running without recording until there is a fix for this.

rwralitera commented 5 years ago

@alk831 I have already split my tests because of that, but this bug has been there since 2016. If there has been no fix until now, I don't think they will ever correct it! ;-(

ccorcos commented 5 years ago

@brian-mann can we please reopen this issue? If we cannot scale out our testing library, I don't think we can use this service :/

adamlewkowicz commented 5 years ago

@bentrole one more thing you can try is running tests in parallel so that memory usage is smaller. However, as @brian-mann mentioned, this is not strictly related to Cypress, and maybe your CI runner is just not capable of running Chromium. If you are using GitLab CI, you could try creating your own runner.

wralitera commented 5 years ago

My tests are structured like this: 1 spec file = 1 scenario. So when I run all tests using the --spec option, the spec files should run one by one and should not cause an out-of-memory error.

Running in parallel works, but I do not have my own runner, and if I use too many runners and the tests take a long time, I will block other developers.

I will ask our system admin (for the GitLab runner) to try customizing the runner and enabling the --ipc=host option suggested by @brian-mann.

Ameobea commented 5 years ago

Bumping to cypress/browsers:node11.13.0-chrome73 as my base Docker image fixed this for me as well! 🎊

r3dDoX commented 5 years ago

For people struggling to set ipc=host on their respective environment it should help to mount a volume to /dev/shm to allow shared memory for POSIX objects. In Kubernetes/OpenShift for example I mounted an "EmptyDir" volume to /dev/shm and everything is working smoothly now.

jbinto commented 5 years ago

This seems to have gotten way worse in 3.3.0. We've been running Cypress on CircleCI for almost 2 years, and these crashes have become way more frequent recently.

jbinto commented 5 years ago

We're seeing Cypress crashes multiple times daily now in CircleCI after a couple of years of solid reliability. None of the workarounds here have helped:

We had never seen a "sad face" crash until about a month ago. I'll look back at our CI logs and see when it started.

jbinto commented 5 years ago

It looks like the crashes started 8 days ago, when we upgraded to 3.3.1. I don't see any crashes associated with 3.3.0, but it was only out for 5 days in between.

idanRiski commented 5 years ago

Any news on this issue? I've already tried all your suggestions: using the flags "ipc=host" and "cpus=2", increasing the /dev/shm size, downgrading Cypress to 3.2.0, and adding the following settings to the cypress.json file: "numTestsKeptInMemory": 0, "restartBrowserBetweenSpecFiles": true, "videos": false. None of them has solved the issue. I get the "sad face" crash error message on every run, at the point where it calls cy.visit. The issue happens only in Docker and does not reproduce when running locally.

Please assist, thanks.

jbinto commented 5 years ago

We have switched to --browser chrome using the cypress/browsers:node10.2.1-chrome74 Docker image, and we disable /dev/shm usage like so:

// cypress/plugins/index.js

module.exports = (on, config) => {
  // ref: https://docs.cypress.io/api/plugins/browser-launch-api.html#Usage
  on('before:browser:launch', (browser = {}, args) => {
    if (browser.name === 'chrome') {
      args.push('--disable-dev-shm-usage')
      return args
    }

    return args
  })
}

And crashes have stopped completely. I suspect something changed between the Chromium 59 and Chromium 61 builds of Electron that makes it more prone to crashes, but for anyone hitting this on CircleCI this is the only actual fix (after weeks of tinkering).

edit: The obvious drawback is we lose video recording, until #1767 is fixed. The tradeoff is unfortunately worth it here - CI being reliable is more important than being easy to debug.

Ameobea commented 5 years ago

@jbinto this looks really promising; I don't have access to the K8s cluster on which our build node runs, so I can't tweak anything on that end. I'll give this a shot and see if it solves our crashes as well!

amkoehler commented 5 years ago

@jbinto I had about perfect timing with opening this issue and seeing your solution. This is working well for us so far using the circleci/node:8-browsers image

yuanyuanlimaggie commented 5 years ago

Does anyone have a good solution for GitLab? I updated to cypress/browsers:node11.13.0-chrome73 in GitLab, and it still crashed. I am using Cypress 3.3.2.

uvesten commented 5 years ago

Please fix this. Using Cypress 3.4.1, have started seeing crashes more and more often.

jennifer-shehane commented 5 years ago

Going to reopen this since people are still having issues with crashes in Docker.

Our plan of action now is to investigate passing in the --disable-dev-shm-usage automatically. Related to https://github.com/cypress-io/cypress/issues/3633

Workaround Today

Follow these instructions for passing the --disable-dev-shm-usage flag: https://github.com/cypress-io/cypress/issues/350#issuecomment-503231128

cc @flotwig

srinu-kodi commented 5 years ago

Hi @brian-mann @jennifer-shehane Adding the --ipc=host for docker run is also not helping out and specs are failing continuously.

We have integrated the tests in CI and specs are failing. Is there any solution in near future as it is giving hard times now...

Thanks

you1anna commented 5 years ago

This is now blocking us using Cypress in CI, have tried adding --disable-dev-shm-usage as @jennifer-shehane suggested but we're seeing the 'Chromium renderer process crashed' message each time in GoCD, after Cypress has run only two tests.

srinu-kodi commented 5 years ago

Hi @you1anna, I also faced this problem. Today I could finally fix it by adding shm_size: '2gb' to my docker-compose.yaml file, and it has been working consistently since. Basically this setting sizes the container's /dev/shm at 2 GB, so the browser no longer runs out of shared memory...

Please add this and it should be fine. BTW it doesn't have any link with GoCD :)

Let me know if you face any problem or need help.

you1anna commented 5 years ago

Thanks @srinu-kodi. It makes sense to use more shm, but in our tests we are using a separate container with following volume share: ` volumes:

mvandebunt commented 5 years ago

We have had this problem since moving to 3.4.1, running Docker containers in a Wercker pipeline.

cmcnicholas commented 5 years ago

We recently upgraded to 3.4.1 and are seeing the same issue on pretty much every test run: because we run tests in parallel, 2-3 machines fail while the other 4 pass.

fulvio-m commented 5 years ago

For Gitlab users the fix requires editing of the runner configuration by adding a new volume to the volumes configuration of your docker runner:

[[runners]]
  ...
  executor = "docker"
  ...
  [runners.docker]
   ...
    volumes = ["/dev/shm:/dev/shm", ...other volumes...]
  ...
  [runners.cache]

Basically this instructs the Docker container to mount the host's /dev/shm. After editing, the runner needs to be restarted with gitlab-runner restart. Credit for the fix goes to https://lcx.wien/blog/cypress-gitlab-ci/

mohansgithub commented 5 years ago

Any idea how to use --ipc=host on dynamic Docker-based Jenkins agents (which will be deployed on Kubernetes)? In the configuration below we need to specify the Cypress image and also use --ipc=host.

image

Bsmalhi commented 4 years ago

@mohansgithub Use Environment Variables: add "ipc" as the key and "host" as the value, and that should do it for your Docker container. Let me know if you figured it out.

ghost commented 4 years ago

Thank you @fulvio-m.

JimmyKuruvilla commented 4 years ago

--ipc=host works locally for me, but as an env var in the jenkins kubernetes plugin it did not work. However mounting it as a volume did work: image

hannahhaken commented 4 years ago

We're running Cypress 3.8.1 in Electron 78 in Docker across multiple apps and have been seeing this issue for a couple of weeks in one particular app. The rest of the apps were fine. We tried the suggested IPC host workaround but this didn't help....

What has fixed this issue is removing multiple uses of describe(). So it appears this particular test suite had multiple instances of describe() nested inside a single describe(), all in a single TypeScript file. I've now replaced these nested describe() with it(). The memory leak issue has subsided and this test suite now finishes without any problems.


Update: For the remaining apps which had memory leaks, it appears running the tests in Chrome rather than Electron does the trick.

9odzilla commented 4 years ago

@hannahhaken My tests are already written with top level describe() and it() within them. I still see this issue using electron.

bigbitbus commented 4 years ago

We have the same issue with Google Cloud Build: it's not possible to specify ipc=host within the Cloud Build configuration file.

zeel-swiggy commented 4 years ago

This is not happening for 3.6.0, but happened for 3.7.0 and 3.8.0

must-git-good commented 4 years ago

Joining the various voices that are pointing out that this error seems to be cropping up for us for the first time over the last few weeks. Didn't happen on all tests, but did consistently happen on some (that shouldn't have actual memory issues). Losing nested describe blocks helped, but errors remained (all this despite running ipc=host in Docker, etc.). It seems like swapping to Chrome for now may help, which means Electron and Cypress may have some interesting optimization/interactions going on under the hood.

No other input, will update if I find out more, but want to make sure this gets looked at, as it's been a significant frustration for us.

lsdkjflasjflkdasj commented 4 years ago

Same thing happening here.

tommueller commented 4 years ago

We are also constantly running into this issue, and by now it is creating a lot of frustration here. We are running the Cypress tests in a Drone CI Docker setup. All the tests run perfectly fine on my local machine. It's especially confusing to me, as our application is really not that memory hungry (about 80 MB max memory usage according to the Chrome Task Manager) and the tests are very short too (mostly between 5-15 loc).

We already tried these recommendations:

I have not yet tried these options, as they are not possible with our current setup or seem to be too unstable ...

Any updates on this would be a great help for us!