Closed brian-mann closed 1 year ago
We actually have these crashes halfway in a single spec and we have stalling too. I tried debugging this with strace and it seems to be constantly trying to acquire some locks. Our app seems to make the browser allocate 400+ MB of memory fast and the whole suite can go up to 2 GB... So resetting between specs might not be enough. Maybe between it/test is also an option?
Setting the --ipc=host flag does fix this, but I wonder what happens if two instances of the test run simultaneously. Could a clash occur?
How could two instances of the test run occur simultaneously? If you wanted to parallelize you would do it over two different docker containers.
Yes, two Docker instances. It might be a false fear of a clash. I simply don't know what two Docker instances share when both run with --ipc=host.
Hi, I am running test cases on an AWS EC2 small instance and I am having this issue: https://on.cypress.io/renderer-process-crashed. Is there any way to avoid this?
Did you try the --ipc=host fix?
But I am not using Docker
If not sandboxed, you might have multiple Chrome instances fighting over resources. What is your setup? Any concurrency? Are you open to a different setup?
Any update on this? I'm now getting the error too: Chromium usually crashes when running a number of test suites.
This issue has been superseded by this: https://github.com/cypress-io/cypress/issues/681
That will remove the need to recover since it fixes the problem at its core
We've started hitting this fairly frequently now too
I'm having this happen randomly on travis-ci with cypress 3.0.2 (I just recently started using cypress so no clue if it happened in a previous version). It might be good to add this flag even with #681 resolved.
Edit: I was able to resolve my issue by only calling .visit() once and resetting the state of the application between tests. I know that's not ideal, but it works for now.
In hindsight my fix with --ipc=host might be related to the shared memory issue I described in https://github.com/cypress-io/cypress/issues/350 and giving the container more shared memory might resolve crashes.
I'm also getting this issue now with cypress v3.1.0. Any updates?
Hi cypress team!
We are also getting this error when we use cypress run as well as cypress open. We noticed that it happens more when we use cy.wait. We can consistently reproduce it when we use cy.wait with a value greater than 20000. This is on our circle-ci linux containers fyi.
Hi, I'm currently trying to use Cypress in GitLab CI. I figured out most parts, except the browser crashing.
My current GitLab CI test job is the following:
test_dev:
  only:
    - dev
  stage: test
  image: cypress/base:10
  script:
    - npm i --save-dev cypress
    - $(npm bin)/cypress run --reporter junit --reporter-options "mochaFile=results_[hash].xml,toConsole=true"
  artifacts:
    paths:
      - cypress/videos
    reports:
      junit: results_*.xml
    expire_in: 1 week
This works great when the browser doesn't crash, including test reporting in GitLab's merge requests. However, it fails about 50% of the time. Using the --ipc=host flag is afaik not an option in GitLab CI.
Have you tried increasing the shared memory instead, like I describe in https://github.com/cypress-io/cypress/issues/350 ?
I am using shared runners on gitlab ci, and shm-size doesn't seem to be an option for shared runners. Thanks anyway
I think you can configure it using this documentation https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section
Hi, please provide a fix / explanation for this issue. It always happens on one test case (and only one test case). I do not think it has to do with memory, but there is no way to know. I was able to reproduce it locally without Docker. I think it has to do with origin (subdomain) changes. Thanks
EDIT: Just ran in debug mode there is no way of knowing what is causing this problem unfortunately
Hi, we're also experiencing this issue in Kubernetes (using Jenkins as our CI engine). Would be happy to provide additional information if helpful.
I've recently started running into the issue, as our codebase starts to acquire more dependencies. It's intermittent and unpredictable. Sometimes I get a passing test, sometimes it fails the moment it begins.
After more experimentation, I've found that using the cypress/browsers:chrome69 image instead of the cypress/base:10 made the issue go away. This issue is likely to be tied to an older version of electron being unable to handle a larger codebase, and I think more effort should go into updating electron.
One useful thing in meantime would be if Cypress could have some way to communicate this to the caller that the browser failed. Then I could re-run the test inside CI automatically. Maybe an exit code from npm call could be different? Or some other way to determine that tests failed because of Chrome failing and not because of tests failing. Could this be added in meantime? So recovery could then be done outside of Cypress.
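Until something like that exists, the re-run can be approximated outside Cypress by inspecting the run's captured output. What follows is only a minimal sketch under assumptions: the marker strings are a guess at what the crash message contains (check them against your own logs), and `RunResult`, `looksLikeRendererCrash`, `runWithCrashRetry` and the `runOnce` callback are hypothetical names, not Cypress APIs. `runOnce` stands in for however you launch `cypress run` and collect its stdout/stderr.

```typescript
// Hypothetical helper, not part of Cypress. The marker strings are an
// assumption about the crash message; adjust them to match your logs.
const CRASH_MARKERS = ["renderer process just crashed", "renderer-process-crashed"];

interface RunResult {
  exitCode: number;
  output: string; // combined stdout + stderr of one test run
}

// True when the captured output contains one of the assumed crash markers.
function looksLikeRendererCrash(output: string): boolean {
  const lower = output.toLowerCase();
  return CRASH_MARKERS.some(marker => lower.indexOf(marker) !== -1);
}

// Calls runOnce until the output no longer shows a crash, or until
// maxAttempts runs have been made. The exit code is deliberately ignored
// when deciding whether to retry; only the output is inspected.
function runWithCrashRetry(runOnce: () => RunResult, maxAttempts = 3): RunResult {
  let last: RunResult = { exitCode: 1, output: "" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = runOnce();
    if (!looksLikeRendererCrash(last.output)) {
      return last; // ordinary pass or ordinary test failure: report as-is
    }
  }
  return last; // still crashing after maxAttempts runs
}
```

The check looks at the output rather than the exit code because, as reported elsewhere in this thread, a run can finish green even though the renderer crashed. In CI, `runOnce` could wrap a synchronous child-process call and the script could exit with `last.exitCode`.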
I think that since this issue has been made there is now a better fix for the problem by asking Chrome not to use /dev/shm. I opened #3633 for more details about this.
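For anyone who wants to try that approach, here is a rough sketch of what it could look like in a plugins file. It assumes the Cypress 3.x-style `before:browser:launch` handler, which receives `(browser, args)` and returns the argument list (later major versions changed this signature), and it uses Chromium's `--disable-dev-shm-usage` flag. The type aliases and the `registerShmWorkaround` name are made up for this example so that it compiles on its own.

```typescript
// Minimal stand-in types so the sketch is self-contained; in a real project
// the `on` function is supplied by Cypress when it loads the plugins file.
interface BrowserInfo {
  name?: string;
}
type LaunchHandler = (browser: BrowserInfo, args: string[]) => string[];
type RegisterEvent = (event: "before:browser:launch", handler: LaunchHandler) => void;

// Hypothetical helper: registers a launch hook that appends the flag for
// Chrome only and leaves the arguments of other browsers untouched.
function registerShmWorkaround(on: RegisterEvent): void {
  on("before:browser:launch", (browser, args) => {
    if (browser.name === "chrome") {
      // Ask Chrome to avoid /dev/shm for its shared memory files.
      args.push("--disable-dev-shm-usage");
    }
    return args;
  });
}
```

In the plugins file this would be called from the exported function, passing along the `on` argument that Cypress provides. It only affects runs started with --browser chrome.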
I'm hitting this issue on a small digital ocean droplet (no docker / container). The test runs perfectly a dozen or so times and then starts crashing with this error. If I reboot the droplet it starts working again then eventually dies. Looks like a memory leak to me.
There appears to be plenty of memory in my docker container
df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
shm 30G 8.0K 30G 1% /dev/shm
I'm also unable to figure out how to add the --ipc=host flag for my CircleCI build... Doesn't appear to be an option.
I have this same issue.
A few days ago I started facing the same issue even though no changes were made. It's running on Travis without Docker and against a separate app that is not installed in the same code base.
What's interesting is that switching to --browser chrome seems to help, so it looks like the problem is related to Electron, whether it runs headless or not: it fails in both cases. However, with Chrome you lose the video recording.
Any progress on this topic? @brian-mann
I've become very impatient waiting for the Cypress folks to fix these crashing issues. In the meantime, I've created a very similar API using selenium and am having no memory issues. There's no recording of tests, but at least it's reliable. Here's a code snippet for you if you want to try it out.
import { Builder, ThenableWebDriver, By, WebElement, Key, Condition } from "selenium-webdriver"

/**
 * Wrap any promise coming from the Selenium driver so that we can
 * get stack traces that point to our code.
 */
async function wrapError<T>(p: Promise<T>) {
  const e = new Error()
  e["__wrapError"] = true
  try {
    const result = await p
    // Wait just a little bit in case the browser is about to navigate
    // or something.
    await new Promise(resolve => setTimeout(resolve, 20))
    return result
  } catch (error) {
    if (error["__wrapError"]) {
      throw error
    }
    e.message = error.message
    throw e
  }
}

async function waitFor(
  driver: ThenableWebDriver,
  fn: () => Promise<boolean | object>,
  timeout = 2000
) {
  await driver.wait(
    new Condition("wait", async () => {
      try {
        const result = await fn()
        return Boolean(result)
      } catch (error) {
        return false
      }
    }),
    timeout
  )
}

class Element {
  private promise: Promise<WebElement>
  then: Promise<WebElement>["then"]
  catch: Promise<WebElement>["catch"]

  constructor(
    public driver: ThenableWebDriver,
    promise: Promise<WebElement> | WebElement
  ) {
    this.promise = Promise.resolve(promise)
    this.then = this.promise.then.bind(this.promise)
    this.catch = this.promise.catch.bind(this.promise)
  }

  /** Map in the monadic sense. */
  map(fn: (elm: WebElement) => Promise<WebElement | undefined | void>) {
    return new Element(
      this.driver,
      wrapError(
        this.promise.then(async elm => {
          const result = await fn(elm)
          if (result) {
            return result
          } else {
            return elm
          }
        })
      )
    )
  }

  waitFor(fn: (elm: WebElement) => Promise<boolean | object>) {
    return this.map(elm => waitFor(this.driver, () => fn(elm)))
  }

  mapWait(fn: (elm: WebElement) => Promise<WebElement>) {
    return this.waitFor(fn).map(fn)
  }

  click() {
    return this.map(elm => elm.click())
  }

  clear() {
    return this.map(elm => elm.clear())
  }

  type(text: string) {
    return this.map(elm => elm.sendKeys(text))
  }

  enter() {
    return this.map(elm => elm.sendKeys(Key.RETURN))
  }

  backspace() {
    return this.map(elm => elm.sendKeys(Key.BACK_SPACE))
  }

  find(selector: string) {
    return this.mapWait(elm => {
      return elm.findElement(By.css(selector))
    })
  }

  findAll(selector: string) {
    return new Elements(
      this.driver,
      this.promise.then(elm => {
        return waitFor(this.driver, () =>
          elm.findElements(By.css(selector))
        ).then(() => {
          return elm.findElements(By.css(selector))
        })
      })
    )
  }

  contains(text: string) {
    return this.mapWait(elm => {
      // TODO: escape text.
      // https://stackoverflow.com/questions/12323403
      return elm.findElement(By.xpath(`//*[contains(text(), '${text}')]`))
    })
  }

  clickText(text: string) {
    return this.contains(text).click()
  }
}

class Elements {
  private promise: Promise<Array<WebElement>>
  then: Promise<Array<WebElement>>["then"]
  catch: Promise<Array<WebElement>>["catch"]

  constructor(
    public driver: ThenableWebDriver,
    promise: Promise<Array<WebElement>> | Array<WebElement>
  ) {
    this.promise = Promise.resolve(promise)
    this.then = this.promise.then.bind(this.promise)
    this.catch = this.promise.catch.bind(this.promise)
  }

  /** Map in the monadic sense. */
  map(
    fn: (
      elm: Array<WebElement>
    ) => Promise<Array<WebElement> | undefined | void>
  ) {
    return new Elements(
      this.driver,
      wrapError(
        this.promise.then(async elms => {
          const result = await fn(elms)
          if (Array.isArray(result)) {
            return result
          } else {
            return elms
          }
        })
      )
    )
  }

  waitFor(fn: (elm: Array<WebElement>) => Promise<boolean | object>) {
    return this.map(elm => waitFor(this.driver, () => fn(elm)))
  }

  mapWait(fn: (elm: Array<WebElement>) => Promise<Array<WebElement>>) {
    return this.waitFor(fn).map(fn)
  }

  clickAll() {
    return this.map(async elms => {
      await Promise.all(elms.map(elm => elm.click()))
    })
  }

  atIndex(index: number) {
    return new Element(
      this.driver,
      wrapError(
        this.promise.then(elms => {
          const elm = elms[index]
          if (!elm) {
            throw new Error("Element not found!")
          }
          return elm
        })
      )
    )
  }
}

export class Browser {
  private promise: Promise<void>
  then: Promise<void>["then"]
  catch: Promise<void>["catch"]

  constructor(public driver: ThenableWebDriver, promise?: Promise<void>) {
    this.promise = Promise.resolve(promise)
    this.then = this.promise.then.bind(this.promise)
    this.catch = this.promise.catch.bind(this.promise)
  }

  visit(route: string) {
    return new Browser(
      this.driver,
      wrapError(
        this.promise.then(async () => {
          await this.driver.get(route)
        })
      )
    )
  }

  // rerender() and flushTransactions() below call app-specific helpers of
  // the same names that are not defined in this snippet.
  rerender() {
    return new Browser(this.driver, wrapError(rerender(this.driver)))
  }

  flushTransactions() {
    return new Browser(this.driver, wrapError(flushTransactions(this.driver)))
  }

  find(selector: string) {
    return new Element(
      this.driver,
      wrapError(
        this.promise
          .then(() => {
            return waitFor(this.driver, async () =>
              this.driver.findElement(By.css(selector))
            )
          })
          .then(() => {
            return this.driver.findElement(By.css(selector))
          })
      )
    )
  }

  getClassName(className: string) {
    return this.find("." + className)
  }

  getTitle() {
    return this.driver.getTitle()
  }

  waitFor(fn: () => Promise<boolean>, timeout = 2000) {
    // Forward the caller's timeout to the shared waitFor helper.
    return new Browser(this.driver, waitFor(this.driver, fn, timeout))
  }

  waitToLeave(url: string) {
    return new Browser(
      this.driver,
      wrapError(
        waitFor(
          this.driver,
          async () => {
            const currentUrl = await this.driver.getCurrentUrl()
            return url !== currentUrl
          },
          10000
        )
      )
    )
  }

  waitForRoute(url: string) {
    return new Browser(
      this.driver,
      wrapError(
        waitFor(
          this.driver,
          async () => {
            const currentUrl = await this.driver.getCurrentUrl()
            return url === currentUrl
          },
          10000
        )
      )
    )
  }
}
We're seeing this issue crop up on Drone, which also doesn't support the --ipc=host option. Our containers already have 16GB memory. Some notes on the behavior:
Electron logs an error message when it crashes, but doesn't actually fail the test run. Our build is green despite the fact that half the tests caused a renderer crash.
Chrome doesn't even log a message—it dies silently and the test run hangs forever.
The crash does appear to happen at the exact same time on every run, but it's not clear what we're doing to cause it. Rearranging our test code or skipping certain tests resolves the problem temporarily, but it always creeps back in.
I haven't contributed to Cypress before, but I'd be willing to take a stab at fixing the problem if someone (@brian-mann ?) can show me where to start. My team has lost a ton of time troubleshooting this and I'd love to put it to bed.
@nmuth Please see our contributing guide on how to start: https://github.com/cypress-io/cypress/blob/develop/CONTRIBUTING.md
Are you using version 3.3.1?
@jennifer-shehane Yup, we're on 3.3.1. I've read the contributing guide. I'm still coming to grips with the code. It looks like the crash handler for Electron is here. Where can I hook in to provide a crash handler for Chrome? Would that be in the launcher package?
@RockChild Are you on 3.3.x? I commented in another thread that this seems to have popped up since 3.3.0 dropped about ~2 weeks ago.
@jbinto yeah, looks like it started crashing after upgrade to 3.3.1, so I'll try to downgrade to 3.3.0. Thanks for your insights!
I switched to cypress/browsers:chrome69, changed the package version to 3.3.0 and, with the following build step config in drone.io, it seems that the renderer doesn't crash anymore:
steps:
  - name: dev-tests
    image: cypress/browsers:chrome69
    shm_size: 4096000000
    mem_limit: 1000000000
    commands:
      - npm ci
      - $(npm bin)/cypress verify
      - $(npm bin)/cypress run
Later edit - it just crashed this morning, so it seems that this is not it. Isn't there any way to auto-restart the test if it crashes ?
@RockChild Downgrading to 3.3.0 (or even 3.2.0) has not resolved this issue for us.
Similar to you we just started seeing this on or around May 27. No idea what has changed, and we have tried just about everything to fix this. It is gradually getting worse, with almost 100% crash rate today (when it started a few weeks ago it was closer to 5-10%).
Only happening on CircleCI. /dev/shm is 30GB there. No pattern to where the tests fail. Nothing interesting when using DEBUG=cypress:*.
If you’re seeing consistent crashes and would like this implemented, please leave a note in the issue.
Yes, please.
Any update on this ? I've tried making a wrapper using Cypress in a .js file but it seems that the renderer errors aren't caught by Cypress.catch()
Please fix.
We are hitting this problem as well. Not using Docker. Unfortunately, this issue makes cypress way too unreliable for automated tests.
I've just given up and switched to testcafe.
Experiencing consistent browser crashes inside of my Jenkins pipeline. Unable to get around this. I may also have to investigate alternative solutions as this is rather unstable and unpredictable (the reason I moved away from CodeCeption).
Things run great locally, but once we try running the tests on our Jenkins server the browser crashes every time and my tests never pass.
Currently on day 2 of debugging this. If I can't resolve this today I'll have to move away from Cypress.
@EvanHerman (and anyone else on this thread): FWIW, since switching to Chrome (from Electron) and setting some flags we have not seen a crash in CI for almost 2 months now.
See https://github.com/cypress-io/cypress/issues/350#issuecomment-503231128 for details.
And just to add - there is open pull request that adds video recording to Chrome https://github.com/cypress-io/cypress/pull/4791 which is THE main thing stopping people from using Chrome on CI
@jbinto Thanks for the tip - I'll switch out the image and test things out. 👍
Edit: Works perfectly, thanks again @jbinto - saved me a lot of headache!
@bahmutov That's great news! Looking forward to having recorded videos back in Chrome.
Please fix this, using cypress 3.4.1
Please fix this. I’m not able to set ipc=host in my ci/cd pipeline
Please fix this or provide work-around for different environments. In my case I'm running Cypress using Jenkins and pipelines where I do not have access to flags.
Please fix this issue as I am hitting 'sad face' error with docker. I am using latest cypress 3.6.1
We're seeing this quite often lately as well with 3.7.0
Happening quite a bit with electron on 3.8
Related to #348.
It is actually possible for Cypress to implement strategies when the renderer (or browser process) crashes during a test run - something like recoverFromRendererCrashes: true by default.
There is already a mechanism for Cypress to "reload" mid-run, rebuild the state of every previous run test, skip over previously run tests, and continue with the next one in line.
In fact this is exactly what cy.visit already does under the hood.
We can utilize this same process upon a renderer / browser process crashing to continue on with the run.
So it may look something like this:
Taking this a step further, we are starting to see several patterns emerge with how and why renderer processes crash - it is almost always related to extremely long test runs in a memory starved environment (such as Docker).
It may even be a good idea for us to always preemptively "break up" headless runs by spec file.
In other words, we could have an option like restartBrowserBetweenSpecFiles: true which would automatically kill the renderer / browser process before moving on to a different spec file (but still rebuild the state of the UI correctly, and still have a single contiguous video recording).
To the user it would look like nothing is really different, but internally the renderer process would be killed and then restarted.
This would forcefully purge primed memory from the process, which could keep environments like docker from ever crashing to begin with.
Depends on: #6170
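To make the two proposed options concrete, here is a toy model of the control flow at spec-file granularity. It is only an illustration of the idea and not how Cypress is implemented: `BrowserHandle`, `RecoveryOptions`, `runSpecs` and the `launch` callback are hypothetical names, and it leaves out both the rebuilding of UI state and the video recording.

```typescript
// Hypothetical sketch: none of these names are real Cypress APIs.
type SpecOutcome = "passed" | "failed" | "crashed";

interface BrowserHandle {
  runSpec(spec: string): SpecOutcome;
  close(): void;
}

interface RecoveryOptions {
  recoverFromRendererCrashes: boolean;
  restartBrowserBetweenSpecFiles: boolean;
}

function runSpecs(
  specs: string[],
  launch: () => BrowserHandle,
  opts: RecoveryOptions
): { [spec: string]: SpecOutcome } {
  const results: { [spec: string]: SpecOutcome } = {};
  let browser = launch();
  for (let i = 0; i < specs.length; i++) {
    const outcome = browser.runSpec(specs[i]);
    results[specs[i]] = outcome;
    const crashed = outcome === "crashed";
    if (crashed && !opts.recoverFromRendererCrashes) {
      break; // without recovery, the run ends at the crash
    }
    const hasMore = i < specs.length - 1;
    if (hasMore && (crashed || opts.restartBrowserBetweenSpecFiles)) {
      // Fresh process: either to recover from a crash or to purge memory
      // before the next spec file. Specs that already ran are not repeated.
      browser.close();
      browser = launch();
    }
  }
  browser.close();
  return results;
}
```

With recovery enabled, a crash in one spec file is recorded and the remaining files still run in a new browser process. With restarts enabled, every spec file gets its own process, which is the preemptive "break up" described above.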