Recover from renderer / browser crashes

brian-mann commented 7 years ago

Related to #348.

It is actually possible for Cypress to implement strategies when the renderer (or browser process) crashes during a test run - something like recoverFromRendererCrashes: true by default.

There is already a mechanism for Cypress to "reload" mid-run, rebuild the state of every previously run test, skip over those tests, and continue with the next one in line.

In fact this is exactly what cy.visit already does under the hood.

We can utilize this same process upon a renderer / browser process crashing to continue on with the run.

So it may look something like this:

(Running Tests)

✓ test 1 - foo
✓ test 2 - bar
✓ test 3 - baz

Oh noes the renderer process crashed... we will attempt to recover

...Restarting tests at 'test 4 - quux'

✓ test 4 - quux
✓ test 5 - ipsum
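
The flow above can be sketched in a few lines of TypeScript. This is only an illustration of the idea, not Cypress internals: `TestCase`, `Outcome`, `runWithRecovery`, and the `restartBrowser` callback are all hypothetical names, and a crash is modelled as a test outcome rather than a real process exit.

```typescript
// Hypothetical sketch: none of these names come from Cypress itself.
type Outcome = "passed" | "failed" | "crashed";

interface TestCase {
  title: string;
  run: () => Outcome;
}

/**
 * Runs tests in order. When a test reports a crash, the browser is
 * restarted and the run resumes at the interrupted test, so tests
 * that already finished are never executed again.
 */
function runWithRecovery(
  tests: TestCase[],
  restartBrowser: () => void,
  maxRestarts = 3
): string[] {
  const log: string[] = [];
  let next = 0; // index of the first test that has not completed
  let restarts = 0;

  while (next < tests.length) {
    const test = tests[next];
    const outcome = test.run();

    if (outcome === "passed") {
      log.push(`pass: ${test.title}`);
      next += 1;
    } else if (outcome === "crashed" && restarts < maxRestarts) {
      restarts += 1;
      restartBrowser();
      log.push(`restart at: ${test.title}`);
      // `next` is left unchanged so the interrupted test runs again.
    } else {
      // A real failure, or a crash after the restart budget is spent.
      log.push(`fail: ${test.title}`);
      next += 1;
    }
  }
  return log;
}
```

The `maxRestarts` cap is a design choice in this sketch: without it, a test that always crashes the renderer would loop forever.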

Taking this a step further, we are starting to see several patterns emerge with how and why renderer processes crash - it is almost always related to extremely long test runs in a memory starved environment (such as Docker).

It may even be a good idea for us to always preemptively "break up" headless runs by spec file.

In other words, we could have an option like restartBrowserBetweenSpecFiles: true which would automatically kill the renderer / browser process before moving on to a different spec file (but still rebuild the state of the UI correctly, and still have a single contiguous video recording).

To the user it would look like nothing is really different, but internally the renderer process would be killed and then restarted.

This would forcefully purge primed memory from the process, which could keep environments like docker from ever crashing to begin with.
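
As a rough sketch of the preemptive variant, assuming a hypothetical `restartBrowser` callback and a `restartBetweenSpecFiles` switch that mirrors the option name proposed above (none of this is existing Cypress API):

```typescript
// Hypothetical sketch: these types are not part of Cypress.
interface SpecFile {
  path: string;
  tests: Array<() => boolean>; // each test returns true when it passes
}

interface RunSummary {
  passed: number;
  failed: number;
  restarts: number;
}

function runSpecFiles(
  specs: SpecFile[],
  restartBrowser: () => void,
  restartBetweenSpecFiles = true
): RunSummary {
  const summary: RunSummary = { passed: 0, failed: 0, restarts: 0 };

  specs.forEach((spec, index) => {
    // Recycle the browser before every spec file except the first,
    // so memory held from earlier specs is released.
    if (restartBetweenSpecFiles && index > 0) {
      restartBrowser();
      summary.restarts += 1;
    }
    for (const test of spec.tests) {
      if (test()) {
        summary.passed += 1;
      } else {
        summary.failed += 1;
      }
    }
  });

  return summary;
}
```

With N spec files this restarts the browser N - 1 times, regardless of whether any crash occurred.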

Depends on: #6170

jheijkoop commented 7 years ago

We actually have these crashes halfway in a single spec and we have stalling too. I tried debugging this with strace and it seems to be constantly trying to acquire some locks. Our app seems to make the browser allocate 400+ MB of memory fast and the whole suite can go up to 2 GB... So resetting between specs might not be enough. Maybe between it/test is also an option?

Setting --ipc=host does fix this, but I wonder what happens if two instances of the test run simultaneously. Could a clash occur?

brian-mann commented 7 years ago

How could two instances of the test run occur simultaneously? If you wanted to parallelize you would do it over two different docker containers.

jheijkoop commented 7 years ago

Yes, two Docker instances. It might be an unfounded fear of a clash. I'm completely unaware of what the two Docker instances share with --ipc=host.

muslim-niche commented 7 years ago

Hi, I am running test cases on a small AWS EC2 instance and I am hitting this issue: https://on.cypress.io/renderer-process-crashed. Is there any way to avoid it?

jheijkoop commented 7 years ago

Did you try the --ipc=host fix?

muslim-niche commented 7 years ago

But I am not using Docker

jheijkoop commented 7 years ago

If not sandboxed, you might have multiple Chrome instances fighting over resources. What is your setup? Any concurrency? Are you open to a different setup?

khiettran commented 7 years ago

Any update on this? I'm now getting this error too: Chromium usually crashes when running a number of test suites.

brian-mann commented 7 years ago

This issue has been superseded by this: https://github.com/cypress-io/cypress/issues/681

That will remove the need to recover since it fixes the problem at its core

tizmagik commented 6 years ago

We've started hitting this fairly frequently now too

dsherret commented 6 years ago

I'm having this happen randomly on travis-ci with cypress 3.0.2 (I just recently started using cypress so no clue if it happened in a previous version). It might be good to add this flag even with #681 resolved.

Edit: I was able to resolve my issue by only calling .visit() once and resetting the state of the application between tests. I know that's not ideal, but it works for now.

jheijkoop commented 6 years ago

In hindsight my fix with --ipc=host might be related to the shared memory issue I described in https://github.com/cypress-io/cypress/issues/350 and giving the container more shared memory might resolve crashes.

jdtzmn commented 6 years ago

I'm also getting this issue now with cypress v3.1.0. Any updates?

mechanical-turk commented 6 years ago

Hi cypress team!

We are also getting this error when we use cypress run as well as cypress open

We noticed that it happens more when we use cy.wait. We can consistently reproduce it when we use cy.wait with a value greater than 20000. This is on our circle-ci linux containers fyi.

gwaihir8 commented 6 years ago

Hi, I'm currently trying to use Cypress in GitLab CI. I figured out most parts, except the browser crashing.

my current gitlab CI test job is the following:

test_dev:
  only:
    - dev
  stage: test
  image: cypress/base:10
  script:
    - npm i --save-dev cypress
    - $(npm bin)/cypress run --reporter junit --reporter-options "mochaFile=results_[hash].xml,toConsole=true"
  artifacts:
    paths:  
      - cypress/videos
    reports:
      junit: results_*.xml
    expire_in: 1 week

This works great when the browser doesn't crash, including test reporting in GitLab's merge requests. However, it fails 50% of the time. Using the --ipc=host flag is, as far as I know, not an option in GitLab CI.

jheijkoop commented 6 years ago

Have you tried increasing the shared memory instead, like I describe in https://github.com/cypress-io/cypress/issues/350 ?

gwaihir8 commented 6 years ago

I am using shared runners on gitlab ci, and shm-size doesn't seem to be an option for shared runners. Thanks anyway

jheijkoop commented 6 years ago

I think you can configure it using this documentation https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section

egucciar commented 5 years ago

Hi, please provide a fix or an explanation for this issue. It always happens on one test case (and only that one). I do not think it has to do with memory, but there is no way to know. I was able to reproduce it locally without Docker. I think it has to do with origin (subdomain) changes. Thanks.

EDIT: I just ran in debug mode; unfortunately, there is no way of knowing what is causing this problem.

ajcann commented 5 years ago

Hi, we're also experiencing this issue in Kubernetes (using Jenkins as our CI engine). Would be happy to provide additional information if helpful.

jpike88 commented 5 years ago

I've recently started running into the issue, as our codebase starts to acquire more dependencies. It's intermittent and unpredictable. Sometimes I get a passing test, sometimes it fails the moment it begins.

After more experimentation, I've found that using the cypress/browsers:chrome69 image instead of the cypress/base:10 made the issue go away. This issue is likely to be tied to an older version of electron being unable to handle a larger codebase, and I think more effort should go into updating electron.

mitar commented 5 years ago

One useful thing in meantime would be if Cypress could have some way to communicate this to the caller that the browser failed. Then I could re-run the test inside CI automatically. Maybe an exit code from npm call could be different? Or some other way to determine that tests failed because of Chrome failing and not because of tests failing. Could this be added in meantime? So recovery could then be done outside of Cypress.
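
Until something like that exists, an external wrapper could be sketched roughly as follows. This is a hedged illustration, not a Cypress feature: `RunResult`, `classifyRun`, `runWithRetries`, and especially the `CRASH_MARKER` text are assumptions, and `runOnce` stands in for whatever spawns `cypress run` and captures its output in your CI.

```typescript
// Hypothetical sketch of out-of-process recovery.
interface RunResult {
  exitCode: number;
  output: string; // combined stdout/stderr of one `cypress run`
}

type Verdict = "passed" | "tests-failed" | "browser-crashed";

// ASSUMPTION: adjust to the exact crash text your Cypress version prints.
const CRASH_MARKER = "Renderer process just crashed";

function classifyRun(result: RunResult): Verdict {
  // Check for the crash text first: the exit code alone cannot
  // distinguish a browser crash from ordinary test failures.
  if (result.output.indexOf(CRASH_MARKER) !== -1) {
    return "browser-crashed";
  }
  return result.exitCode === 0 ? "passed" : "tests-failed";
}

/** Re-runs the suite only when the previous attempt ended in a crash. */
function runWithRetries(runOnce: () => RunResult, maxAttempts = 3): Verdict {
  let verdict: Verdict = "browser-crashed";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    verdict = classifyRun(runOnce());
    if (verdict !== "browser-crashed") {
      break; // a real pass or a real failure: do not retry
    }
  }
  return verdict;
}
```

Genuine test failures are returned immediately, so retries are spent only on crashes.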

mitar commented 5 years ago

I think that since this issue has been made there is now a better fix for the problem by asking Chrome not to use /dev/shm. I opened #3633 for more details about this.
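
A minimal sketch of that approach, assuming the Cypress 3.x plugins-file convention in which the `before:browser:launch` handler receives the browser and an array of launch arguments (other versions may differ). The local types exist only so the sketch compiles without Cypress installed; `--disable-dev-shm-usage` is the Chrome switch that avoids /dev/shm.

```typescript
// Minimal stand-in types; the real ones come from Cypress.
interface BrowserInfo {
  name?: string;
}
type LaunchHandler = (browser: BrowserInfo, args: string[]) => string[];
type RegisterFn = (event: "before:browser:launch", handler: LaunchHandler) => void;

// In a real project this function would be the export of the plugins
// file (cypress/plugins/index.js by default in 3.x).
function plugins(on: RegisterFn): void {
  on("before:browser:launch", (browser, args) => {
    if (browser.name === "chrome") {
      // Ask Chrome to use temporary files instead of /dev/shm.
      args.push("--disable-dev-shm-usage");
    }
    return args;
  });
}
```

This only affects runs launched with `--browser chrome`; the Electron browser takes different launch options.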

itslenny commented 5 years ago

I'm hitting this issue on a small digital ocean droplet (no docker / container). The test runs perfectly a dozen or so times and then starts crashing with this error. If I reboot the droplet it starts working again then eventually dies. Looks like a memory leak to me.

ccorcos commented 5 years ago

There appears to be plenty of memory in my docker container

df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
shm              30G  8.0K   30G   1% /dev/shm

I'm also unable to figure out how to add the --ipc=host flag for my CircleCI build... Doesn't appear to be an option.

hhudson commented 5 years ago

"I am using shared runners on gitlab ci, and shm-size doesn't seem to be an option for shared runners. Thanks anyway"

I have this same issue.

RockChild commented 5 years ago

A few days ago I started facing the same issue even though no changes were made. It's running on Travis without Docker and against a separate app that is not installed in the same code base. What's interesting is that switching to --browser chrome seems to help, so it looks like it is related to Electron: it fails whether Electron runs headless or not. However, with Chrome, you lose the video recording. Any progress on this topic? @brian-mann

ccorcos commented 5 years ago

I've become very impatient waiting for the Cypress folks to fix these crashing issues. In the meantime, I've created a very similar API using selenium and am having no memory issues. There's no recording of tests, but at least it's reliable. Here's a code snippet for you if you want to try it out.

import { Builder, ThenableWebDriver, By, WebElement, Key, Condition } from "selenium-webdriver"

/**
 * Wrap any promise coming from the Selenium driver so that we can
 * get stack traces that point to our code.
 */
async function wrapError<T>(p: Promise<T>) {
    const e = new Error()
    e["__wrapError"] = true
    try {
        const result = await p
        // Wait just a little bit in case the browser is about to navigate
        // or something.
        await new Promise(resolve => setTimeout(resolve, 20))
        return result
    } catch (error) {
        if (error["__wrapError"]) {
            throw error
        }
        e.message = error.message
        throw e
    }
}

async function waitFor(
    driver: ThenableWebDriver,
    fn: () => Promise<boolean | object>,
    timeout = 2000
) {
    await driver.wait(
        new Condition("wait", async () => {
            try {
                const result = await fn()
                return Boolean(result)
            } catch (error) {
                return false
            }
        }),
        timeout
    )
}

class Element {
    private promise: Promise<WebElement>
    then: Promise<WebElement>["then"]
    catch: Promise<WebElement>["catch"]

    constructor(
        public driver: ThenableWebDriver,
        promise: Promise<WebElement> | WebElement
    ) {
        this.promise = Promise.resolve(promise)
        this.then = this.promise.then.bind(this.promise)
        this.catch = this.promise.catch.bind(this.promise)
    }

    /** Map in the monadic sense. */
    map(fn: (elm: WebElement) => Promise<WebElement | undefined | void>) {
        return new Element(
            this.driver,
            wrapError(
                this.promise.then(async elm => {
                    const result = await fn(elm)
                    if (result) {
                        return result
                    } else {
                        return elm
                    }
                })
            )
        )
    }

    waitFor(fn: (elm: WebElement) => Promise<boolean | object>) {
        return this.map(elm => waitFor(this.driver, () => fn(elm)))
    }

    mapWait(fn: (elm: WebElement) => Promise<WebElement>) {
        return this.waitFor(fn).map(fn)
    }

    click() {
        return this.map(elm => elm.click())
    }

    clear() {
        return this.map(elm => elm.clear())
    }

    type(text: string) {
        return this.map(elm => elm.sendKeys(text))
    }

    enter() {
        return this.map(elm => elm.sendKeys(Key.RETURN))
    }

    backspace() {
        return this.map(elm => elm.sendKeys(Key.BACK_SPACE))
    }

    find(selector: string) {
        return this.mapWait(elm => {
            return elm.findElement(By.css(selector))
        })
    }

    findAll(selector: string) {
        return new Elements(
            this.driver,
            this.promise.then(elm => {
                return waitFor(this.driver, () =>
                    elm.findElements(By.css(selector))
                ).then(() => {
                    return elm.findElements(By.css(selector))
                })
            })
        )
    }

    contains(text: string) {
        return this.mapWait(elm => {
            // TODO: escape text.
            // https://stackoverflow.com/questions/12323403
            return elm.findElement(By.xpath(`//*[contains(text(), '${text}')]`))
        })
    }

    clickText(text: string) {
        return this.contains(text).click()
    }
}

class Elements {
    private promise: Promise<Array<WebElement>>
    then: Promise<Array<WebElement>>["then"]
    catch: Promise<Array<WebElement>>["catch"]

    constructor(
        public driver: ThenableWebDriver,
        promise: Promise<Array<WebElement>> | Array<WebElement>
    ) {
        this.promise = Promise.resolve(promise)
        this.then = this.promise.then.bind(this.promise)
        this.catch = this.promise.catch.bind(this.promise)
    }

    /** Map in the monadic sense. */
    map(
        fn: (
            elm: Array<WebElement>
        ) => Promise<Array<WebElement> | undefined | void>
    ) {
        return new Elements(
            this.driver,
            wrapError(
                this.promise.then(async elms => {
                    const result = await fn(elms)
                    if (Array.isArray(result)) {
                        return result
                    } else {
                        return elms
                    }
                })
            )
        )
    }

    waitFor(fn: (elm: Array<WebElement>) => Promise<boolean | object>) {
        return this.map(elm => waitFor(this.driver, () => fn(elm)))
    }

    mapWait(fn: (elm: Array<WebElement>) => Promise<Array<WebElement>>) {
        return this.waitFor(fn).map(fn)
    }

    clickAll() {
        return this.map(async elms => {
            await Promise.all(elms.map(elm => elm.click()))
        })
    }

    atIndex(index: number) {
        return new Element(
            this.driver,
            wrapError(
                this.promise.then(elms => {
                    const elm = elms[index]
                    if (!elm) {
                        throw new Error("Element not found!")
                    }
                    return elm
                })
            )
        )
    }
}

export class Browser {
    private promise: Promise<void>
    then: Promise<void>["then"]
    catch: Promise<void>["catch"]

    constructor(public driver: ThenableWebDriver, promise?: Promise<void>) {
        this.promise = Promise.resolve(promise)
        this.then = this.promise.then.bind(this.promise)
        this.catch = this.promise.catch.bind(this.promise)
    }

    visit(route: string) {
        return new Browser(
            this.driver,
            wrapError(
                this.promise.then(async () => {
                    await this.driver.get(route)
                })
            )
        )
    }

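    // NOTE: rerender() and flushTransactions() call app-specific helper
    // functions that are not included in this snippet.
    rerender() {
        return new Browser(this.driver, wrapError(rerender(this.driver)))
    }

    flushTransactions() {
        return new Browser(this.driver, wrapError(flushTransactions(this.driver)))
    }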

    find(selector: string) {
        return new Element(
            this.driver,
            wrapError(
                this.promise
                    .then(() => {
                        return waitFor(this.driver, async () =>
                            this.driver.findElement(By.css(selector))
                        )
                    })
                    .then(() => {
                        return this.driver.findElement(By.css(selector))
                    })
            )
        )
    }

    getClassName(className: string) {
        return this.find("." + className)
    }

    getTitle() {
        return this.driver.getTitle()
    }

    waitFor(fn: () => Promise<boolean>, timeout = 2000) {
        // Forward the caller's timeout to the shared waitFor helper.
        return new Browser(this.driver, waitFor(this.driver, fn, timeout))
    }

    waitToLeave(url: string) {
        return new Browser(
            this.driver,
            wrapError(
                waitFor(
                    this.driver,
                    async () => {
                        const currentUrl = await this.driver.getCurrentUrl()
                        return url !== currentUrl
                    },
                    10000
                )
            )
        )
    }

    waitForRoute(url: string) {
        return new Browser(
            this.driver,
            wrapError(
                waitFor(
                    this.driver,
                    async () => {
                        const currentUrl = await this.driver.getCurrentUrl()
                        return url === currentUrl
                    },
                    10000
                )
            )
        )
    }
}

beepboopitschloe commented 5 years ago

We're seeing this issue crop up on Drone, which also doesn't support the --ipc=host option. Our containers already have 16GB memory. Some notes on the behavior:

Electron logs an error message when it crashes, but doesn't actually fail the test run. Our build is green despite the fact that half the tests caused a renderer crash.
Chrome doesn't even log a message—it dies silently and the test run hangs forever.
The crash does appear to happen at the exact same time on every run, but it's not clear what we're doing to cause it. Rearranging our test code or skipping certain tests resolves the problem temporarily, but it always creeps back in.

I haven't contributed to Cypress before, but I'd be willing to take a stab at fixing the problem if someone (@brian-mann ?) can show me where to start. My team has lost a ton of time troubleshooting this and I'd love to put it to bed.

jennifer-shehane commented 5 years ago

@nmuth Please see our contributing guide on how to start: https://github.com/cypress-io/cypress/blob/develop/CONTRIBUTING.md

Are you using version 3.3.1?

beepboopitschloe commented 5 years ago

@jennifer-shehane Yup, we're on 3.3.1. I've read the contributing guide. I'm still coming to grips with the code. It looks like the crash handler for Electron is here. Where can I hook in to provide a crash handler for Chrome? Would that be in the launcher package?

jbinto commented 5 years ago

@RockChild Are you on 3.3.x? I commented in another thread that this seems to have popped up since 3.3.0 dropped about ~2 weeks ago.

RockChild commented 5 years ago

@jbinto yeah, looks like it started crashing after upgrade to 3.3.1, so I'll try to downgrade to 3.3.0. Thanks for your insights!

bogdan-calapod commented 5 years ago

I switched to cypress/browsers:chrome69, changed the package version to 3.3.0 and, with the following build step config in drone.io, it seems that the renderer doesn't crash anymore:

steps:
  - name: dev-tests
    image: cypress/browsers:chrome69
    shm_size: 4096000000
    mem_limit: 1000000000
    commands:
      - npm ci
      - $(npm bin)/cypress verify
      - $(npm bin)/cypress run

Later edit: it just crashed this morning, so it seems that this is not the fix. Is there any way to auto-restart the test if it crashes?

jbinto commented 5 years ago

@RockChild Downgrading to 3.3.0 (or even 3.2.0) has not resolved this issue for us.

Similar to you we just started seeing this on or around May 27. No idea what has changed, and we have tried just about everything to fix this. It is gradually getting worse, with almost 100% crash rate today (when it started a few weeks ago it was closer to 5-10%).

Only happening on CircleCI. /dev/shm is 30GB there. No pattern to where the tests fail. Nothing interesting when using DEBUG=cypress:*.

nitzel commented 5 years ago

"If you're seeing consistent crashes and would like this implemented, please leave a note in the issue."

Yes, please.

bogdan-calapod commented 5 years ago

Any update on this? I've tried making a wrapper using Cypress in a .js file, but it seems that the renderer errors aren't caught by Cypress.catch()

ttomaszewski commented 5 years ago

Please fix.

neboryte commented 5 years ago

We are hitting this problem as well. Not using Docker. Unfortunately, this issue makes cypress way too unreliable for automated tests.

bogdan-calapod commented 5 years ago

I've just given up and switched to testcafe.

EvanHerman commented 5 years ago

Experiencing consistent browser crashes inside of my Jenkins pipeline. Unable to get around this. I may also have to investigate alternative solutions as this is rather unstable and unpredictable (the reason I moved away from CodeCeption).

Things run great locally, but once we try running the tests on our Jenkins server the browser crashes every time and my tests never pass.

Currently on day 2 of debugging this. If I can't resolve this today I'll have to move away from Cypress.

jbinto commented 5 years ago

@EvanHerman (and anyone else on this thread): FWIW, since switching to Chrome (from Electron) and setting some flags we have not seen a crash in CI for almost 2 months now.

See https://github.com/cypress-io/cypress/issues/350#issuecomment-503231128 for details.

bahmutov commented 5 years ago

And just to add: there is an open pull request that adds video recording to Chrome, https://github.com/cypress-io/cypress/pull/4791, which is THE main thing stopping people from using Chrome on CI

EvanHerman commented 5 years ago

@jbinto Thanks for the tip - I'll switch out the image and test things out. 👍

Edit: Works perfectly, thanks again @jbinto - saved me a lot of headache!

@bahmutov That's great news! Looking forward to being able to view recorded videos in Chrome again.

uvesten commented 5 years ago

Please fix this, using cypress 3.4.1

mvandebunt commented 5 years ago

Please fix this. I’m not able to set ipc=host in my ci/cd pipeline

pauldcomanici commented 4 years ago

Please fix this or provide work-around for different environments. In my case I'm running Cypress using Jenkins and pipelines where I do not have access to flags.

calwayne commented 4 years ago

Please fix this issue as I am hitting 'sad face' error with docker. I am using latest cypress 3.6.1

davidzambrana commented 4 years ago

We're seeing this quite often lately as well with 3.7.0

9odzilla commented 4 years ago

Happening quite a bit with electron on 3.8

cypress-io / cypress

Recover from renderer / browser crashes #349