cypress-io / cypress

Fast, easy and reliable testing for anything that runs in a browser.
https://cypress.io
MIT License

When Test Replay is enabled, performance of tests drop significantly #27818

Closed RadomirNowak closed 11 months ago

RadomirNowak commented 1 year ago

Current behavior

Since the upgrade to v13, we've started experiencing significant performance drops in our e2e test suite, which resulted in a lot of test failures. When looking at the Test Replay of failed tests, typing in inputs appeared extremely slow (about one letter per second), which affected the performance of the application under test. This does not happen locally, when not recording a run to Cypress Cloud. As soon as I record a run, the performance issues reappear. Only after disabling Test Replay in Project Settings is the performance back to normal. Here's a video showing how slow typing is with Test Replay enabled (video cropped for compliance reasons): https://github.com/cypress-io/cypress/assets/11005257/a0545fff-e218-4398-9257-373bb2e93017

For comparison, this is how quickly typing is performed when recording video (it's a recorded video from a test run done before the v13 upgrade): https://github.com/cypress-io/cypress/assets/11005257/bc28684b-2358-49e1-8add-d39b93b97b30

I also saw that, for some reason, the network tab in Test Replay shows the app making GET requests for a font each time anything happens in the app (which may be the reason for the overall slowdown): image

So to summarize: Windows (32GB ram, Intel i9-12900K):

Desired behavior

Test Replay should not affect test execution performance. We're fine with it making the overall test run slower, but the test itself should perform the same as before.

Test code to reproduce

Cypress config:

  blockHosts: [
    '*.google-analytics.com',
    '*.statuspage.io',
    '*.segment.io',
    '*.intercomcdn.com',
    '*.wootric.com',
  ],
  chromeWebSecurity: false,
  defaultCommandTimeout: 30000,
  viewportHeight: 720,
  viewportWidth: 1280,
  pageLoadTimeout: 60000,

Cypress Version

13.2

Node version

20.5.0

Operating System

Ubuntu 20.04.6 LTS, Windows 11 10.0.22621

Debug Logs

No response

Other

Test Replay functionality is awesome and much better than recording videos! We are now able to inspect API calls to investigate potential problems. Unfortunately the performance drops are a show stopper for us in terms of using Test Replay.

jennifer-shehane commented 1 year ago

Hey @RadomirNowak, this is certainly not intended. We are performing a lot more things behind the scenes for Test Replay, so this has been something we’ve been keeping an eye on. There must be something unique in your test case or app that’s worsening performance.

Would you be able to send over a Cloud url to a Test Replay that was captured when this perf behavior happened?

RadomirNowak commented 1 year ago

hi @jennifer-shehane thank you for reaching out so fast! here's the link to Test Replay when the performance issues are visible: https://cloud.cypress.io/projects/ubq3ww/runs/e3113d43-684e-4ffd-b194-4f425bbcfed3/test-results/2a45eff8-9ae6-4363-b9ab-5e67faab0f45/replay?att=1

kevingorry commented 1 year ago

We are running into similar issues. I created a branch to upgrade to Cypress 13 from Cypress 10 and we started having random failures on every run (6 or 7 tests would randomly fail, and not the same ones every time). At first I tried to address each of them, but then I turned off Test Replay and all of the failures disappeared. I tried enabling it again and got the same thing: 8 random failures, usually related to:

FYI @jennifer-shehane

nagash77 commented 1 year ago

@kevingorry can you provide a link to the test replay that shows this behavior?

jtibbit commented 1 year ago

Having a similar issue, I had to refactor some of my tests to remove a check to make sure the element is visible because of how slow the tests move with replay enabled, or when using cy.type, only the second half of the word is being typed because the page load is taking forever. This does not happen when I run locally. Example: https://cloud.cypress.io/projects/kkje96/runs/1bc9956e-8982-4141-9899-f9bdbd8c3c78/test-results/a57db8d9-ca7f-42b2-9264-24c2f9a58a7b/replay?att=2

Narretz commented 1 year ago

@RadomirNowak your screenshot with the font loading suggests that this might be fixed by #27860 (in 13.3.0). The PR also says it can improve Test Replay.

tkharuk commented 1 year ago

Similar situation. It also seems like it was not happening from the moment of the upgrade but rather started getting worse and worse over time (or it's just my imagination).

Along with some other issues that we faced, I think we will revert to v12 for a bit until v13 is more mature.

In our case we have a data table with possibly bad performance, but still, v12 works ok and v13 sometimes even dies.


The 2 at the top are v12, the 2 at the bottom are v13: image

ElisonKs commented 1 year ago

We've upgraded from 12.6 to 13.2 and our tests are 2.5x slower! We tried disabling Test Replay in the project settings, but although tests seem much faster, there are a bunch of cy.screenshot() errors (and we never call this method). Like @tkharuk, "it was not happening from the moment of the upgrade but rather started getting worse and worse over time". Sadly we decided to roll back until it gets more stable.

SebasGit commented 1 year ago

We have also had to downgrade from 13 to 12. Our page load times on gitlab runners were almost 10 seconds and causing our tests to time out. I gave 13.3 a try but it didn't appear to change anything. Local testing has no issues so best guess is the added logging added too much of a burden with gitlab runners (which are pretty low power).

We'll test any new versions that come out; the replay feature is great and we'd love to make use of it.

dlively1 commented 1 year ago

Also experiencing similar poor performance when attempting to upgrade. Similar to @SebasGit we are really only seeing these performance challenges in our CI infrastructure. We are also running within Github Actions on the larger 4-cores · 16 GB RAM machines versus the default instance size.

Excited for the replay feature but can't move forward with this significant slow down.

mateustalles commented 1 year ago

Huge performance issue on CI using the latest Cypress version with Test Replay. All we did was switch and it started to get 100% slower (tests taking twice the time they would). Especially for E2Es, this is pretty bad, as it even exceeds our CI timeout threshold (30min) when a test fails and retries. I'm running a very complex test suite, and honestly, keeping it under 10min is a huge effort, so I'm pretty sure this is not caused by our code and it's not worth overengineering it just to avoid the slowness issue.

gguine-sweep commented 1 year ago

Hi, same here, we have downgraded to the latest 12.

jennifer-shehane commented 1 year ago

Hi it would be helpful to get any Cloud urls of Test Replay where the performance is not great and a Cloud url of a run before the 13 upgrade and performance issues. Ideally, it would be better for someone to provide a reproducible example.

We did release a fix in 13.3.2 for a couple of performance issues, but since we're not sure what y'all are encountering, I can't say that would help any of y'all.

lgenzelis commented 1 year ago

I'm also experiencing a huge performance slowdown with cypress 13. The weird part is: I'm not using cypress cloud, I just run cypress in github actions using cypress-io/github-action@v6.5.0. With cypress 12.17.4, my whole test step would take approx. 26 minutes. With cypress 13.3.1, it went to 38 minutes :S I checked each individual test, and they all run slower with cypress 13. For some, we're talking about a 100% difference (for example, from 1766ms to 3572ms).
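For reference, the figures quoted above can be turned into relative slowdowns with a few lines of TypeScript. This is only a sketch for checking the arithmetic; the helper name `slowdownPercent` is made up for illustration and is not part of Cypress.

```typescript
// Hypothetical helper (not a Cypress API): relative slowdown in percent
// between a baseline measurement and a slower one.
function slowdownPercent(before: number, after: number): number {
  if (before <= 0) {
    throw new RangeError("baseline must be positive");
  }
  return ((after - before) / before) * 100;
}

// Figures quoted in the comment above.
const singleTest = slowdownPercent(1766, 3572); // one test, in milliseconds
const wholeStep = slowdownPercent(26, 38);      // whole test step, in minutes

console.log(`single test: +${singleTest.toFixed(1)}%`); // prints "single test: +102.3%"
console.log(`whole step:  +${wholeStep.toFixed(1)}%`);  // prints "whole step:  +46.2%"
```

So the single test roughly doubled in duration, while the whole step grew by a bit under half.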

I checked the documentation of test replay, but apparently it's something that's just enabled/disabled using cypress cloud. It's not something I can just disable in my cypress.config.ts file, right?

dlively1 commented 1 year ago

Hi it would be helpful to get any Cloud urls of Test Replay where the performance is not great and a Cloud url of a run before the 13 upgrade and performance issues. Ideally, it would be better for someone to provide a reproducible example.

We did release a fix in 13.3.2 for a couple of performance issues, but since we're not sure what y'all are encountering, I can't say that would help any of y'all.

@jennifer-shehane

Run timed out in CI at 15 minutes: https://cloud.cypress.io/projects/m6qhfi/runs/1856/overview?roarHideRunsWithDiffGroupsAndTags=1

Typical run: https://cloud.cypress.io/projects/m6qhfi/runs/1887/specs

We originally moved forward with the upgrade to 13 and disabled Test Replay within the project settings. We then rolled that back and downgraded, as we lost video recordings, which made things challenging to debug, and experienced some sporadic failures within the suite.

Let me know if I can provide any further information to assist troubleshooting.

jennifer-shehane commented 1 year ago

@lgenzelis That's right, if you are not recording to the Cloud then the Test Replay feature work is not what is affecting performance. Could you try upgrading to 13.3.2? We fixed a few performance issues in that release. Could you share any information or a reproducible example? This would be a great use case to have, because with Cloud recording off it would narrow down a lot of factors.

jennifer-shehane commented 1 year ago

Thank you @dlively1, looking into this and will let you know if more information is needed.

jennifer-shehane commented 1 year ago

@dlively1 It looks like your issue is very similar to this one, where font requests were flooding: https://github.com/cypress-io/cypress/pull/27860. It's an extremely weird issue, but it appears something is triggering this same situation for your tests, which is slowing them down, and the cause is outside of what we already fixed in 13.3.0. We're investigating.

lgenzelis commented 1 year ago

@jennifer-shehane I tried updating to 13.3.2, as you suggested, and I couldn't complete a single run (tried 5 times). Every run finishes at some different, random, point, with errors I've never seen before. O.o

kolaente commented 1 year ago

Also noticing this. With test replay enabled, tests take 1.5h, compared to 5min. After disabling the test replay in the cloud settings times go back to 5min. In the test run you can see how each test takes a lot longer.

Test run with test replay enabled: https://drone.kolaente.de/vikunja/frontend/15023/1/8
Test run with test replay disabled: https://drone.kolaente.de/vikunja/frontend/15029/1/8 (same commit as before, after merging the PR)

Since this is a FOSS project, here's the code: https://kolaente.dev/vikunja/frontend/src/branch/main/cypress - may help with reproduction.

guy-otonomo commented 1 year ago

Can confirm issue is still present with 13.3.3. We have 2 projects in cypress cloud:

  1. One of them shows this extreme behavior, where tests are falsely failing due to timeouts because everything is really slow. After disabling Test Replay, (a) tests are no longer failing, and (b) instead of ~22000ms a test now takes ~4000ms. The tests in this project have many interceptions applied (a kind of blackbox testing).
  2. The other project has Test Replay enabled, with slow test runtimes, but not to the extent that they fail (and not as many interceptions as the other one).

I don't remember this happening when we first upgraded to 13.0.

Setup: Jenkins, m5.2xlarge machines, Cypress 13.3.3 running as part of a docker-compose, Electron 114.

jennifer-shehane commented 1 year ago

@kolaente we’re looking into your example. Thanks for the detailed report.

jennifer-shehane commented 1 year ago

@kolaente We've identified the issue and are tracking your specific performance problem here: https://github.com/cypress-io/cypress/issues/28139. This problem is specific to testing applications with JS-created CSS, especially when it is large and frequently changing. I'd recommend turning off Test Replay for the time being until that issue is resolved.

mschile commented 1 year ago

@kolaente, the fix for your issue has been deployed if you want to try testing again.

kolaente commented 1 year ago

@mschile Do I need to wait for a cypress release?

ryanthemanuel commented 1 year ago

@kolaente this fix involved some work in the cloud. You will not need a new version of Cypress.

tkharuk commented 1 year ago

@kolaente this fix involved some work in the cloud. You will not need a new version of Cypress.

This disturbs me a bit. Even if I don't change anything, my tests might start failing suddenly. Spectator should not affect the spectacle.

jennifer-shehane commented 1 year ago

@tkharuk A portion of our Cloud service is involved when running tests in record mode, as it always has been. It’s part of orchestrating our services. We aim to create an environment that is as deterministic as possible. Fixing bugs that largely affect determinism for many of our customers is one way that we do this, and that is what Ryan and Matt are referring to above. We have to balance making decisions that get us as close to deterministic as we can, which we try to do.

kolaente commented 1 year ago

@ryanthemanuel Seems fixed. Thanks!

jtibbit commented 1 year ago

@jennifer-shehane - I am seeing some performance improvement, but enabling test replay is still causing tests to run longer + increased flaky tests + failures that don't occur when test replay is disabled.

With test replay enabled: https://cloud.cypress.io/projects/kkje96/runs/589/overview?roarHideRunsWithDiffGroupsAndTags=1

With test replay disabled: https://cloud.cypress.io/projects/kkje96/runs/590/overview?roarHideRunsWithDiffGroupsAndTags=1

We also had several runs this morning with test replay disabled where the last spec file would hang and had to be cancelled manually: https://cloud.cypress.io/projects/kkje96/runs/587/overview?roarHideRunsWithDiffGroupsAndTags=1

jennifer-shehane commented 1 year ago

@jtibbit We're investigating. Thanks for providing links again.

tkharuk commented 1 year ago

In one of our repos we have 13.2.0, it had an error failing to record a replay (https://github.com/cypress-io/cypress/issues/27902), so tests were executed as usual.

Upgraded to 13.5.0 and tests that were stable before are now failing.

jennifer-shehane commented 1 year ago

@tkharuk Thanks for providing an example. We're looking into it.

mschile commented 1 year ago

@jtibbit, when looking at the test runs, I believe the increased run time is coming from the multiple attempts on the failed tests.

For instance, this test is failing since two items are indeed being returned when only one was expected in the test. Are you able to update the test or test data to only qualify on a single item?

I believe another one of the test failures may be occurring because the filtering for the Add Project dialog may be happening prior to the click completing which is causing the error. Would you be able to update the test to verify the filtering completes before clicking the item?

cameroncooks-branch commented 12 months ago

Disabling test replay reduced our regression suite's runtime from 1h10m to 40m.

Pod memory usage also decreased from 13gb+ to 7.5.

dlively1 commented 12 months ago

Disabling test replay reduced our regression suite's runtime from 1h10m to 40m.

Pod memory usage also decreased from 13gb+ to 7.5.

We originally rolled with that but decided to revert back to the older version as debugging was very challenging considering the video recording functionality didn't seem to exist in versions >= 13.

Just sharing in case others are looking for that as a temporary workaround.

SebasGit commented 12 months ago

@kolaente, the fix for your issue has been deployed if you want to try testing again.

Looks like this fix also solved our issue. Tests are passing quickly again. Thanks!!

jennifer-shehane commented 12 months ago

@dlively1 Video recording still exists in 13.x versions, it is just turned off by default. You'll need to set video: true within your config to record video in v13+.
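A minimal sketch of that setting, assuming a TypeScript config file named `cypress.config.ts` at the project root. Only the `video` option comes from the comment above; the rest of a real project's config (for example its `e2e` block) is left out here.

```typescript
// cypress.config.ts (assumed file name and location)
import { defineConfig } from 'cypress'

export default defineConfig({
  // Video recording is off by default in v13+; opt back in explicitly.
  video: true,
})
```

With this in place, videos should be captured again during `cypress run`, independently of whether Test Replay is enabled for the project in Cypress Cloud.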

@cameroncooks-branch Could you provide any Cloud urls of Test Replay where the performance is not great and a Cloud url of a run with Test Replay disabled? We can investigate with this information.

jennifer-shehane commented 11 months ago

Hi everyone, we're seeing a lot of improvement with performance with the issues that have been reported, aside from this issue involving font flooding (we're awaiting a Chrome update and updating our Electron to address it since it was a bug in Chromium itself). If you have a website with a lot of font requests, you may still be encountering that issue.

If you were experiencing issues before, please upgrade to the latest Cypress version and record new runs to compare the performance. If you're still encountering an issue, please provide the URL to the Cloud run so we can evaluate. We really want to fix these perf issues if they exist!

jennifer-shehane commented 11 months ago

We're going to release a couple more performance fixes in the next release that may have a great impact on some users' performance, so please always make sure to update to the latest version. We are going to close this issue overall. Thanks everyone for sharing your issues!

If you still see a difference in performance with Test Replay enabled vs disabled for a project, please open a new issue with the link to the Test Replay in the Cloud and we'll investigate each case individually.