getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
7.99k stars 1.57k forks source link

Replay is making my Angular app super slow #6946

Closed mebibou closed 10 months ago

mebibou commented 1 year ago

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which package are you using?

@sentry/angular

SDK Version

7.31.1

Framework Version

Angular 14.2.8

Link to Sentry event

No response

SDK Setup

Sentry.init({
  ...environment.sentryApi,
  replaysSessionSampleRate: 0.5,
  replaysOnErrorSampleRate: 1.0,
  integrations: [
    new Sentry.Replay(),
    new Sentry.Integrations.TryCatch({
      XMLHttpRequest: false
    }),
    new BrowserTracing({
      tracingOrigins: ['apiUrl']
    })
  ],
  tracesSampleRate: 0.5
});

Steps to Reproduce

Add the Replay integration to the Angular app. Without it the website is smooth, as soon as I add it it gets super slow continuously

Expected Result

Add Replay shouldn't impact performance so much, or it's basically becomes unusable

Actual Result

Everything is slow

Lms24 commented 1 year ago

Hi @mebibou, thanks for writing in! I'm afraid, Replay is always going to add overhead to your app, as rrweb (the underlying library we use to record everything that's happening) just needs to listen and react to so many events, DOM mutations, style changes, etc.

We're right now working on performance overhead measurements, to better understand the overall impact. For instance, https://github.com/getsentry/sentry-javascript/pull/6611 introduced the first version of them. We're still trying to improve here, as we've noticed that results seem to fluctuate and depend a lot on the individual apps.

It's interesting that the impact is so noticeable in your app. Would you mind sharing a little information about it? For instance:

Lastly, I guess the thin you can try at the moment to mitigate the overhead would be to lower the sample rates of errors and sessions. If a session is not sampled (both entire sessions and errors), our integration doesn't do anything and there should be no noticable performance impact.

Hope this helps a little. LMK if you have more questions :)

mebibou commented 1 year ago

@Lms24 thanks for the reply

Our app does have a lot of interaction, and one part of it where the lag is very noticeable is a place where we have a dynamic form, i.e. the user can add more components, add some text, sometimes clicking on a button will open a dialogue, etc. so based on ReactiveForms.

But even on simple pages where we just display a list of search results, the results took about 2-5 seconds to display compared to without the Replay you would barely see the loader.

I didn't really perform any measurements as users started to complain the website was too slow to use so I simply disabled it. I don't think sampling would help, as it would still fall randomly on some users to have a horrible experience?

An idea that I had though: is there any way to enable the integration, but have some way to turn it on/off at runtime? I am thinking in cases where the user has a bug, and it could "turn on recording" (with a disclaimer that everything will be slower when recording is on)?

Lms24 commented 1 year ago

Our app does have a lot of interaction, and one part of it where the lag is very noticeable is a place where we have a dynamic form, i.e. the user can add more components, add some text, sometimes clicking on a button will open a dialogue, etc. so based on ReactiveForms.

A wild guess here would be that maybe some validation logic is constantly updating the UI after an input? It's been a while since I played with Angular Reactive Forms but maybe the debounceTime operator for observing user input helps?

An idea that I had though: is there any way to enable the integration, but have some way to turn it on/off at runtime? I am thinking in cases where the user has a bug, and it could "turn on recording" (with a disclaimer that everything will be slower when recording is on)?

Yes, you can start and stop recording manually. For instance, it might be possible to stop recording on these more complex pages.

Furthermore, only recording when errors happen is supported ootb by the Replay integration:

  replaysSessionSampleRate: 0,
  replaysOnErrorSampleRate: 1.0, // or a smaller value, depending on what you perefer:

Note though, at the beginning of a session, we make a sampling decision based on the sample rates. If the session is sampled for errors, we'll still record all the time but only keep the last 60s of recording in the buffer. Hence, performance degredation is probably still noticeable.

mebibou commented 1 year ago

sorry missed that section in the readme, I will have a play with it when I can spend more time on this.

For instance, it might be possible to stop recording on these more complex pages.

Yes but unfortunately, the more complex the page, the more the need to record the sessions!

We do use debounceTime and other performance related tricks, but when you've got hundreds of inputs and buttons etc on a page, eventually the memory usage increases. I imagined this would impact performances, but not by that much (most of the time the page was unresponsive after a few minutes)

billyvg commented 1 year ago

@mebibou do you have a publicly accessible URL where this happens that we can use to test ourselves? I'm curious if it's form input related and if ignoring events on the inputs would help perf.

Lms24 commented 1 year ago

One more question: What kind of change detection strategy are you using?

woodchuck commented 1 year ago

We use Angular Material. In a tab group you can wrap a tab with matTabContent to prevent it from initializing until you actually click on the tab. For tabs where we've done that, it appears Sentry replay causes it to hang badly. For example, the initial Audits tab here loads fine, clicking to Pending MR Review works fine, but clicking Incomplete/Approval needed hangs to the point the app is unusable. This worked fine before enabling replay, and adding/removing matTabContent toggles the problem on and off.

<mat-tab-group (selectedTabChange)="onTabChanged($event);">
  <mat-tab label="Audits">
    <app-audit-list [dateRange]=dateRange></app-audit-list>
  </mat-tab>
  <mat-tab label="Pending MR Review">
      <app-result-list [data]="mrPending"></app-result-list>
  </mat-tab>
  <mat-tab label="Incomplete/Approval Needed">
    <ng-template matTabContent>
      <app-result-list [data]="mrIncomplete"></app-result-list>
    </ng-template>
  </mat-tab>
</mat-tab-group>

We are using the default change detection strategy.

I'd also like to say replay is amazing. It's been a long time since software has delighted me this much. Kudos!

mebibou commented 1 year ago

One more question: What kind of change detection strategy are you using?

We use the default in most places. We tried to use "OnPush" but we had many problems with components no re-rendering anything and as the application is quite complex, it would take too long to debug/rewrite everything so OnPush is applicable. Is that something that doesn't bode well with Replay?

Lms24 commented 1 year ago

Is that something that doesn't bode well with Replay?

Potentially, yes. Probably not the default CD strategy per sΓ© but the combination of it with complex apps, is my best guess. Generally, because rrweb records every single DOM mutation, style change, etc, my guess is that the default CD strategy causes quite a lot of these (including unnecessary ones) which might slow things down.

ginagr commented 1 year ago

I'm seeing something similar in my Angular app where I'm using Bootstrap instead of Angular Material for my tabs. I have a tab that has a decently large table of data (~500+ rows) and when Replay is enabled and I scroll down/up quickly it completely breaks the UI (text/buttons/content randomly disappear) and the page won't respond to the point of chrome eventually crashing. When I turn replay off, the table works fine.

I've played around with the amount of rows that will break the UI and it looks like it starts to become unresponsive at around 200 rows

jpike88 commented 1 year ago

Replay is always going to add overhead to your app

Our angular app has dynamic forms and other complex UI/DOM stuff going on as well.

However, in the past we have used other replay-like products in the past including hotjar, logrocket not to mention a few more. This is the only one that is producing serious performance penalties to our angular app to the point of us having to disable it. So there is some sort of extra inefficiency built into it whether by design or not that is producing this overhead, and until such overhead is brought down to a manageable level we will not be able to use replay going forward.

Lms24 commented 1 year ago

Hi @jpike88, I'm sorry that you're experiencing this performance impact. I understand that this is a serious problem for you and we're working on improving performance. Our running theory is that the slowdown is caused by a spike in DOM mutations within a very short time period. We'll soon be able to detect these spikes to confirm our theory. Angular's default change detection might contribute to this spike, given that it is likely to update DOM elements that necessarily don't need to be updated.

We'll keep you posted on further improvements and developments in this area :)

github-actions[bot] commented 1 year ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox πŸ₯€

billyvg commented 1 year ago

To those following in the thread: in our newest sdk (version 7.46.0) - we will now add a breadcrumb to the replay when we detect > 1000 mutations. There are also two new experimental options:

In order to use it you'll have to initialize Replay like so:

new Replay({
  _experiments: {
    mutationBreadcrumbLimit: 500,
    mutationLimit: 1000,
  }
})

Please try it out and let us know how it works for you.

bruno-garcia commented 1 year ago

One thing we're considering here is to not just warn if we detect too many mutations but turn Replay off if we detect that's going to affect performance too much

bruno-garcia commented 1 year ago

@mebibou or anyone else in this thread experience slowdowns in the latest version of the SDK? We added a feature to gracefully downgrade in the event of excessive mutations

NicolasR commented 1 year ago

@mebibou or anyone else in this thread experience slowdowns in the latest version of the SDK? We added a feature to gracefully downgrade in the event of excessive mutations

Yes. Still experiencing slowdown. No mutation warning on the dashboard. We tried to use settings to limit it anyway but it doesn't change anything. It seems that having a tab-menu (whatever the amount of content) or use scroll with behaviour smooth increase the load for the app and it get unusable. Mostly noticeable on Android devices. When we disable completely replay, no problems.

bruno-garcia commented 1 year ago

@mebibou or anyone else in this thread experience slowdowns in the latest version of the SDK? We added a feature to gracefully downgrade in the event of excessive mutations

Yes. Still experiencing slowdown. No mutation warning on the dashboard. We tried to use settings to limit it anyway but it doesn't change anything. It seems that having a tab-menu (whatever the amount of content) or use scroll with behaviour smooth increase the load for the app and it get unusable. Mostly noticeable on Android devices. When we disable completely replay, no problems.

Is your website public? We would appreciate repros, so we can run a profiler and find bottlenecks

renan-r-santos commented 1 year ago

I have the same issue with @sentry/angular-ivy@7.60.0. I also added the limits below and the site still becomes unusable on Chrome (Mac and Windows confirmed).

      mutationLimit: 1000,
      mutationBreadcrumbLimit: 100,

Our site isn't public but I have a .cpuprofile that I could ask around if it's ok to share. Would that help? I could also run a profiler with different settings if it's needed.

NicolasR commented 1 year ago

Is your website public? We would appreciate repros, so we can run a profiler and find bottlenecks

Our site is not public. But we can do some tests if you need it.

billyvg commented 1 year ago

@renan-r-santos Yes a profile would help, a performance trace would be great as well - you can email it to me at billy@sentry.io to avoid posting it publicly.

We are currently working on upgrading the underlying rrweb library to 2.0 (which includes some performance improvements) before we attempt further performance changes. We're currently internally testing the new SDK and are hoping to release a beta in the next few weeks. Stay tuned!

billyvg commented 1 year ago

We've released version 7.67.0-beta.0 -- you can test this by upgrading your @sentry/ packages to that version. There are no required actions from you, but there is one breaking change in that we are no longer masking aria-label attribute by default. Please test it out and let us know if it helps.

renan-r-santos commented 1 year ago

Today I compared 7.66.0 against 7.67.0-beta.0 and unfortunately I didn't notice any difference. Both versions without Replay enabled take about 10 seconds to load and render a particular page, while with Replay enabled it takes several minutes, and with CPU at 100%.

What is more interesting is that it only happens on Chromium-based browsers (Chrome and Edge tested). I don't see the issue in Firefox and Safari. All of them tested using their latest version under MacOS 13.

I sent to your email address one performance trace that I captured using Chrome. If you need additional info, please let me know. Thanks

billyvg commented 1 year ago

Thanks @renan-r-santos, from looking at the profile provided, it looks like your issue is angular specific, and is seemingly unrelated to the # of dom mutations and/or processMutations (ref: https://github.com/getsentry/sentry-javascript/issues/8469). If anyone could provide a repro example for us, it would be a great help.

meriturva commented 1 year ago

We have the same issue and I think could be quite easy for us to prepare an example repo. is it still necessary? Thanks.

billyvg commented 1 year ago

@meriturva Yes, would be extremely helpful if you could prepare an example repo

meriturva commented 1 year ago

I have prepared a repo with a few components, What I found is that it is really difficult to reproduce "super slow" evidence with a demo app.

https://github.com/meriturva/sentry-replay-angular-slower

Here few gifs I have recorded to show that replay makes render slower: Without replay: withoutReplay With replay: withReplay

If you need access to a really slow site (when the replay is enabled) I could arrange private access to a staging site where it will be quite easy to show the difference.

mydea commented 1 year ago

So @Lms24 and I spent some time digging into the repro (thank you!) and one thing we figured out is that it does seem to be related to mutation observers, but not to some rrweb or replay specifics.

What we managed to do to kind of narrow this down, is this:

  1. Remove the replay integration (or disable sampling) - now it "works" / is fast
  2. Add this in the console:
    
    const obs = new MutationObserver(() => console.log('obs 1'));

obs.observe(document.documentElement, { attributes: true, characterData: true, childList: true, subtree: true, });

const obs2 = new MutationObserver(() => console.log('obs 2'));

obs2.observe(document.documentElement, { attributes: true, characterData: true, childList: true, subtree: true, });



--> Switch pages, now it is slow.

This seems to indicate that the mere _presence_ of a mutation observer slows the page down. The more observers you add (even if they do nothing!), the slower it gets. And you can even see, it's not like the observers are run that often (they _do_ run with a rather high number of mutations, though)

We also noticed that the slowdown is much more noticeable with the devtools open, vs. them closed πŸ€” 

If this is really the root problem, I'm afraid there is nothing we can do to speed this up - as this is totally unrelated to _doing_ anything based on the mutation observer, but literally just the act of setting one up, which is something we can't get around doing 😬 

Could you by chance verify/try this in your "proper" app:
* Disable replay
* Add one/or a few mutation observers like in the example above
* Is it now slow? or still fast?
Lms24 commented 1 year ago

Also worth noting that in addition to Francesco's observations, we found out that our SDK/Replay is not the only library that initializes MutationObservers in the repro app you provided: Also @progress/kendo-angular-editor initializes them twice to watch over the UI styles (I think) of the text editor. Given that there were four editors on the page, that already means at least 8 observers before Replay is even added into the mix.

You can verify this by making an unminified production build of your app (ng build --optimization=false) and searching for MutationObserver ocurrances in the bundle that's generated for your EditorDemoModule.

This might contribute to why the slowdown is so noticable in your case.

synapseses commented 1 year ago

Could you by chance verify/try this in your "proper" app:

  • Disable replay
  • Add one/or a few mutation observers like in the example above
  • Is it now slow? or still fast?

I am currently trialing sentry (specifically for the session replay) and am experiencing the same issue (in a large ng app). I received a breadcrumb for large number of mutations (~800), although based on the config value of 1000 in above comments I wouldn't expect to see the performance degradation that we are seeing.

Disabling replay and adding mutation observers as per above does not degrade the performance of my app, while enabling replay makes it unusable.

Any ideas?

Lms24 commented 1 year ago

@synapseses if you're able to provide us with a minimal reproduction example we're happy to look into it. Given that the cause seems to differ from @meriturva's case, we don't really have a concrete clue what could be causing this. Thanks!

getsantry[bot] commented 11 months ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox πŸ₯€

duncan-c commented 10 months ago

I have also been seeing this issue as I've been integrating Sentry into our Angular app. I've done some digging into it and I believe I've found the main cause of the performance hit (at least in our case).

It appears that rrweb is running a call to requestAnimationFrame on a loop (in a method called periodicallyClear), and in Angular requestAnimationFrame is patched so it triggers a change detection whenever it is invoked. This means change detections are happening repeatedly, which slows the app down massively.

I've managed to restore the performance of our app by disabling the patching of requestAnimationFrame, and I've shared the results below. But ideally the rrweb library should be using the non-patched version of RAF (as well as other patched methods like setTimeout, etc) to avoid unnecessary change detection.


Here's a screenshot of a profile from our app, when it's just sitting idly on a simple page for 4 seconds:

image

And here's a profile of the exact same page sitting for 4 seconds, but with the requestAnimationFrame NgZone patch disabled:

image


mydea commented 10 months ago

Hey, thank you for the investigative work! That sounds like it could very well be the culprit.

For reference, could you share how you disabled this patching in your app? Now, I wonder if there is a way to get the "unpatched" requestAnimationFrame method πŸ€” I fear we are not gonna be able to put angular-specific code everywhere in rrweb to avoid this πŸ€” Do you by chance know if there is a way to access the unpatched version? If so, we could think about adding a way to pass a "custom" requestAnimationFrame to rrweb to use instead of the global one, I guess πŸ€”

mydea commented 10 months ago

I opened a PR to hopefully fix this on rrweb side: https://github.com/getsentry/rrweb/pull/150

duncan-c commented 10 months ago

Great work with the fix @mydea!

Sorry I didn't reply in time. The way I disabled it was to globally disable the monkey-patching of the function by Angular, because we didn't need change detection to automatically run whenever RAF runs - but obviously that's not a feasible solution for many people.

The iframe technique is clever, and will also mean that it'll work for other libraries that override the native implementation πŸ’ͺ

Sadly the performance is still not great on our application, but I did realise that we were using an outdated version of the JS SDK because we use the Capacitor plugin which is locked to v7.81 - and when I manually overrode this to get the latest version things did improve a bit, because of the rrweb improvements. We're seeing a lot of mutation limit warnings in the breadcrumbs - so I guess the main issue is that Angular is pretty heavy on the dom mutations, unless you're really careful with change detection, etc.

mydea commented 10 months ago

Thanks for the info! yeah, updating to a more recent version of replay should improve performance a bit. we will work on further improvements down the line too! But it will be great to unblock this angular specific issue here πŸ™

AbhiPrasad commented 10 months ago

The fix for this was released with https://github.com/getsentry/sentry-javascript/releases/tag/7.93.0 - please give it a try!