Closed mebibou closed 10 months ago
Hi @mebibou, thanks for writing in! I'm afraid, Replay is always going to add overhead to your app, as rrweb (the underlying library we use to record everything that's happening) just needs to listen and react to so many events, DOM mutations, style changes, etc.
We're right now working on performance overhead measurements, to better understand the overall impact. For instance, https://github.com/getsentry/sentry-javascript/pull/6611 introduced the first version of them. We're still trying to improve here, as we've noticed that results seem to fluctuate and depend a lot on the individual apps.
It's interesting that the impact is so noticeable in your app. Would you mind sharing a little information about it? For instance:
Lastly, I guess the thin you can try at the moment to mitigate the overhead would be to lower the sample rates of errors and sessions. If a session is not sampled (both entire sessions and errors), our integration doesn't do anything and there should be no noticable performance impact.
Hope this helps a little. LMK if you have more questions :)
@Lms24 thanks for the reply
Our app does have a lot of interaction, and one part of it where the lag is very noticeable is a place where we have a dynamic form, i.e. the user can add more components, add some text, sometimes clicking on a button will open a dialogue, etc. so based on ReactiveForms.
But even on simple pages where we just display a list of search results, the results took about 2-5 seconds to display compared to without the Replay you would barely see the loader.
I didn't really perform any measurements as users started to complain the website was too slow to use so I simply disabled it. I don't think sampling would help, as it would still fall randomly on some users to have a horrible experience?
An idea that I had though: is there any way to enable the integration, but have some way to turn it on/off at runtime? I am thinking in cases where the user has a bug, and it could "turn on recording" (with a disclaimer that everything will be slower when recording is on)?
Our app does have a lot of interaction, and one part of it where the lag is very noticeable is a place where we have a dynamic form, i.e. the user can add more components, add some text, sometimes clicking on a button will open a dialogue, etc. so based on ReactiveForms.
A wild guess here would be that maybe some validation logic is constantly updating the UI after an input? It's been a while since I played with Angular Reactive Forms but maybe the debounceTime
operator for observing user input helps?
An idea that I had though: is there any way to enable the integration, but have some way to turn it on/off at runtime? I am thinking in cases where the user has a bug, and it could "turn on recording" (with a disclaimer that everything will be slower when recording is on)?
Yes, you can start and stop recording manually. For instance, it might be possible to stop recording on these more complex pages.
Furthermore, only recording when errors happen is supported ootb by the Replay integration:
replaysSessionSampleRate: 0,
replaysOnErrorSampleRate: 1.0, // or a smaller value, depending on what you perefer:
Note though, at the beginning of a session, we make a sampling decision based on the sample rates. If the session is sampled for errors, we'll still record all the time but only keep the last 60s of recording in the buffer. Hence, performance degredation is probably still noticeable.
sorry missed that section in the readme, I will have a play with it when I can spend more time on this.
For instance, it might be possible to stop recording on these more complex pages.
Yes but unfortunately, the more complex the page, the more the need to record the sessions!
We do use debounceTime
and other performance related tricks, but when you've got hundreds of inputs and buttons etc on a page, eventually the memory usage increases. I imagined this would impact performances, but not by that much (most of the time the page was unresponsive after a few minutes)
@mebibou do you have a publicly accessible URL where this happens that we can use to test ourselves? I'm curious if it's form input related and if ignoring events on the inputs would help perf.
One more question: What kind of change detection strategy are you using?
We use Angular Material. In a tab group you can wrap a tab with matTabContent
to prevent it from initializing until you actually click on the tab. For tabs where we've done that, it appears Sentry replay causes it to hang badly. For example, the initial Audits tab here loads fine, clicking to Pending MR Review works fine, but clicking Incomplete/Approval needed hangs to the point the app is unusable. This worked fine before enabling replay, and adding/removing matTabContent
toggles the problem on and off.
<mat-tab-group (selectedTabChange)="onTabChanged($event);">
<mat-tab label="Audits">
<app-audit-list [dateRange]=dateRange></app-audit-list>
</mat-tab>
<mat-tab label="Pending MR Review">
<app-result-list [data]="mrPending"></app-result-list>
</mat-tab>
<mat-tab label="Incomplete/Approval Needed">
<ng-template matTabContent>
<app-result-list [data]="mrIncomplete"></app-result-list>
</ng-template>
</mat-tab>
</mat-tab-group>
We are using the default change detection strategy.
I'd also like to say replay is amazing. It's been a long time since software has delighted me this much. Kudos!
One more question: What kind of change detection strategy are you using?
We use the default in most places. We tried to use "OnPush" but we had many problems with components no re-rendering anything and as the application is quite complex, it would take too long to debug/rewrite everything so OnPush is applicable. Is that something that doesn't bode well with Replay?
Is that something that doesn't bode well with Replay?
Potentially, yes. Probably not the default CD strategy per sΓ© but the combination of it with complex apps, is my best guess. Generally, because rrweb records every single DOM mutation, style change, etc, my guess is that the default CD strategy causes quite a lot of these (including unnecessary ones) which might slow things down.
I'm seeing something similar in my Angular app where I'm using Bootstrap instead of Angular Material for my tabs. I have a tab that has a decently large table of data (~500+ rows) and when Replay is enabled and I scroll down/up quickly it completely breaks the UI (text/buttons/content randomly disappear) and the page won't respond to the point of chrome eventually crashing. When I turn replay off, the table works fine.
I've played around with the amount of rows that will break the UI and it looks like it starts to become unresponsive at around 200 rows
Replay is always going to add overhead to your app
Our angular app has dynamic forms and other complex UI/DOM stuff going on as well.
However, in the past we have used other replay-like products in the past including hotjar, logrocket not to mention a few more. This is the only one that is producing serious performance penalties to our angular app to the point of us having to disable it. So there is some sort of extra inefficiency built into it whether by design or not that is producing this overhead, and until such overhead is brought down to a manageable level we will not be able to use replay going forward.
Hi @jpike88, I'm sorry that you're experiencing this performance impact. I understand that this is a serious problem for you and we're working on improving performance. Our running theory is that the slowdown is caused by a spike in DOM mutations within a very short time period. We'll soon be able to detect these spikes to confirm our theory. Angular's default change detection might contribute to this spike, given that it is likely to update DOM elements that necessarily don't need to be updated.
We'll keep you posted on further improvements and developments in this area :)
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog
or Status: In Progress
, I will leave it alone ... forever!
"A weed is but an unloved flower." β Ella Wheeler Wilcox π₯
To those following in the thread: in our newest sdk (version 7.46.0) - we will now add a breadcrumb to the replay when we detect > 1000 mutations. There are also two new experimental options:
mutationBreadcrumbLimit
- changes the count of mutations before creating a new breadcrumbmutationLimit
- this option will instead take a new full snapshot of the entire DOM instead of processing each individual mutation in the queue. This should improve performance for your users, though you should still look into optimize your code to reduce the amount of DOM mutations!.In order to use it you'll have to initialize Replay
like so:
new Replay({
_experiments: {
mutationBreadcrumbLimit: 500,
mutationLimit: 1000,
}
})
Please try it out and let us know how it works for you.
One thing we're considering here is to not just warn if we detect too many mutations but turn Replay off if we detect that's going to affect performance too much
@mebibou or anyone else in this thread experience slowdowns in the latest version of the SDK? We added a feature to gracefully downgrade in the event of excessive mutations
@mebibou or anyone else in this thread experience slowdowns in the latest version of the SDK? We added a feature to gracefully downgrade in the event of excessive mutations
Yes. Still experiencing slowdown. No mutation warning on the dashboard. We tried to use settings to limit it anyway but it doesn't change anything. It seems that having a tab-menu (whatever the amount of content) or use scroll with behaviour smooth increase the load for the app and it get unusable. Mostly noticeable on Android devices. When we disable completely replay, no problems.
@mebibou or anyone else in this thread experience slowdowns in the latest version of the SDK? We added a feature to gracefully downgrade in the event of excessive mutations
Yes. Still experiencing slowdown. No mutation warning on the dashboard. We tried to use settings to limit it anyway but it doesn't change anything. It seems that having a tab-menu (whatever the amount of content) or use scroll with behaviour smooth increase the load for the app and it get unusable. Mostly noticeable on Android devices. When we disable completely replay, no problems.
Is your website public? We would appreciate repros, so we can run a profiler and find bottlenecks
I have the same issue with @sentry/angular-ivy@7.60.0
. I also added the limits below and the site still becomes unusable on Chrome (Mac and Windows confirmed).
mutationLimit: 1000,
mutationBreadcrumbLimit: 100,
Our site isn't public but I have a .cpuprofile
that I could ask around if it's ok to share. Would that help? I could also run a profiler with different settings if it's needed.
Is your website public? We would appreciate repros, so we can run a profiler and find bottlenecks
Our site is not public. But we can do some tests if you need it.
@renan-r-santos Yes a profile would help, a performance trace would be great as well - you can email it to me at billy@sentry.io to avoid posting it publicly.
We are currently working on upgrading the underlying rrweb library to 2.0 (which includes some performance improvements) before we attempt further performance changes. We're currently internally testing the new SDK and are hoping to release a beta in the next few weeks. Stay tuned!
We've released version 7.67.0-beta.0
-- you can test this by upgrading your @sentry/
packages to that version. There are no required actions from you, but there is one breaking change in that we are no longer masking aria-label
attribute by default. Please test it out and let us know if it helps.
Today I compared 7.66.0
against 7.67.0-beta.0
and unfortunately I didn't notice any difference. Both versions without Replay enabled take about 10 seconds to load and render a particular page, while with Replay enabled it takes several minutes, and with CPU at 100%.
What is more interesting is that it only happens on Chromium-based browsers (Chrome and Edge tested). I don't see the issue in Firefox and Safari. All of them tested using their latest version under MacOS 13.
I sent to your email address one performance trace that I captured using Chrome. If you need additional info, please let me know. Thanks
Thanks @renan-r-santos, from looking at the profile provided, it looks like your issue is angular specific, and is seemingly unrelated to the # of dom mutations and/or processMutations
(ref: https://github.com/getsentry/sentry-javascript/issues/8469). If anyone could provide a repro example for us, it would be a great help.
We have the same issue and I think could be quite easy for us to prepare an example repo. is it still necessary? Thanks.
@meriturva Yes, would be extremely helpful if you could prepare an example repo
I have prepared a repo with a few components, What I found is that it is really difficult to reproduce "super slow" evidence with a demo app.
https://github.com/meriturva/sentry-replay-angular-slower
Here few gifs I have recorded to show that replay makes render slower: Without replay: With replay:
If you need access to a really slow site (when the replay is enabled) I could arrange private access to a staging site where it will be quite easy to show the difference.
So @Lms24 and I spent some time digging into the repro (thank you!) and one thing we figured out is that it does seem to be related to mutation observers, but not to some rrweb or replay specifics.
What we managed to do to kind of narrow this down, is this:
const obs = new MutationObserver(() => console.log('obs 1'));
obs.observe(document.documentElement, { attributes: true, characterData: true, childList: true, subtree: true, });
const obs2 = new MutationObserver(() => console.log('obs 2'));
obs2.observe(document.documentElement, { attributes: true, characterData: true, childList: true, subtree: true, });
--> Switch pages, now it is slow.
This seems to indicate that the mere _presence_ of a mutation observer slows the page down. The more observers you add (even if they do nothing!), the slower it gets. And you can even see, it's not like the observers are run that often (they _do_ run with a rather high number of mutations, though)
We also noticed that the slowdown is much more noticeable with the devtools open, vs. them closed π€
If this is really the root problem, I'm afraid there is nothing we can do to speed this up - as this is totally unrelated to _doing_ anything based on the mutation observer, but literally just the act of setting one up, which is something we can't get around doing π¬
Could you by chance verify/try this in your "proper" app:
* Disable replay
* Add one/or a few mutation observers like in the example above
* Is it now slow? or still fast?
Also worth noting that in addition to Francesco's observations, we found out that our SDK/Replay is not the only library that initializes MutationObserver
s in the repro app you provided: Also @progress/kendo-angular-editor
initializes them twice to watch over the UI styles (I think) of the text editor. Given that there were four editors on the page, that already means at least 8 observers before Replay is even added into the mix.
You can verify this by making an unminified production build of your app (ng build --optimization=false
) and searching for MutationObserver
ocurrances in the bundle that's generated for your EditorDemoModule
.
This might contribute to why the slowdown is so noticable in your case.
Could you by chance verify/try this in your "proper" app:
- Disable replay
- Add one/or a few mutation observers like in the example above
- Is it now slow? or still fast?
I am currently trialing sentry (specifically for the session replay) and am experiencing the same issue (in a large ng app). I received a breadcrumb for large number of mutations (~800), although based on the config value of 1000 in above comments I wouldn't expect to see the performance degradation that we are seeing.
Disabling replay and adding mutation observers as per above does not degrade the performance of my app, while enabling replay makes it unusable.
Any ideas?
@synapseses if you're able to provide us with a minimal reproduction example we're happy to look into it. Given that the cause seems to differ from @meriturva's case, we don't really have a concrete clue what could be causing this. Thanks!
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community
, I will leave it alone ... forever!
"A weed is but an unloved flower." β Ella Wheeler Wilcox π₯
I have also been seeing this issue as I've been integrating Sentry into our Angular app. I've done some digging into it and I believe I've found the main cause of the performance hit (at least in our case).
It appears that rrweb is running a call to requestAnimationFrame
on a loop (in a method called periodicallyClear
), and in Angular requestAnimationFrame
is patched so it triggers a change detection whenever it is invoked. This means change detections are happening repeatedly, which slows the app down massively.
I've managed to restore the performance of our app by disabling the patching of requestAnimationFrame
, and I've shared the results below. But ideally the rrweb library should be using the non-patched version of RAF (as well as other patched methods like setTimeout
, etc) to avoid unnecessary change detection.
Here's a screenshot of a profile from our app, when it's just sitting idly on a simple page for 4 seconds:
And here's a profile of the exact same page sitting for 4 seconds, but with the requestAnimationFrame
NgZone patch disabled:
Hey, thank you for the investigative work! That sounds like it could very well be the culprit.
For reference, could you share how you disabled this patching in your app?
Now, I wonder if there is a way to get the "unpatched" requestAnimationFrame
method π€ I fear we are not gonna be able to put angular-specific code everywhere in rrweb to avoid this π€ Do you by chance know if there is a way to access the unpatched version? If so, we could think about adding a way to pass a "custom" requestAnimationFrame
to rrweb to use instead of the global one, I guess π€
I opened a PR to hopefully fix this on rrweb side: https://github.com/getsentry/rrweb/pull/150
Great work with the fix @mydea!
Sorry I didn't reply in time. The way I disabled it was to globally disable the monkey-patching of the function by Angular, because we didn't need change detection to automatically run whenever RAF runs - but obviously that's not a feasible solution for many people.
The iframe technique is clever, and will also mean that it'll work for other libraries that override the native implementation πͺ
Sadly the performance is still not great on our application, but I did realise that we were using an outdated version of the JS SDK because we use the Capacitor plugin which is locked to v7.81 - and when I manually overrode this to get the latest version things did improve a bit, because of the rrweb improvements. We're seeing a lot of mutation limit warnings in the breadcrumbs - so I guess the main issue is that Angular is pretty heavy on the dom mutations, unless you're really careful with change detection, etc.
Thanks for the info! yeah, updating to a more recent version of replay should improve performance a bit. we will work on further improvements down the line too! But it will be great to unblock this angular specific issue here π
The fix for this was released with https://github.com/getsentry/sentry-javascript/releases/tag/7.93.0 - please give it a try!
Is there an existing issue for this?
How do you use Sentry?
Sentry Saas (sentry.io)
Which package are you using?
@sentry/angular
SDK Version
7.31.1
Framework Version
Angular 14.2.8
Link to Sentry event
No response
SDK Setup
Steps to Reproduce
Add the Replay integration to the Angular app. Without it the website is smooth, as soon as I add it it gets super slow continuously
Expected Result
Add Replay shouldn't impact performance so much, or it's basically becomes unusable
Actual Result
Everything is slow