Closed DCleymans closed 1 year ago
Hey @DCleymans 👋. Thanks for sharing insights. While Session Replay is in Beta we are still working on improving the overall performance, and this issue is something we will definitely be looking into.
In its current state, the pressure on the CPU is closely linked (basically `O(n)`) to the number of views presented on the screen. More views == more processing on the background thread. To better understand the situation in your app, could you share an approximate number of `UIViews` your app displays while CPU pressure is high? Even sharing a screenshot (from the app or a replay) might be very handy.
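One way to get that approximate number is to walk the hierarchy recursively and count every node. The sketch below is only an illustration: `countNodes` and `countViews` are hypothetical helper names, not part of the Datadog SDK, and the generic form exists so the logic can also run outside UIKit.

```swift
/// Counts `root` plus all of its descendants (depth-first).
/// `children` tells the function how to reach the next level of the tree.
func countNodes<Node>(from root: Node, children: (Node) -> [Node]) -> Int {
    var total = 1 // the root itself
    for child in children(root) {
        total += countNodes(from: child, children: children)
    }
    return total
}

#if canImport(UIKit)
import UIKit

/// Counts `view` and every view below it in the hierarchy.
func countViews(in view: UIView) -> Int {
    return countNodes(from: view, children: { $0.subviews })
}
#endif
```

In an app, calling `countViews(in:)` on the key window (or on a view controller's root view) while the CPU load is high would give the number asked for above.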
When changing the interval in the `MainThreadScheduler` (...). But I suppose this leads to less accurate session replays?
If you change the interval, the accuracy of a single replay frame stays the same, but the frame rate of the replay will be lower. As a result, you'd experience a less smooth Session Replay.
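To make the trade-off concrete, here is a small arithmetic sketch. The two interval values are the ones mentioned in this thread; `snapshotsPerSecond` is a hypothetical helper, not an SDK function.

```swift
import Foundation

/// Snapshots captured per second for a given scheduler interval (in seconds).
func snapshotsPerSecond(interval: TimeInterval) -> Double {
    precondition(interval > 0, "interval must be positive")
    return 1.0 / interval
}

let defaultRate = snapshotsPerSecond(interval: 0.1) // about 10 snapshots per second
let reducedRate = snapshotsPerSecond(interval: 1)   // 1 snapshot per second
```

Each snapshot is as detailed as before; there are simply ten times fewer of them per second, which is why the replay looks less smooth.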
What can be done in order to lower the CPU load?
As of today, there isn't yet any configuration in the public API that you could use to lower this impact. It would be best if we knew the complexity of your UI, so we can implement more targeted optimisations on our side.
Hi @ncreated, thanks for looking into this. I know Session Replay is brand new in the SDK and still in beta, but it's a powerful tool, so we would like to re-enable it. (We currently have it disabled due to the CPU load and also a memory issue, but I'm still looking into that issue.)
This is a screenshot of our app:
It consists of a UISplitViewController, showing a UIViewController on the left (SidebarViewController) and a UITabBarController on the right (TabsViewController). The tab bar controller is used to control the actual content.
The SideBarViewController consists of 36 views and subviews. The TabBarController consists of approximately 150 views and subviews.
There is an animation on the screen: 3 bars that move up and down, indicating the item that is currently playing. This is implemented by showing 3 animated views. The music player can also animate the title/group label if the text is too long to show on the screen (marqueelabel).
Thanks for more context @DCleymans! The overall number of views doesn't sound large at all: ~200 is definitely within the expected range 👍, so the problem is rather somewhere else.
There is an animation on the screen: 3 bars that move up and down, indicating the item that is currently playing. This is implemented by showing 3 animated views.
Continuous view-tree mutations might bring some impact, but 3 animated views sound negligible.
To move forward, let me summarise a bit how our SR works, so I can ask follow-up questions with the full context:
Every ~100ms on the main thread, SR traverses the view-tree and captures immutable view representations for safe processing on a background thread.
On the background thread (`com.datadoghq.*.processor` queue) it performs a few procedures to optimise the payload:
The high CPU pressure you're observing is likely coming from these 3 procedures in the processor queue.
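The split described above (capture immutable values on the main thread, do the heavy work on a serial background queue) can be sketched roughly as follows. This is not the SDK's implementation: `ViewSnapshot`, `SnapshotProcessor`, and the queue label are hypothetical, and the real optimisation procedures are replaced by a placeholder.

```swift
import Foundation

/// Immutable value-type copy of what was read from one view on the main thread.
struct ViewSnapshot: Equatable {
    let id: Int
    let frame: [Double] // x, y, width, height
    let text: String?
}

/// Hypothetical processor that handles snapshots on a serial background queue.
/// Marked `@unchecked Sendable` because all mutable state is guarded by `queue`.
final class SnapshotProcessor: @unchecked Sendable {
    // The label is an assumption modelled on the queue name seen in this thread.
    private let queue = DispatchQueue(label: "com.example.session-replay.processor")
    private var processedCount = 0

    /// Called from the capturing thread; returns immediately.
    func process(_ snapshots: [ViewSnapshot]) {
        queue.async {
            // Placeholder for the real payload optimisations.
            self.processedCount += snapshots.count
        }
    }

    /// Blocks until previously queued work is done, then reports the handled count.
    func waitAndCount() -> Int {
        return queue.sync { self.processedCount }
    }
}

let processor = SnapshotProcessor()
processor.process([ViewSnapshot(id: 1, frame: [0, 0, 100, 44], text: "Now playing")])
```

Because the snapshots are plain values, the background queue never touches live views, which is what makes the hand-off safe. It also means any expensive step in that queue runs once per capture, so its cost is multiplied by the capture rate.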
With `Datadog.verbosityLevel = .debug`, do you see any related logs in the console (other than data upload statuses)?
Again, thanks for sharing feedback and helping us with this! Given the diversity of apps and platforms, we actively seek feedback to steer SR development and optimisations.
The images are loaded into our app from a server. They are shown as 'content image' placeholders in our replay session. We use Kingfisher (https://github.com/onevcat/Kingfisher) to load and cache them: they are loaded once from the server and stored to disk for later re-use across multiple sessions. In memory, the size of the image is reduced to lower memory usage:

```swift
let url = URL(string: imageUrl)
channelImage.kf.setImage(
    with: url,
    options: [
        .transition(.fade(0.25)),
        // Downsample to the size of the image view to reduce memory usage.
        .processor(DownsamplingImageProcessor(size: channelImage.frame.size)),
        .scaleFactor(UIScreen.main.scale),
        // Keep the original image in the disk cache as well.
        .cacheOriginalImage,
    ]
) { result in
    // ... just logging code (removed)
}
```
Even when we are showing a rather simple screen, with no server side loaded images, we have a high cpu load:
I do see a warning in Xcode that the content size of the scrollView is ambiguous. We have such a warning in all our screens and I will look into this issue. Might this be a problem for step 1 and 2 of the processor thread?
Logging from the Datadog SDK looks OK to me. Nothing special; here is a snapshot when filtering on datadog:
[DATADOG SDK] 🐶 → 12:28:28.254 → (rum) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: F85E588E-D27C-4282-B5CC-F776EB3D8907]
[DATADOG SDK] 🐶 → 12:28:29.104 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:29.345 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 50E8C128-0DFD-45B0-8371-6648C1CB3A00]
[DATADOG SDK] 🐶 → 12:28:29.964 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:31.057 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:31.354 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: BA2AB3DA-66B9-4CBD-8F0B-774D77E0D12D]
[DATADOG SDK] 🐶 → 12:28:31.964 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:33.109 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:33.373 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 267ABBA7-5EDD-405B-99B2-1B8385B15689]
[DATADOG SDK] 🐶 → 12:28:33.984 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:35.038 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:35.274 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 088132BE-26E0-4ADF-9176-49F12707D2A3]
[DATADOG SDK] 🐶 → 12:28:35.883 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:36.964 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:37.147 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 71218D14-8389-4435-ADCA-12C147E0BD99]
[DATADOG SDK] 🐶 → 12:28:37.779 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:38.836 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:39.052 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: DCA95C47-312E-4035-827C-DAB861DD3797]
[DATADOG SDK] 🐶 → 12:28:39.668 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:40.761 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:40.988 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 0E313C4C-5664-42F9-919C-8454A2D21B61]
[DATADOG SDK] 🐶 → 12:28:41.592 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:42.650 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:44.491 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:44.723 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 8DC4D435-CE6F-45EA-9B9E-7B34BF438D84]
[DATADOG SDK] 🐶 → 12:28:45.332 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:45.527 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: F2FB342F-2DD0-4F5E-886A-BC9E8BA00C49]
[DATADOG SDK] 🐶 → 12:28:46.181 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:46.482 💡 (rum) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:47.280 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:47.523 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: E7FE7C0F-EFDD-43BC-9E9C-CF3300EE5A6F]
[DATADOG SDK] 🐶 → 12:28:48.146 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:49.280 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:49.524 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: FDE3512A-8304-45A7-8F42-34F9DE656F3C]
[DATADOG SDK] 🐶 → 12:28:50.178 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:51.279 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:51.541 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: E59D8C68-0D7C-47C7-8C3A-B24DEAF1B6FB]
[DATADOG SDK] 🐶 → 12:28:52.144 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:53.276 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:53.514 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 3EC020C4-4A92-4BF6-A568-7B6DC2C187A6]
[DATADOG SDK] 🐶 → 12:28:54.142 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
About the FPS: when profiling the app in Instruments, there is a consistent frame time of 16.66ms in the display instrument, so no frame drops. Or is there some better way to measure FPS?
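As a quick sanity check of that reading, a frame time can be converted to FPS, and a list of frame times can be scanned for frames that exceed the budget. The helpers below are hypothetical, and the 60 Hz budget and 1 ms tolerance are assumptions chosen for illustration.

```swift
import Foundation

/// Converts a frame duration in milliseconds to frames per second.
func framesPerSecond(frameTimeMs: Double) -> Double {
    precondition(frameTimeMs > 0, "frame time must be positive")
    return 1000.0 / frameTimeMs
}

/// Counts frames whose duration exceeds the budget by more than `toleranceMs`.
/// The default budget assumes a 60 Hz display.
func droppedFrameCount(
    frameTimesMs: [Double],
    budgetMs: Double = 1000.0 / 60.0,
    toleranceMs: Double = 1.0
) -> Int {
    return frameTimesMs.filter { $0 > budgetMs + toleranceMs }.count
}

let fps = framesPerSecond(frameTimeMs: 16.66) // roughly 60
let dropped = droppedFrameCount(frameTimesMs: [16.66, 16.66, 33.4, 16.66]) // one long frame
```

A steady 16.66 ms frame time therefore corresponds to about 60 FPS, which is consistent with the main thread not being blocked while the background queue is busy.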
I do see a warning in Xcode that the content size of the scrollView is ambiguous. We have such a warning in all our screens and I will look into this issue. Might this be a problem for step 1 and 2 of the processor thread?
The sizing warnings have been resolved, but the CPU problem remains.
Thanks for more context @DCleymans. There is indeed nothing alarming in the log.
We acknowledge the problem and will be working on it on our side. To better pinpoint the issue, you could try disabling all images in the app or toggling animations on and off (e.g. the marqueelabel you mentioned). There is no API in SR that could help with this, so we will investigate it on our end and get back to you in this thread.
Hey @DCleymans,
Good news! We managed to fix the issue and released the fix in the most recent version: https://github.com/DataDog/dd-sdk-ios/releases/tag/2.5.0
It was related to a bottleneck in the diffing algorithm, which was using the base64 field to calculate the hash of the object. Hard one to catch! We replaced it with the resource identifier, which improved the speed of this step by 10x.
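To illustrate why this matters, here is a minimal sketch of the idea, not the SDK's actual code. `ImageWireframe`, `resourceID`, and `addedWireframes` are hypothetical names, and the sketch assumes an identifier uniquely determines the image content.

```swift
/// Hypothetical image wireframe: a short identifier plus a large base64 payload.
struct ImageWireframe: Hashable {
    let resourceID: String // short, stable identifier (assumed to map 1:1 to content)
    let base64: String     // potentially very large encoded image data

    // Compare and hash by identifier only, so the cost does not grow
    // with the size of the base64 payload.
    static func == (lhs: ImageWireframe, rhs: ImageWireframe) -> Bool {
        return lhs.resourceID == rhs.resourceID
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(resourceID)
    }
}

/// Returns the wireframes in `next` that were not present in `previous`.
func addedWireframes(previous: [ImageWireframe], next: [ImageWireframe]) -> [ImageWireframe] {
    let seen = Set(previous)
    return next.filter { !seen.contains($0) }
}

let payload = String(repeating: "A", count: 1_000_000) // stand-in for encoded image data
let old = [ImageWireframe(resourceID: "cover-1", base64: payload)]
let new = [
    ImageWireframe(resourceID: "cover-1", base64: payload),
    ImageWireframe(resourceID: "cover-2", base64: payload),
]
let added = addedWireframes(previous: old, next: new)
```

If the hash were computed from `base64` instead, every diff pass (roughly every 100 ms) would have to walk the full encoded image for each wireframe, which matches the kind of constant background load reported in this thread.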
You should still expect a little bit of CPU work in idle (around 10-15%), but we have plans to improve things further.
Using dd-sdk-ios develop branch (same results when using release 2.2.1 and release 2.1.1).
When we enable SessionReplay in our iOS app, the cpu load goes over 100% continuously. Our app consists of a splitViewController and can show different views as the second ViewController of the splitViewController. Some views have more content than others. When switching to a screen with less content, the CPU load drops, but never below 50%.
CPU load when showing a view with complex content:
CPU load when showing a view with simple content:
When pausing the running app, it looks like it is mostly doing work in a datadog thread: Thread 139 Queue : com.datadoghq.session-replay.processor (serial)
The cpu load drops to almost 0 when the app goes to the background. The app remains active, since it plays music when in background mode.
When changing the interval in the MainThreadScheduler to a larger value, the CPU load also drops. For example, after changing `let scheduler = MainThreadScheduler(interval: 0.1)` to `let scheduler = MainThreadScheduler(interval: 1)`, the CPU load already drops to 15%. But I suppose this leads to less accurate session replays? What can be done in order to lower the CPU load?