DataDog / dd-sdk-ios

Datadog SDK for iOS - Swift and Objective-C.
Apache License 2.0

SessionReplay on iPad leads to high CPU pressure #1482

Closed DCleymans closed 1 year ago

DCleymans commented 1 year ago

Using dd-sdk-ios develop branch (same results when using release 2.2.1 and release 2.1.1).

When we enable SessionReplay in our iOS app, the CPU load stays above 100% continuously. Our app consists of a UISplitViewController that can show different views as its second view controller. Some views have more content than others. When switching to a screen with less content, the CPU load drops, but never below 50%.

CPU load when showing a view with complex content: [Screenshot 2023-09-17 at 10:52:38]

CPU load when showing a view with simple content: [Screenshot 2023-09-17 at 10:54:13]

When I pause the running app, it looks like it is mostly doing work in a Datadog thread: `Thread 139 Queue : com.datadoghq.session-replay.processor (serial)` [Screenshot 2023-09-17 at 10:27:00]

The CPU load drops to almost 0 when the app goes to the background. The app remains active, since it plays music in background mode.

When changing the interval in the `MainThreadScheduler` to a larger value, the CPU load also drops. For example, changing `let scheduler = MainThreadScheduler(interval: 0.1)` to `let scheduler = MainThreadScheduler(interval: 1)` already drops the CPU load to 15%.

But I suppose this leads to less accurate session replays? What can be done in order to lower the CPU load?

ncreated commented 1 year ago

Hey @DCleymans 👋. Thanks for sharing insights. While Session Replay is in Beta, we are still working on improving the overall performance, and this issue is something we will definitely be looking into.

In the current state, the CPU pressure scales roughly linearly (basically O(n)) with the number of views presented on the screen. More views == more processing on the background thread. To better understand the situation in your app, could you share an approximate number of UIViews your app displays while CPU pressure is high? Even sharing a screenshot (from the app or the replay) might be very handy.
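One way to get the approximate number of UIViews is to walk the view tree recursively. The snippet below is a minimal sketch, not part of the Datadog SDK; the `recursiveViewCount` property name is made up for illustration.

```swift
import UIKit

extension UIView {
    /// Hypothetical helper (not an SDK API): counts this view plus all of its descendants.
    var recursiveViewCount: Int {
        // Start at 1 for `self`, then add the count of every subtree.
        subviews.reduce(1) { total, subview in total + subview.recursiveViewCount }
    }
}
```

Calling it on the app's key window (for example from a debug menu or a breakpoint) should give a rough figure for the whole screen.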

> When changing the interval in the MainThreadScheduler (...). But I suppose this leads to less accurate session replays?

If you change the interval, the accuracy of a single replay frame will be the same, but the frame rate of the replay will be lower. As a result, you'd experience a less smooth Session Replay.
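To make the trade-off concrete, here is a small back-of-the-envelope sketch. It assumes the scheduler fires once per interval and that every tick visits every view once; both function names are hypothetical and the 200-view figure is only an illustrative input.

```swift
import Foundation

/// Snapshots captured per second for a given scheduler interval (in seconds).
func snapshotsPerSecond(interval: TimeInterval) -> Double {
    precondition(interval > 0, "interval must be positive")
    return 1.0 / interval
}

/// Rough number of view visits per second, assuming O(n) work per snapshot.
func viewVisitsPerSecond(viewCount: Int, interval: TimeInterval) -> Double {
    Double(viewCount) * snapshotsPerSecond(interval: interval)
}

// Illustrative comparison of the two intervals mentioned in this thread.
for interval in [0.1, 1.0] {
    let visits = viewVisitsPerSecond(viewCount: 200, interval: interval)
    print("interval \(interval)s: \(visits) view visits per second")
}
```

Under these assumptions, going from 0.1 s to 1 s cuts the per-second work by a factor of ten, at the cost of a replay that updates once per second instead of ten times.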

> What can be done in order to lower the CPU load?

As of today, there isn't any configuration in the public API that you could use to lower this impact. It would be best if we knew the complexity of your UI, so we can implement more targeted optimisations on our side.

DCleymans commented 1 year ago

Hi @ncreated, thanks for looking into this. I know Session Replay is brand new in the SDK and still in beta, but it's a powerful tool, so we would like to re-enable it. (We have currently disabled it due to the CPU load and also a memory issue, but I'm still looking into that issue.)

This is a screenshot of our app: Simulator Screenshot - iPad mini (6th generation) - 2023-09-19 at 09 37 37

It consists of a UISplitViewController, showing a UIViewController on the left (SidebarViewController) and a UITabBarController on the right (TabsViewController). The tab bar controller is used to control the actual content. [Screenshot 2023-09-19 at 09:46:52]

The SideBarViewController consists of 36 views and subviews: [Screenshot 2023-09-19 at 09:48:13]

The TabBarController consists of approximately 150 views and subviews: [Screenshot 2023-09-19 at 09:52:33]

There is an animation on the screen: 3 bars moving up and down, indicating the item that is currently playing. This is implemented by showing 3 animated views. The music player can also animate the title/group label if the text is too long to fit on the screen (MarqueeLabel).

ncreated commented 1 year ago

Thanks for more context @DCleymans! The overall number of views doesn't sound large at all - ~200 is definitely within expected range 👍, so the problem is rather somewhere else.

> There is an animation on the screen: 3 bars moving up and down, indicating the item that is currently playing. This is implemented by showing 3 animated views.

Continuous view-tree mutations might bring some impact, but 3 animated views sound negligible.

To move forward, let me briefly summarise how our SR works, so I can ask follow-up questions with the full context:

The high CPU pressure you're observing is likely coming from these 3 procedures in the processor queue.

Again, thanks for sharing feedback and helping us with this! Given the diversity of apps and platforms, we actively seek feedback to steer SR development and optimisations.

DCleymans commented 1 year ago

The images are loaded into our app from a server. They are shown as 'content image' placeholders in our replay session. We use Kingfisher (https://github.com/onevcat/Kingfisher) to load and cache them: they are loaded once from the server and stored on disk for later re-use across multiple sessions. In memory, the image is downsampled to reduce memory usage:

```swift
import Kingfisher
import UIKit

let url = URL(string: imageUrl)

channelImage.kf.setImage(
    with: url,
    options: [
        .transition(.fade(0.25)),
        // Downsample to the size of the image view to reduce memory usage.
        .processor(DownsamplingImageProcessor(size: channelImage.frame.size)),
        .scaleFactor(UIScreen.main.scale),
        // Keep the original image in the disk cache as well.
        .cacheOriginalImage,
    ]
) { result in
    // ... just logging code (removed)
}
```

Even when we are showing a rather simple screen, with no images loaded from the server, we have a high CPU load: [Screenshot 2023-09-19 at 12:14:39]

I do see a warning in Xcode that the content size of the scroll view is ambiguous. We have this warning on all our screens and I will look into it. Might this be a problem for steps 1 and 2 of the processor thread?

[Screenshot 2023-09-19 at 12:15:01]

Logging from the Datadog SDK looks OK to me. Nothing special; here is a snapshot when filtering on datadog:

[DATADOG SDK] 🐶 → 12:28:28.254 → (rum) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: F85E588E-D27C-4282-B5CC-F776EB3D8907]
[DATADOG SDK] 🐶 → 12:28:29.104 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:29.345 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 50E8C128-0DFD-45B0-8371-6648C1CB3A00]
[DATADOG SDK] 🐶 → 12:28:29.964 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:31.057 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:31.354 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: BA2AB3DA-66B9-4CBD-8F0B-774D77E0D12D]
[DATADOG SDK] 🐶 → 12:28:31.964 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:33.109 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:33.373 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 267ABBA7-5EDD-405B-99B2-1B8385B15689]
[DATADOG SDK] 🐶 → 12:28:33.984 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:35.038 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:35.274 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 088132BE-26E0-4ADF-9176-49F12707D2A3]
[DATADOG SDK] 🐶 → 12:28:35.883 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:36.964 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:37.147 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 71218D14-8389-4435-ADCA-12C147E0BD99]
[DATADOG SDK] 🐶 → 12:28:37.779 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:38.836 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:39.052 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: DCA95C47-312E-4035-827C-DAB861DD3797]
[DATADOG SDK] 🐶 → 12:28:39.668 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:40.761 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:40.988 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 0E313C4C-5664-42F9-919C-8454A2D21B61]
[DATADOG SDK] 🐶 → 12:28:41.592 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:42.650 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:44.491 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:44.723 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 8DC4D435-CE6F-45EA-9B9E-7B34BF438D84]
[DATADOG SDK] 🐶 → 12:28:45.332 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:45.527 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: F2FB342F-2DD0-4F5E-886A-BC9E8BA00C49]
[DATADOG SDK] 🐶 → 12:28:46.181 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:46.482 💡 (rum) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:47.280 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:47.523 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: E7FE7C0F-EFDD-43BC-9E9C-CF3300EE5A6F]
[DATADOG SDK] 🐶 → 12:28:48.146 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:49.280 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:49.524 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: FDE3512A-8304-45A7-8F42-34F9DE656F3C]
[DATADOG SDK] 🐶 → 12:28:50.178 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:51.279 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:51.541 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: E59D8C68-0D7C-47C7-8C3A-B24DEAF1B6FB]
[DATADOG SDK] 🐶 → 12:28:52.144 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅
[DATADOG SDK] 🐶 → 12:28:53.276 ⏳ (session-replay) Uploading batch...
[DATADOG SDK] 🐶 → 12:28:53.514 → (session-replay) accepted, won't be retransmitted: [response code: 202 (accepted), request ID: 3EC020C4-4A92-4BF6-A568-7B6DC2C187A6]
[DATADOG SDK] 🐶 → 12:28:54.142 💡 (session-replay) No upload. Batch to upload: NO, System conditions: ✅

About the FPS: when profiling the app in Instruments, there is a consistent frame time of 16.66 ms in the display instrument, so no frame drops. Or is there a better way to measure FPS?

DCleymans commented 1 year ago

> I do see a warning in Xcode that the content size of the scroll view is ambiguous. We have this warning on all our screens and I will look into it. Might this be a problem for steps 1 and 2 of the processor thread?

The sizing warnings have been resolved. The CPU problem remains.

ncreated commented 1 year ago

Thanks for more context @DCleymans. There is indeed nothing alarming in the log.

We acknowledge the problem and will be working on it on our side. To better pinpoint the issue, you could try disabling all images in the app or toggling animations on and off (e.g. the MarqueeLabel you mentioned). There is no API in SR that could help with this, so we will investigate it on our end and get back to you in this thread.

maciejburda commented 1 year ago

Hey @DCleymans,

Good news! We managed to fix the issue and released the fix in the most recent version: https://github.com/DataDog/dd-sdk-ios/releases/tag/2.5.0

It was related to a bottleneck in the diffing algorithm, which was using a base64 field to calculate the hash of the object. Hard one to catch! We replaced it with a resource identifier, which improved the speed of this step by 10x.
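The idea can be illustrated with a simplified sketch. The `ImageWireframe` type and its fields below are hypothetical and are not the SDK's actual types; the sketch also assumes that the identifier uniquely determines the image content, so that hashing and comparing the short identifier is enough.

```swift
import Foundation

/// Hypothetical stand-in for an object that carries a large base64 payload.
struct ImageWireframe {
    let resourceID: String  // short, assumed to uniquely identify the image content
    let base64: String      // potentially very large encoded image data
}

extension ImageWireframe: Hashable {
    // Compare and hash only the short identifier, so the cost no longer
    // grows with the size of the base64 payload.
    static func == (lhs: ImageWireframe, rhs: ImageWireframe) -> Bool {
        lhs.resourceID == rhs.resourceID
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(resourceID)
    }
}
```

With the synthesized `Hashable` conformance, every hash would also feed the whole `base64` string into the hasher, which is the kind of cost a diff that runs on every snapshot would pay repeatedly.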

You should still expect a little bit of CPU work in idle (around 10-15%), but we have plans to improve things further.