getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

[Epic] Replay: Load replay data on-demand #66472

Open bruno-garcia opened 6 months ago

bruno-garcia commented 6 months ago

Problem

tl;dr: The way the Replay Details page loads data can be problematic for customers using canvas recording. This is especially problematic for users with huge canvases (4K) in long replays.

Also it's a problem for people on slow network connections, and is a blocker for us supporting longer replays.

Additional context

Currently, the Session Replay breadcrumbs (including console, network, etc.) and the rrweb data used in the details page are served by a single endpoint. To build the event timeline, as well as to fast-forward a replay to a specific point in time, we need all relevant data for the duration of the replay.

For that reason, we're basically loading all of the replay data (which can be up to 60 minutes long) at once.

This approach has worked for the most part so far, but it has some disadvantages that became more significant with the introduction of canvas support. With the current version of the SDK, we record two screenshots of the canvas element per second (2 fps). This can lead to the replay details page loading up to 7,200 images all at once (on a max-length replay of 60 min, that's 3,600 seconds × 2 canvas snapshots).
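For reference, the worst-case count above can be reproduced with a few lines of Python. The constant and function names are made up for illustration; only the 60 minute limit and the 2 fps rate come from the description.

```python
# Figures taken from the issue description; names are hypothetical.
MAX_REPLAY_MINUTES = 60
CANVAS_FPS = 2


def max_canvas_snapshots(minutes=MAX_REPLAY_MINUTES, fps=CANVAS_FPS):
    """Upper bound on canvas snapshots recorded over a replay of the given length."""
    return minutes * 60 * fps


print(max_canvas_snapshots())  # 7200
```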

In order to improve the performance of the Replay playback, we need to be able to fetch the canvas snapshot on demand, as playback gets close to that point in time.

Worth noting that the rrweb recordings themselves need to be loaded from the start all the way to the point in time we need to display (e.g., a link to a replay at the time of an error). But that's not the case with canvas snapshots.
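A minimal sketch of what "on demand" could look like, assuming the client already has a sorted list of canvas snapshot timestamps (for example from a lightweight index) and fetches only those within a lookahead window ahead of the playhead. The function name, the `lookahead_ms` parameter, and the existence of such an index are all assumptions made for illustration, not part of the current implementation.

```python
from bisect import bisect_left, bisect_right


def snapshots_to_fetch(snapshot_times_ms, playhead_ms, lookahead_ms=10_000,
                       already_loaded=frozenset()):
    """Return timestamps of canvas snapshots in [playhead, playhead + lookahead]
    that have not been loaded yet.

    snapshot_times_ms must be sorted in ascending order (hypothetical index).
    """
    start = bisect_left(snapshot_times_ms, playhead_ms)
    end = bisect_right(snapshot_times_ms, playhead_ms + lookahead_ms)
    return [t for t in snapshot_times_ms[start:end] if t not in already_loaded]


# One snapshot every 500 ms (2 fps) for a 60 second replay.
times = list(range(0, 60_000, 500))
print(snapshots_to_fetch(times, playhead_ms=1_000, lookahead_ms=1_000))
```

With a window like this, the number of images in flight depends on the lookahead rather than on the replay length.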

cmanallen commented 6 months ago

Context: mobile replays send video events. These video events are fetched independently of the RRWeb recording data. Note, however, that they're also stored separately.

cmanallen commented 6 months ago

Option: On download, call json.loads(downloaded_data), loop through the [events], and extract and return either the breadcrumb events or the RRWeb events.

This would be backwards compatible but expensive (relative to: recv data -> stream to client).
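A rough sketch of this option, assuming the downloaded segment is a JSON array of events and that breadcrumbs are rrweb custom events (type 5) tagged `"breadcrumb"` under `data.tag`. That payload shape is an assumption here and should be checked against the real recording format; `split_events` and `is_breadcrumb` are hypothetical names.

```python
import json

RRWEB_CUSTOM_EVENT = 5  # rrweb's EventType.Custom; assumed to carry breadcrumbs


def is_breadcrumb(event):
    """Assumed shape: {"type": 5, "data": {"tag": "breadcrumb", ...}}."""
    data = event.get("data")
    return (
        event.get("type") == RRWEB_CUSTOM_EVENT
        and isinstance(data, dict)
        and data.get("tag") == "breadcrumb"
    )


def split_events(downloaded_data):
    """Parse a downloaded segment and return (breadcrumb_events, rrweb_events)."""
    events = json.loads(downloaded_data)
    breadcrumbs = [e for e in events if is_breadcrumb(e)]
    rrweb = [e for e in events if not is_breadcrumb(e)]
    return breadcrumbs, rrweb
```

The cost mentioned above is visible here: every request has to parse and re-serialize the whole segment instead of streaming the stored bytes straight to the client.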

cmanallen commented 6 months ago

Option: On upload, store breadcrumb events and RRWeb events independently of one another. On request, look up the {filename}-rrweb file or the {filename}-breadcrumbs file based on the endpoint or query parameter. This improves performance on the API (recv data -> stream to client). Older files would not be split and would either need a backwards-compatible fallback (following the option above) or the client would have to fetch the legacy files and split them locally.

This doubles the number of files we need to upload, which will have an impact on consumer throughput. We could adopt an approach similar to my hackweek project to optimize this.
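A sketch of the split-on-upload option with the legacy fallback, using a plain dict as a stand-in for blob storage. The `{filename}-rrweb` and `{filename}-breadcrumbs` keys follow the naming above; the function names, the dict storage, and the breadcrumb detection (rrweb custom event, type 5, tagged `"breadcrumb"`) are assumptions for illustration only.

```python
import json


def is_breadcrumb(event):
    """Assumed shape: {"type": 5, "data": {"tag": "breadcrumb", ...}}."""
    data = event.get("data")
    return (
        event.get("type") == 5
        and isinstance(data, dict)
        and data.get("tag") == "breadcrumb"
    )


def store_segment(storage, filename, raw):
    """On upload: write two files instead of one (this is the doubled upload cost)."""
    events = json.loads(raw)
    storage[f"{filename}-breadcrumbs"] = json.dumps(
        [e for e in events if is_breadcrumb(e)]
    )
    storage[f"{filename}-rrweb"] = json.dumps(
        [e for e in events if not is_breadcrumb(e)]
    )


def load_segment(storage, filename, kind):
    """On request: kind is "rrweb" or "breadcrumbs"."""
    split_key = f"{filename}-{kind}"
    if split_key in storage:
        # Fast path: stored bytes can go to the client without parsing.
        return storage[split_key]
    # Legacy fallback: the file was never split, so filter it on download.
    events = json.loads(storage[filename])
    want_breadcrumbs = kind == "breadcrumbs"
    return json.dumps([e for e in events if is_breadcrumb(e) == want_breadcrumbs])
```

New segments take the fast path, while segments uploaded before the change still resolve through the slower filter-on-download path.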

bruno-garcia commented 6 months ago

We'll first investigate and confirm the root cause before going ahead with this change, which is significant.

jas-kas commented 6 months ago

Example crashes - https://github.com/getsentry/team-replay/issues/399

bruno-garcia commented 6 months ago

Created a ticket specific to the investigation so we don't need to commit to this solution: