Open bruno-garcia opened 6 months ago
Context: mobile replays send video events. These video events are fetched independently of the RRWeb recording data. However, they're also stored separately.
Option: On download, call json.loads(downloaded_data), loop through the [events], and extract and return either the breadcrumb events or the RRWeb events.
This would be backwards compatible but expensive (relative to: recv data -> stream to client).
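A minimal sketch of this option, for illustration only. It assumes breadcrumbs are stored as rrweb custom events (type 5) whose data.tag is "breadcrumb"; that shape, and the names is_breadcrumb and filter_segment, are assumptions rather than the actual endpoint code.

```python
import json
from typing import Any


def is_breadcrumb(event: dict[str, Any]) -> bool:
    # Assumption: breadcrumbs are rrweb custom events (type 5) tagged "breadcrumb".
    data = event.get("data")
    return (
        event.get("type") == 5
        and isinstance(data, dict)
        and data.get("tag") == "breadcrumb"
    )


def filter_segment(downloaded_data: bytes, kind: str) -> bytes:
    """Parse a stored segment and return only the requested kind of events."""
    if kind not in ("breadcrumbs", "rrweb"):
        raise ValueError(f"unknown kind: {kind!r}")
    events = json.loads(downloaded_data)  # the expensive step: full parse per request
    want_breadcrumbs = kind == "breadcrumbs"
    selected = [e for e in events if is_breadcrumb(e) == want_breadcrumbs]
    return json.dumps(selected).encode("utf-8")
```

The cost mentioned above comes from the full parse and re-serialize on every request, instead of streaming the stored bytes straight through.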
Option: On upload, store breadcrumb events and RRWeb events independently of one another. On request, look up the {filename}-rrweb file or the {filename}-breadcrumbs file based on the endpoint or query parameter. This improves performance on the API (recv data -> stream to client). Older files would not be split, so they would either need a backwards-compatible fallback (following the option above) or the client would have to fetch the legacy files and split them locally.
This doubles the number of files we need to upload, which will have an impact on consumer throughput. We could adopt an approach similar to my hackweek project to optimize this.
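A rough sketch of the split-on-upload option with the legacy fallback, under stated assumptions: a plain dict stands in for the blob store, breadcrumbs are assumed to be rrweb custom events (type 5) tagged "breadcrumb", and upload_split and fetch are hypothetical names, not existing functions.

```python
import json
from typing import Any

Storage = dict[str, bytes]  # hypothetical stand-in for the real blob store


def is_breadcrumb(event: dict[str, Any]) -> bool:
    # Assumption: breadcrumbs are rrweb custom events (type 5) tagged "breadcrumb".
    data = event.get("data")
    return (
        event.get("type") == 5
        and isinstance(data, dict)
        and data.get("tag") == "breadcrumb"
    )


def upload_split(storage: Storage, filename: str, events: list[dict[str, Any]]) -> None:
    """Write two files per segment: this is where the upload count doubles."""
    crumbs = [e for e in events if is_breadcrumb(e)]
    rrweb = [e for e in events if not is_breadcrumb(e)]
    storage[f"{filename}-breadcrumbs"] = json.dumps(crumbs).encode("utf-8")
    storage[f"{filename}-rrweb"] = json.dumps(rrweb).encode("utf-8")


def fetch(storage: Storage, filename: str, kind: str) -> bytes:
    """Return the split file if present, else split the legacy file on the fly."""
    if kind not in ("breadcrumbs", "rrweb"):
        raise ValueError(f"unknown kind: {kind!r}")
    key = f"{filename}-{kind}"
    if key in storage:
        return storage[key]  # fast path: bytes can be streamed as stored
    # Legacy fallback: unsplit file, filtered per request (the download-time option).
    events = json.loads(storage[filename])
    want_breadcrumbs = kind == "breadcrumbs"
    selected = [e for e in events if is_breadcrumb(e) == want_breadcrumbs]
    return json.dumps(selected).encode("utf-8")
```

New segments take the fast path, while segments uploaded before the change still resolve through the slower parse-and-filter branch.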
We'll first investigate and confirm the root cause before going ahead with this change, which is significant.
Example crashes - https://github.com/getsentry/team-replay/issues/399
Created a ticket specific to the investigation so we don't need to commit to this solution:
Problem
tl;dr: The way the Replay Details page loads data can be problematic for customers using canvas recording. This is especially problematic for users with huge canvases (4k) in long replays. It's also a problem for people on slow network connections, and it is a blocker for us supporting longer replays.
Additional context
Currently, the Session Replay's breadcrumbs (including console, network, etc.) and rrweb data used in the details page are served by a single endpoint. To build the event timeline, as well as to fast-forward a replay to a specific point in time, we need all relevant data for the duration of the replay.
For that reason, we load all of the replay data (which can cover up to 60 minutes) at once.
This approach has mostly worked so far but has some disadvantages that became more significant with the introduction of canvas support. With the current version of the SDK, we record screenshots of the canvas element twice per second (2 fps). This can lead to the replay details page loading up to 7,200 images all at once (on a max-length replay of 60 min, that's 3,600 seconds x 2 canvas snapshots per second).
In order to improve the performance of Replay playback, we need to be able to fetch canvas snapshots on demand, as playback gets close to that point in time.
Worth noting that the rrweb recordings themselves need to be loaded from the start all the way to the point in time we need to display (e.g. a link to a replay at the time of an error). But that's not the case with canvas snapshots.
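To make the numbers and the on-demand idea concrete, here is a small sketch. It assumes the client knows the sorted snapshot timestamps up front and picks which snapshots to fetch for a lookahead window around the playhead; snapshots_to_fetch and the 10-second lookahead are illustrative assumptions, not an existing API or a tuned value.

```python
from bisect import bisect_right

MAX_REPLAY_SECONDS = 60 * 60  # 60-minute max-length replay
CANVAS_FPS = 2                # canvas snapshots per second
MAX_SNAPSHOTS = MAX_REPLAY_SECONDS * CANVAS_FPS  # 3,600 x 2 = 7,200


def snapshots_to_fetch(
    timestamps_ms: list[int],
    playhead_ms: int,
    lookahead_ms: int = 10_000,  # assumed window size, for illustration
) -> list[int]:
    """Return indexes of the snapshot shown at the playhead plus those
    falling within the lookahead window. timestamps_ms must be sorted."""
    # Most recent snapshot at or before the playhead (or the first one).
    start = max(bisect_right(timestamps_ms, playhead_ms) - 1, 0)
    # One past the last snapshot inside the lookahead window.
    end = bisect_right(timestamps_ms, playhead_ms + lookahead_ms)
    return list(range(start, end))
```

With a 10-second window at 2 fps, the player would request roughly 20 snapshots at a time rather than all 7,200 up front. Unlike the rrweb events, nothing before the playhead has to be loaded except the single snapshot currently on screen.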