PostHog / posthog.com

Official docs, website, and handbook for PostHog.
https://posthog.com

Blog: How we built upgraded session replay storage (to blobby) #8732


ivanagas commented 2 weeks ago

Summary

Write a short paragraph on what this article is about. If applicable, what's the opinion or point we want to make in this article?

Based on Ben’s notes on Mr. Blobby, write a blog post about our migration of session replay data from ClickHouse to S3 and the creation of Mr. Blobby.

Where will it be published?

select any that apply

  • [x] Blog
  • [ ] Founders Hub
  • [ ] Newsletter
  • [ ] Product engineers Hub
  • [ ] Tutorials
  • [ ] Other (please specify)

What type of article is this?

select any that apply

  • [ ] High intent (i.e. comparisons and similar)
  • [x] Brand / opinionated (how we work and why, etc.)
  • [ ] High-level guide (concepts, frameworks, ideas, etc.)
  • [ ] Low-level guide (step-by-step guide / tutorial)
  • [ ] Other (please specify)

Who is the primary audience?

select any that apply

  • [ ] Founders
  • [x] Engineers
  • [ ] Growth
  • [ ] Marketing
  • [x] HackerNews
  • [ ] Existing PostHog users
  • [x] Potential PostHog users

Headline options

suggest a few angles

How migrating replay data to S3 saved our life

How we saved $50k/month moving replay data to S3

How we solved the write vs. store cost challenge for our massive volume of replay data

Storing more, writing less: How moving replay data to S3 saved us $50k/month

Will it need custom art?

Outline (optional)

draft headings / questions you want to answer

  • We moved from ClickHouse-backed session replays to S3-backed ones
  • Problem 1: Store and query
    • ClickHouse is good at writing (batching), but not at storing this type of data.
    • We try to use ClickHouse for everything, and our old version of session replay used it too.
    • It was very slow to load blob-like data; that’s not its intended use case.
    • Also, 3 weeks of replay data took up more space than all of our other data combined.
  • Solution (AKA problem 2): Writing somewhere else
    • Move it somewhere else, obviously.
    • We want to write many small packets and store a lot of content.
    • This makes replay data a bad fit for both blob-style storage and traditional databases.
  • Real solution: Buffering
    • Our SDKs batch session replay events to keep the number of packets sent to a minimum.
    • Buffer data on disk and write to blob storage once a threshold has passed.
    • This reduces write costs and lets us benefit from cheap S3 storage.
  • Architecture of the real solution: Mr. Blobby (see the first sketch after this list)
    • Buffer incoming data to disk using Node.js streams.
    • Group it by session, finding or creating a SessionManager.
    • Add data to the SessionManager write stream.
    • Stream to gzip.
    • Decide what needs to be flushed using Kafka age, real-time age, and size.
    • Flush to S3 via Kafka(?)
  • Querying and using this data (see the second sketch after this list)
    • The data is the same, and loading from S3 isn’t too much of a change.
    • Store some metadata in ClickHouse to enable quick recording queries and joins with persons or events.
    • For in-flight sessions, people expect relatively real-time viewing, so we use Redis.
    • Buffer an uncompressed version to disk.
    • The web app publishes a Redis message to request a real-time session.
    • Consumers receive the event and begin replicating.
  • Benefits
    • This saved us $30-50k/month, improved ClickHouse health, and improved loading speed.
    • Along with the usability benefits for users, it also lets us pass savings along in the form of better filters and longer-term storage.
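
For the architecture section, a minimal sketch of the buffer-and-flush idea might help the article land. This is not the actual Mr. Blobby code: the class name, thresholds, bucket, and key format below are all illustrative assumptions. Events for a session stream through gzip to a file on disk, and the file is shipped to S3 once an age or size threshold is crossed.

```ts
// Sketch only: buffer replay events per session on disk, flush to S3 on threshold.
import { createWriteStream, promises as fs, WriteStream } from "node:fs";
import { createGzip, Gzip } from "node:zlib";
import { finished } from "node:stream/promises";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const MAX_AGE_MS = 60_000;         // flush if the buffer has been open this long
const MAX_SIZE_BYTES = 50_000_000; // ...or has grown this large

class SessionBuffer {
  private gzip: Gzip = createGzip();
  private file: WriteStream;
  private bytesWritten = 0;
  private readonly createdAt = Date.now();

  constructor(private sessionId: string, private path: string) {
    this.file = createWriteStream(path);
    this.gzip.pipe(this.file); // stream events straight through gzip to disk
  }

  add(event: object): void {
    const line = JSON.stringify(event) + "\n";
    this.bytesWritten += Buffer.byteLength(line);
    this.gzip.write(line);
  }

  // The real system also weighs Kafka message age; this only checks buffer age and size.
  shouldFlush(): boolean {
    return Date.now() - this.createdAt > MAX_AGE_MS || this.bytesWritten > MAX_SIZE_BYTES;
  }

  async flush(s3: S3Client, bucket: string): Promise<void> {
    this.gzip.end();
    await finished(this.file); // wait until everything is on disk
    await s3.send(
      new PutObjectCommand({
        Bucket: bucket,
        Key: `session_recordings/${this.sessionId}.jsonl.gz`,
        Body: await fs.readFile(this.path),
      })
    );
    await fs.unlink(this.path); // clean up the local buffer file
  }
}

// Grouping by session: find or create a buffer for each incoming event.
const buffers = new Map<string, SessionBuffer>();

export async function handleEvent(sessionId: string, event: object, s3: S3Client): Promise<void> {
  let buffer = buffers.get(sessionId);
  if (!buffer) {
    buffer = new SessionBuffer(sessionId, `/tmp/replay-${sessionId}.jsonl.gz`);
    buffers.set(sessionId, buffer);
  }
  buffer.add(event);
  if (buffer.shouldFlush()) {
    await buffer.flush(s3, "replay-blobs"); // "replay-blobs" is a made-up bucket name
    buffers.delete(sessionId);
  }
}
```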
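
For the querying section, a hedged sketch of the realtime handshake the outline describes: the web app publishes a Redis message asking for a session, and the consumer buffering that session starts copying its events into Redis so the player can view them before anything reaches S3. Channel and key names here are made up.

```ts
// Sketch only: request/serve in-flight sessions over Redis pub/sub.
import Redis from "ioredis";

const pub = new Redis();
const sub = new Redis(); // ioredis needs a dedicated connection for subscribing

// In-memory view of the sessions this consumer is currently buffering.
const activeSessions = new Map<string, object[]>();

// Web app side: ask whichever consumer owns this session to start replicating.
export async function requestRealtimeSession(sessionId: string): Promise<void> {
  await pub.publish("realtime-subscriptions", sessionId);
}

// Consumer side: listen for requests and mirror buffered events into a
// short-lived Redis list so playback stays roughly real time.
export async function startRealtimeConsumer(): Promise<void> {
  await sub.subscribe("realtime-subscriptions");
  sub.on("message", async (_channel, sessionId) => {
    const events = activeSessions.get(sessionId);
    if (!events) return; // another consumer owns this session
    for (const event of events) {
      await pub.rpush(`realtime:${sessionId}`, JSON.stringify(event));
    }
    await pub.expire(`realtime:${sessionId}`, 300);
  });
}
```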
pauldambra commented 2 weeks ago

I think an additional useful angle is the challenge of shipping multiple times per day while the whole thing was running in production

we did a lot of

So that we could run ingestion without anything breaking and then switch playback backwards and forwards using flags
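
For that flag angle, a rough illustration of what flag-gated playback could look like: ingestion writes to the new storage, while reads are switched back and forth per flag without a deploy. The flag name, loader functions, and flag check below are stand-ins, not the actual implementation.

```ts
// Illustrative only: pick the playback backend per request based on a feature flag.
type ReplayEvents = unknown[];

async function loadFromClickHouse(sessionId: string): Promise<ReplayEvents> {
  return []; // old path: read events from the ClickHouse-backed store
}

async function loadFromBlobStorage(sessionId: string): Promise<ReplayEvents> {
  return []; // new path: fetch and decompress the session's blobs from S3
}

async function isFlagEnabled(flag: string, teamId: number): Promise<boolean> {
  return false; // stand-in for the real feature flag service
}

export async function loadRecording(sessionId: string, teamId: number): Promise<ReplayEvents> {
  const useBlobStorage = await isFlagEnabled("session-replay-blob-playback", teamId);
  return useBlobStorage ? loadFromBlobStorage(sessionId) : loadFromClickHouse(sessionId);
}
```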

How Mr Blobby strangled our largest ClickHouse table