bensheldon / spectator_sport

Record and replay browser sessions in a self-hosted Rails Engine.
https://spectator-sport-demo-1ca285490d99.herokuapp.com/
MIT License

Thoughts on data storage size #21

Open bensheldon opened 1 month ago

bensheldon commented 1 month ago

I've been using Spectator Sport on one website. These are the numbers:

> SpectatorSport::Session.count
=> 20308
> SpectatorSport::SessionWindow.count
=> 44050
> SpectatorSport::Event.count
=> 5662868

Which looks like this in the database:

Relation                                                          Size
spectator_sport_events                                            14.2GB
index_spectator_sport_events_on_session_id_and_created_at UNUSED  265MB
spectator_sport_events_pkey                                       121MB
index_spectator_sport_events_on_session_id                        53.9MB
index_spectator_sport_events_on_session_window_id                 52.3MB
spectator_sport_session_windows                                   5.26MB
spectator_sport_sessions                                          2.2MB
index_spectator_sport_sessions_on_secure_id_and_created_at        2MB
spectator_sport_session_windows_pkey                              984KB
index_spectator_sport_session_windows_on_session_id               776KB
spectator_sport_sessions_pkey                                     464KB
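
For a rough sense of scale, the counts and the events table size above can be turned into per-session and per-event averages. This is only a back-of-the-envelope sketch: it assumes "14.2GB" means decimal gigabytes and it ignores the indexes and the two smaller tables.

```ruby
# Figures copied from the console output and size table above.
sessions = 20_308
session_windows = 44_050
events = 5_662_868

# Assumption: "14.2GB" is read as decimal gigabytes (10**9 bytes).
events_table_bytes = 14.2 * 10**9

events_per_session = events.to_f / sessions
windows_per_session = session_windows.to_f / sessions
bytes_per_event = events_table_bytes / events
bytes_per_session = events_table_bytes / sessions

puts format("events per session:  %.1f", events_per_session)
puts format("windows per session: %.1f", windows_per_session)
puts format("bytes per event:     %.0f", bytes_per_event)
puts format("bytes per session:   %.0f", bytes_per_session)
```

Under those assumptions this works out to roughly 280 events and 0.7 MB per session, or about 2.5 KB per stored event.
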
nathancolgate commented 3 weeks ago

Hi @bensheldon !

What an incredible library. Thanks for bringing it over to the Rails world.

FWIW: I'm handling this over in my application by conditionally dialing back the amount of event information being reported. There are a lot of options here, but for starters:

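# events_controller.rb
def index
  # An empty hash leaves rrweb's recording options at their defaults.
  @config_object = if current_user.watch_like_a_hawk?
    {}
  else
    {
      # Block every element under <html> from being recorded.
      blockSelector: "html *",
      sampling: {
        mousemove: false,        # do not record mouse movement
        mouseInteraction: false, # do not record clicks and other mouse interactions
        scroll: 10000,           # at most one scroll event per 10 seconds
        media: 10000,            # at most one media interaction event per 10 seconds
        input: false
      }
    }
  end
end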

And then in my index.js.erb:

if (!this.stopRrwebCallback) {
  const functionObj = {
    emit: this.events.add.bind(this.events)
  }
  const configObj = <%= @config_object.to_json.html_safe %>;
  const combinedObj = {
    ...functionObj,
    ...configObj,
  };
  this.stopRrwebCallback = rrwebRecord(combinedObj);
}

And the typical event_data payload goes down to:

{"data": {"x": 0, "y": 0, "id": 1, "source": 3}, "type": 3, "timestamp": 1729948909322}

Which is good by itself, but if you also conditionally increase POST_INTERVAL_SECONDS, it really cuts down on the data storage size.
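
To put a number on the reduced payload quoted above, here is a minimal sketch that serializes it as compact JSON and prints its size. The stored size will differ (column type, row overhead, and indexes are all ignored here), so treat it as an order-of-magnitude check only.

```ruby
require "json"

# The reduced event payload quoted above, written as a Ruby hash.
payload = {
  "data" => { "x" => 0, "y" => 0, "id" => 1, "source" => 3 },
  "type" => 3,
  "timestamp" => 1_729_948_909_322
}

# Compact JSON with no extra whitespace; this is not the on-disk size.
compact = JSON.generate(payload)
puts "#{compact.bytesize} bytes"
```

The result is well under 100 bytes per event.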