RailsEventStore / rails_event_store

A Ruby implementation of an Event Store based on Active Record
http://railseventstore.org
MIT License
1.41k stars 121 forks source link

Projection does three (3) queries where it could be doing one (1) #1708

Closed Bertg closed 9 months ago

Bertg commented 9 months ago

Given I have the Following projection:

RubyEventStore::Projection
  .from_stream("STREAM_NAME")
  .init(-> { { count: 0 } })
  .when(SomeEvent, ->(state, event) { state[:c] +=1 })
  .run(Rails.configuration.event_store)

Then I see the following queries:

RubyEventStore::ActiveRecord::EventInStream Load (2.7ms)  SELECT "event_store_events_in_streams".* FROM "event_store_events_in_streams" INNER JOIN "event_store_events" ON "event_store_events"."event_id" = "event_store_events_in_streams"."event_id" WHERE "event_store_events_in_streams"."stream" = $1 AND "event_store_events"."event_type" = $2 ORDER BY "event_store_events_in_streams"."id" ASC LIMIT $3  [["stream", "STREAM_NAME"], ["event_type", "SomeEvent"], ["LIMIT", 100]]
RubyEventStore::ActiveRecord::Event Load (0.5ms)  SELECT "event_store_events".* FROM "event_store_events" WHERE "event_store_events"."event_id" = $1  [["event_id", "9f75375d-3df3-410b-8f51-bdd33ba28912"]]
RubyEventStore::ActiveRecord::EventInStream Load (0.9ms)  SELECT "event_store_events_in_streams".* FROM "event_store_events_in_streams" INNER JOIN "event_store_events" ON "event_store_events"."event_id" = "event_store_events_in_streams"."event_id" WHERE "event_store_events_in_streams"."stream" = $1 AND "event_store_events"."event_type" = $2 AND (event_store_events_in_streams.id > 373) ORDER BY "event_store_events_in_streams"."id" ASC LIMIT $3  [["stream", "STREAM_NAME"], ["event_type", "SomeEvent"], ["LIMIT", 100]]

I identify these as follows:

  1. Fetch events based on the Projection conditions
  2. Fetch the events by id from the result of the previous condition
  3. Fetch the second page of events based on the Projection conditions

It is my contention that this should be 1 query.

The first two queries should be combined. There is no reason that this should be done in 2 steps. Either the data can be fetched in a the first query or the queries should be combined in some sort of subquery.

The third (last) query is not needed. We know the limit is 100, and we received less than 100 records from the database. There is no need to go to the database again.

For a trivial example like this, this optimisation can be overkill, however, we have situations where we do projections for streams with hundreds of events. For a stream with 499 events the current implementation will do 11 queries, while an optimised one would only do 5. We use Sentry, and it detects this as an N+1 issue even.

We are using:

porbas commented 9 months ago

Thank you for reporting this. I've looked into it and have some promising ideas. Stay tuned! :)

pjurewicz commented 9 months ago

@Bertg, these performance issues have been resolved in 2.13.0

Thank you again for your contribution!

Bertg commented 9 months ago

Wow! That was fast. Thanks!