akkadotnet / Akka.Persistence.Sql

Linq2Db implementation of Akka.Persistence.Sql. Common implementation for SQL Server, Sqlite, Postgres, Oracle, and MySql.

Performance report on Akka.Persistence.Sql vs Akka.Persistence.SqlServer #420

Open lucavice opened 4 months ago

lucavice commented 4 months ago

Hi all,

As promised here, I'm sharing some results from load testing our system, with the aim of comparing performance before and after our migration from the old Akka.Persistence.SqlServer plugin to Akka.Persistence.Sql.

We ran the tests on Akka v1.5.26 on Azure, using an Azure SQL Database (set to 50 DTUs, which is fairly low on the scale). We did not use the optional Tag Table performance improvement in this test. The test writes about 2,000 events to the journal within a few seconds; those events then need to be read by about 35 different event processors (mostly projections in our event-sourced system), each of which uses EventsByTag to consume the stream of events.
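For illustration, here is a minimal sketch of what one of these event processors might look like, assuming the standard Akka.Persistence.Query pattern; the tag name, offset handling, and projection body are placeholders rather than our actual code:

```csharp
using System;
using Akka.Actor;
using Akka.Persistence.Query;
using Akka.Persistence.Query.Sql; // read journal used with the old Akka.Persistence.SqlServer setup
using Akka.Streams;
using Akka.Streams.Dsl;

public static class ExampleProjection
{
    public static void Start(ActorSystem system, long lastProcessedOffset)
    {
        var readJournal = PersistenceQuery.Get(system)
            .ReadJournalFor<SqlReadJournal>(SqlReadJournal.Identifier);

        // EventsByTag polls the journal and streams every event carrying the tag,
        // resuming from the offset this projection has already processed.
        readJournal
            .EventsByTag("example-tag", Offset.Sequence(lastProcessedOffset))
            .RunForeach(envelope =>
            {
                // update the read model here (placeholder)
                Console.WriteLine($"{envelope.PersistenceId}/{envelope.SequenceNr}: {envelope.Event}");
            }, system.Materializer());
    }
}
```

After the migration the consuming code stays essentially the same; only the ReadJournalFor call is pointed at the new plugin's read journal.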

Run 1: Akka.Persistence.SqlServer

In this run, using Akka.Persistence.SqlServer, the database capacity was saturated for almost 10 minutes. Even though the actual writing of the events was over within a few seconds, reading the journal seemed to take a tremendous toll on resources, and all the projections were significantly delayed.

[image: Azure SQL resource utilization during Run 1]

Looking at Azure Performance Insights gives us some hints about which query was the most expensive. During the test, Azure reported an aggregate execution time of almost 4 hours for query 25925. Although the insights functionality does not let me narrow the window down to just the test, the graph shows that most of those 4 hours of execution time fell within the test duration.

[image: Test1-QueryInsights, Azure Performance Insights during Run 1]

Query 25925 is the following:

(@Ordering bigint,@Tag_sized nvarchar(100),@Take bigint)SELECT TOP (@Take)
                e.PersistenceId as PersistenceId, 
                e.SequenceNr as SequenceNr, 
                e.Timestamp as Timestamp, 
                e.IsDeleted as IsDeleted, 
                e.Manifest as Manifest, 
                e.Payload as Payload,
                e.SerializerId as SerializerId, e.Ordering as Ordering
             FROM dbo.EventJournal e
             WHERE e.Ordering > @Ordering AND e.Tags LIKE @Tag_sized
             ORDER BY Ordering ASC

So we see that this query is the one generated by EventsByTag for scanning the journal by tag.

Run 2: Akka.Persistence.Sql

We repeated the same test using Akka.Persistence.Sql instead. The outcome is completely different: resource usage briefly peaks during the minute the test executes, then goes back to normal.

[image: Azure SQL resource utilization during Run 2]

Digging into Performance Insights, this time query 43928 is the most resource-intensive, but its total execution time over a window similar to the previous test (and including the test itself) is only 42 seconds instead of 4 hours. There is also a small peak in usage during the test, but not much above the baseline when the system is idle (the system still polls the journal every second for new events, even if there are none).

[image: Test3-QueryInsights, Azure Performance Insights during Run 2]

Query 43928 is the following:

(@Tag nvarchar(100),@Offset bigint,@MaxOffset bigint,@take int)SELECT TOP (@take)
    [r].[Ordering],
    [r].[Timestamp],
    [r].[IsDeleted],
    [r].[PersistenceId],
    [r].[SequenceNr],
    [r].[Payload],
    [r].[Tags],
    [r].[Manifest],
    [r].[SerializerId]
FROM
    [dbo].[EventJournal] [r]
WHERE
    [r].[Tags] LIKE @Tag ESCAPE N'~' AND
    [r].[IsDeleted] = 0 AND
    [r].[Ordering] > @Offset AND
    [r].[Ordering] <= @MaxOffset
ORDER BY
    [r].[Ordering]

The query looks fairly similar to the one from the previous run, and I can't quite tell why it is so much more performant. The number of executions seems to be comparable. I'm starting to suspect this is due to the Akka.Persistence.Query backpressure improvements in 1.5 rather than to this plugin in itself. But I'm not sure: shouldn't I have seen improvements for the Akka v1.5 tests with Akka.Persistence.SqlServer if the Akka.Persistence.Query improvements are separate from the plugin?

Conclusion

The performance benefit for our reads is just ludicrous: roughly a 350x reduction in aggregated execution time. Perhaps the way we coded our event processors around EventsByTag wasn't ideal to start with, but it seems our use case was a perfect fit for the kind of improvements this new plugin brings (or the Akka.Persistence.Query improvements). Luckily, we don't normally write thousands of events in short bursts in production, so this wasn't an issue on our platform, but it definitely gives us peace of mind that we have much more room for volume growth than before. We originally planned to enable the Tag Table, but at the moment the new performance seems more than adequate even without that feature.
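For reference, a hedged sketch of what enabling the Tag Table could look like; the tag-write-mode / tag-read-mode keys and the provider name are taken from the plugin's documentation as I recall it and should be verified against the README, and the connection string is a placeholder:

```csharp
using Akka.Actor;
using Akka.Configuration;

var config = ConfigurationFactory.ParseString(@"
akka.persistence {
  journal {
    plugin = ""akka.persistence.journal.sql""
    sql {
      connection-string = ""<azure-sql-connection-string>""
      provider-name = ""SqlServer.2019""   # linq2db provider name, adjust to your server version
      tag-write-mode = TagTable            # write tags to the dedicated tag table
    }
  }
  query.journal.sql {
    connection-string = ""<azure-sql-connection-string>""
    provider-name = ""SqlServer.2019""
    tag-read-mode = TagTable               # EventsByTag reads via the tag table instead of LIKE on a CSV column
  }
}");

var system = ActorSystem.Create("app", config);
```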

Aaronontheweb commented 4 months ago

its total execution time over a window similar to the previous test (and including the test itself) is only 42 seconds instead of 4 hours.

that is pretty amazing

Aaronontheweb commented 4 months ago

shouldn't I have seen improvements for the Akka v1.5 tests with Akka.Persistence.SqlServer if the Akka.Persistence.Query improvements are separate from the plugin?

yes you would have, so I suspect this has more to do with the way Akka.Persistence.Sql reads the data than it does with the number of queries being executed.

Aaronontheweb commented 4 months ago

I wonder how much faster this would be if you used the tag table

Aaronontheweb commented 4 months ago

I think the big impact might be how the Akka.Persistence.Sql implementation is bounding its queries more carefully:

    [r].[Ordering] > @Offset AND
    [r].[Ordering] <= @MaxOffset

versus

WHERE e.Ordering > @Ordering AND e.Tags LIKE @Tag_sized

The @Take parameter is supposed to cap the number of results retrieved, but this subtle change in how the result set is bounded in the first place can have a big impact on the query execution plan.
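Purely as an illustration of that point (not the plugin's actual code), here is an ADO.NET sketch of the bounded pattern, where the upper bound comes from a preliminary MAX(Ordering) lookup, which is one way a @MaxOffset value could be produced; the connection string, tag pattern, and batch size are placeholders, and the table/column names follow the queries quoted above:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

const string connectionString = "<azure-sql-connection-string>"; // placeholder
long offset = 0;

await using var conn = new SqlConnection(connectionString);
await conn.OpenAsync();

// Step 1: pin the upper bound for this poll cycle.
long maxOffset;
using (var maxCmd = new SqlCommand("SELECT ISNULL(MAX(Ordering), 0) FROM dbo.EventJournal", conn))
{
    maxOffset = (long)(await maxCmd.ExecuteScalarAsync())!;
}

// Step 2: page through the fixed window [offset, maxOffset] instead of an
// open-ended "everything after @offset" scan, which gives the optimizer a
// bounded range to plan against.
using var cmd = new SqlCommand(
    @"SELECT TOP (@take) Ordering, PersistenceId, SequenceNr, Payload
      FROM dbo.EventJournal
      WHERE Tags LIKE @tag AND Ordering > @offset AND Ordering <= @maxOffset
      ORDER BY Ordering", conn);
cmd.Parameters.AddWithValue("@take", 100);
cmd.Parameters.AddWithValue("@tag", "%example-tag%"); // placeholder CSV tag match
cmd.Parameters.AddWithValue("@offset", offset);
cmd.Parameters.AddWithValue("@maxOffset", maxOffset);

await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
    offset = reader.GetInt64(0); // advance the offset as rows are consumed
}

Console.WriteLine($"Read up to ordering {offset} (window capped at {maxOffset}).");
```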

to11mtm commented 4 months ago

I think the big impact might be how the Akka.Persistence.Sql implementation is bounding its queries more carefully:

Agreed. Also, as a 'yes, and': I will note that in Akka.Persistence.Sql we prefer pulling rows into batches, with the binary data deserialized in another stage, versus Akka.Persistence.Sql.Common's behavior (at least last I knew) of deserializing right from the DbDataReader. The Akka.Streams implementation gets us lots of code and data cache locality benefits, at the expense of memory allocation.

Additionally, there's the side benefit that, thanks to this 'copy-to-buffer-object' behavior, the query itself stays open for a shorter time, easing stress on connection pools and overall network traffic.
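A minimal sketch of that 'copy-to-buffer-object' idea (again, not the plugin's actual code, which does this inside Akka.Streams stages): raw rows are copied into plain objects while the reader is open, and payload deserialization happens afterwards, so the query and its connection are released sooner. RawJournalRow and the deserializer delegate are placeholders, and the ordinals assume a SELECT in the column order shown:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public sealed record RawJournalRow(
    long Ordering, string PersistenceId, long SequenceNr,
    int SerializerId, string Manifest, byte[] Payload);

public static class BufferedJournalRead
{
    // Stage 1: drain the reader into buffer objects. No serializer work happens
    // while the query/connection is held open; the command is expected to
    // SELECT Ordering, PersistenceId, SequenceNr, SerializerId, Manifest, Payload.
    public static async Task<List<RawJournalRow>> ReadBatchAsync(SqlCommand cmd)
    {
        var rows = new List<RawJournalRow>();
        await using var reader = await cmd.ExecuteReaderAsync();
        while (await reader.ReadAsync())
        {
            rows.Add(new RawJournalRow(
                reader.GetInt64(0),
                reader.GetString(1),
                reader.GetInt64(2),
                reader.GetInt32(3),
                reader.GetString(4),
                (byte[])reader[5]));
        }
        return rows; // the reader (and its connection) is released before any deserialization
    }

    // Stage 2: in the real plugin this is a separate Akka.Streams stage; here a
    // plain LINQ projection stands in for turning buffered rows into domain events.
    public static IEnumerable<object> Deserialize(
        IEnumerable<RawJournalRow> rows, Func<RawJournalRow, object> deserializer)
        => rows.Select(deserializer);
}
```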

lucavice commented 3 months ago

Well, well done @to11mtm. It's crazy that the impact is so big, and it's not even an explicitly documented improvement, just a side effect of coding things in a different, more efficient way.