The Pekko Persistence Postgres plugin lets you use PostgreSQL and Amazon Aurora databases as the backend for Pekko Persistence and Pekko Persistence Query.
It was originally created as a fork of the Akka Persistence JDBC plugin, focused on PostgreSQL features such as partitions, arrays and BRIN indexes. The first version was also based on Akka and was named akka-persistence-postgres.
The main goal is to keep index size and memory consumption on a moderate level while being able to cope with an increasing data volume.
This plugin supports different schema variants for different use-cases: from small and simple apps, through services with a small, finite number of persistent actors but huge, ever-growing journals, to services with an increasing number of unique persistent actors.
You can read more about DAOs and schema variants in the official documentation.
To use `pekko-persistence-postgres` in your sbt project, add the following to your `build.sbt`:

```scala
libraryDependencies += "com.swissborg" %% "pekko-persistence-postgres" % "0.8.0"
```
For a Maven project, add the following to your `pom.xml`:

```xml
<dependency>
  <groupId>com.swissborg</groupId>
  <artifactId>pekko-persistence-postgres_2.13</artifactId>
  <version>0.8.0</version>
</dependency>
```
To use this plugin instead of the default one, add the following to `application.conf`:

```hocon
pekko.persistence {
  journal.plugin = "postgres-journal"
  snapshot-store.plugin = "postgres-snapshot-store"
}
```
and for persistence query:

```scala
PersistenceQuery(system).readJournalFor[PostgresReadJournal](PostgresReadJournal.Identifier)
```
:warning: Please note that this library is based on Pekko, but the demo uses an older version of `akka-persistence-postgres`, so there might be inconsistencies between the documentation and the provided code.
This plugin has been redesigned to handle very large journals.
The original plugin (pekko-persistence-jdbc) uses B-Tree indexes on three columns: `ordering`, `persistence_id` and `sequence_number`. They are great for query performance and for guarding the uniqueness of column data, but they require a relatively large amount of memory.
Wherever it makes sense, we decided to use more lightweight BRIN indexes.
Pekko-persistence-jdbc stores all tags in a single `String` column, separated by an arbitrary separator (a comma by default).
This solution is quite portable, but not perfect: queries rely on a `LIKE '%tag_name%'` condition, and additional work is needed to filter out tags that do not fully match the input `tag_name`. Imagine you have the tags healthy, unhealthy and neutral, and you want to find all events tagged with healthy: the query will return events tagged with both healthy and unhealthy.
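The substring pitfall is easy to reproduce in plain Scala; the sketch below simulates a comma-separated tag column (the sample data is made up for illustration):

```scala
// Simulated rows of a comma-separated tag column.
val tagColumns = Seq("healthy,urgent", "unhealthy", "neutral")

// A LIKE '%healthy%' condition is effectively a substring check,
// so it also matches "unhealthy":
val naive = tagColumns.filter(_.contains("healthy"))
// naive == Seq("healthy,urgent", "unhealthy")

// Exact matching requires splitting on the separator first:
val exact = tagColumns.filter(_.split(',').contains("healthy"))
// exact == Seq("healthy,urgent")
```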
Postgres allows columns of a table to be defined as variable-length arrays. By mapping event tag names to unique numeric identifiers, we can leverage the `intarray` extension, which in some circumstances can improve query performance and reduce query costs by up to 10x.
When you have big volumes of data and they keep growing, appending events to the journal becomes more expensive, because indexes grow together with the tables.
Postgres allows you to split your data between smaller tables (logical partitions) and attach new partitions on demand. Partitioning also applies to indexes, so instead of a one huge B-Tree you can have a number of capped tables with smaller indexes.
You can read more on how Pekko Persistence Postgres leverages partitioning in the Supported journal schema variants section below.
Besides the aforementioned major changes, we made some minor optimizations, like changing the column ordering for more efficient space utilization.
Currently, the plugin supports two variants of the journal table schema:

- flat journal - a single table, similar to what the JDBC plugin provides. All events are appended to this table. The schema can be found here. This is the default schema.
- journal with nested partitions by persistenceId and sequenceNumber - this variant lets you shard your events by `persistenceId`. Additionally, each shard is split by `sequenceNumber` range to cap the indexes. You can find the schema here. This variant is aimed at services that have a finite and/or small number of unique persistent aggregates, but each of them has a big journal.
In order to start using the partitioned journal, you have to create a partitioned table (here is the schema) and set the Journal DAO FQCN:

```hocon
postgres-journal.dao = "org.apache.pekko.persistence.postgres.journal.dao.NestedPartitionsJournalDao"
```
The size of the nested partitions (the `sequence_number` range) can be changed by setting `postgres-journal.tables.journal.partitions.size`. By default the partition size is set to `10000000` (10M).
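For example, to cap each nested partition at 5M events instead, the setting could be overridden in `application.conf` (the value here is purely illustrative):

```hocon
postgres-journal.tables.journal.partitions {
  # Each nested partition covers a range of 5,000,000 sequence numbers
  # instead of the default 10,000,000.
  size = 5000000
}
```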
Partitions are automatically created by the plugin in advance. `NestedPartitionsJournalDao` keeps track of created partitions, and once a `sequence_number` is out of the range of any known partition, a new one is created.
Partitions follow the `prefix_sanitizedPersistenceId_partitionNumber` naming pattern.
The `prefix` can be configured by changing the `postgres-journal.tables.journal.partitions.prefix` value. By default it's set to `j`.
`sanitizedPersistenceId` is the persistence id with all non-word characters replaced by `_`.
`partitionNumber` is the ordinal number of the partition for a given persistence id.
Example partition names: `j_myActor_0`, `j_myActor_1`, `j_worker_0`, etc.
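The naming scheme can be sketched with a couple of hypothetical helpers (these are not the plugin's API, and the range-to-ordinal mapping below is an assumption, not taken from the plugin's source):

```scala
// Hypothetical helpers illustrating the naming pattern described above.

// All non-word characters in the persistence id are replaced by '_'.
def sanitize(persistenceId: String): String =
  persistenceId.replaceAll("\\W", "_")

// Assumed mapping: partition n covers sequence numbers
// [n * partitionSize, (n + 1) * partitionSize).
def partitionOrdinal(sequenceNumber: Long, partitionSize: Long): Long =
  sequenceNumber / partitionSize

// prefix_sanitizedPersistenceId_partitionNumber
def partitionName(prefix: String, persistenceId: String, ordinal: Long): String =
  s"${prefix}_${sanitize(persistenceId)}_$ordinal"

partitionName("j", "my-actor", partitionOrdinal(12345678L, 10000000L)) // "j_my_actor_1"
```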
Keep in mind that the default maximum length for a table name in Postgres is 63 bytes, so you should avoid non-ASCII characters in your `persistenceId`s and keep the `prefix` reasonably short.
:warning: Once any of the partitioning settings under the `postgres-journal.tables.journal.partitions` branch is settled, you should never change it. Otherwise you might end up with `PostgresException`s caused by table name or range conflicts.
Please see the documentation regarding migrations here.
We are also always looking for contributions and new ideas, so if you’d like to join the project, check out the open issues, or post your own suggestions!
Development and maintenance of pekko-persistence-postgres is sponsored by:
SoftwareMill is a software development and consulting company. We help clients scale their business through software. Our areas of expertise include backends, distributed systems, blockchain, machine learning and data analytics.
SwissBorg makes managing your crypto investment easy and helps control your wealth.