A better optimistic concurrency solution

In chapter 7, we implement a version number to provide optimistic concurrency controls around each aggregate. However, the version number doesn't actually do anything to prevent concurrency conflicts in itself. All of the behaviour comes from the "REPEATABLE READ" transaction isolation level. This is a feature of PostgreSQL that allows the DB to detect a conflict between two concurrent transactions, hence the serialization error raised by psycopg2. The version number is only checked in the test, but this is kind of superfluous (we should easily just check that the changes weren't made by the second transaction).

SqlAlchemy has some built-in functionality for handling optimistic concurrency using version numbers. This works by using the version number within any UPDATE or DELETE statements to ensure the version in the DB is the same as the version you initially read and made changes to. This supports "offline" concurrency conflicts - where you take a copy of the object out of the session, make changes to it (i.e. maybe within a UI app or other service), and then merge it into the DB. It can handle this because your copy still has its original version number that is compared against the DB. It also handles other situations where things might get out of sync, e.g. from the docs:

The purpose of this feature is to... provide a guard against the usage of a “stale” row in a system that might be re-using data from a previous transaction without refreshing (e.g. if one sets expire_on_commit=False with a Session, it is possible to re-use the data from a previous transaction).

I've put quite a few hours into it already. The problem I've found is the version number managed by SqlAlchemy is only bumped on the table where the INSERT/UPDATE/DELETE is happening. This means the version number that lives in the table that represents my root aggregate doesn't get bumped when only some child entity is updated. Effectively, every table would need its own version number. One solution might be to have a property on the aggregate that gets touched every update, e.g. a last_updated_at timestamp, which would force an update on the root entity. Thus, making the whole aggregate one consistent boundary. I don't really like this idea, though, as it could easily allow bugs where the last_updated_at property is not updated.

Just wondered if anyone has some insights into this problem. Are the version numbers worth adding (provided you don't need offline semantics or hit the other edge cases), or should you just set REPEATABLE READ and call it a day?

I've since learned quite a deal more about implementing concurrency with sqlalchemy. I was wrong in my previous assessment that the version number in chapter 7 doesn't do anything. In fact, it was staring me in the face:

The problem I've found is the version number managed by SqlAlchemy is only bumped on the table where the INSERT/UPDATE/DELETE is happening.

The version number is added in chapter 7 and incremented within the domain model to ensure a SQL UPDATE is always issued on the aggregate root table. This, in combination with READ REPEATABLE, gives us the optimistic locking behaviour. READ REPEATABLE tells PostgreSQL to detect and throw a SerializationFailure if it detects two concurrent transactions that are trying to change the same row. You need the version number on the aggregate root table to provide optimistic locking around the whole aggregate rather than just individual tables.

cosmicpython / code

A better optimistic concurrency solution #53