This PR:
Primarily, decouples database transactions from the data source itself, in preparation for switching from tokio-postgres to SQLx.
Other significant changes:
Data source updates no longer require an exclusive lock, because the updates are made on a transaction, outside of the main data source, and then committed atomically. This has cascading effects particularly in the fetching notifier/notify_storage, and in tests. Should be a huge win for parallelism and reducing worst-case request latency, especially after we switch to connection pools
Breaks SQL storage into several smaller files/modules. We have wanted to do this for a while, and I found it almost unworkable to deal with such a big refactor, with so many compile errors, in one 5k-line file, so I just went ahead and did this. I will comment on the diffs where code has only been moved but not changed.
Adds read-only transactions, so now all DB accesses (reads and writes) use transactions. This lets us take advantage of Postgres's strong transaction isolation features to get strong consistency semantics in the presence of concurrent transactions without using application locks. A rough sketch of what this decoupled transaction interface might look like follows this list.
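To make the idea concrete, here is a minimal sketch of a decoupled transaction interface. This is purely illustrative: the trait and method names are hypothetical and not taken from this PR; the point is only that callers open their own transaction instead of taking an exclusive lock on the data source.

```rust
use async_trait::async_trait;

/// Hypothetical sketch only; the real trait and type names in this PR may
/// differ. Callers open a transaction (read-only or read-write) on its own
/// connection, do their work against it, and commit atomically, so Postgres
/// isolates concurrent readers and writers instead of an application lock.
#[async_trait]
trait TransactionalDataSource {
    type Transaction: Send;

    /// Open a read-write transaction, independent of other connections.
    async fn begin_write(&self) -> anyhow::Result<Self::Transaction>;

    /// Open a read-only transaction with a consistent snapshot of the data.
    async fn begin_read(&self) -> anyhow::Result<Self::Transaction>;

    /// Commit all changes made through `tx` atomically; until this point,
    /// concurrent readers never observe partial updates.
    async fn commit(&self, tx: Self::Transaction) -> anyhow::Result<()>;
}
```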
This PR does not:
Switch to SQLx or set up SQLite, but both should be much more straightforward to do after this is merged.
Handle concurrent writes in any intelligent way. We will need to address this before merging. Input wanted. Basically, we now allow tasks to write to the database (via transactions) while other tasks are reading from it or even writing to it (via concurrent transactions). In general, any combination of a pruner run, async fetching task, decide event handler, merklized state update loop, and read-only query may be running at the same time and competing for access to the same data. Thus, we must ask ourselves what sort of consistency guarantees each database connection has.
By default, each Postgres transaction or statement (a naked statement is implicitly treated as a transaction, as if it were wrapped in BEGIN ... COMMIT) has READ COMMITTED isolation from other transactions/statements. This means that any single statement in the transaction will only see data committed before the start of the statement; in other words, every statement gets a consistent view of the database. However, multiple statements in a transaction may get different views of the database, and thus business logic constraints that we expect to hold across different queries over related data may not hold if another transaction commits in between two statements. Notably, sub-selects are considered separate statements. Also, if there are any places where we make read-only queries as separate statements, these would be treated as separate transactions, and may read data that was modified in between.
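For illustration, here is a small standalone sketch (not code from this PR; the table names and connection string are made up) showing how two naked statements on the same connection run as separate implicit transactions under the default READ COMMITTED behavior, so a commit from another task in between can make them disagree:

```rust
use tokio_postgres::NoTls;

#[tokio::main]
async fn main() -> Result<(), tokio_postgres::Error> {
    let (client, connection) =
        tokio_postgres::connect("host=localhost user=postgres dbname=example", NoTls).await?;
    // The connection object drives the actual socket I/O; run it in the background.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });

    // Outside an explicit BEGIN/COMMIT, each statement below is its own implicit
    // transaction with its own READ COMMITTED snapshot. If another task commits
    // rows to both tables between these two queries, the counts are taken from
    // different snapshots and can disagree, even if the schema says they match.
    let parents: i64 = client
        .query_one("SELECT count(*) FROM parent", &[])
        .await?
        .get(0);
    let children: i64 = client
        .query_one("SELECT count(*) FROM child", &[])
        .await?
        .get(0);
    println!("parents = {parents}, children = {children} (may be inconsistent)");
    Ok(())
}
```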
I believe it is very difficult to reason about the correctness of our queries with these semantics, especially because we have a fairly complex schema with many relations between tables, and we routinely make complicated queries with multiple predicates, nested sub-selects, and so on. Thus, my recommendation is to use:
The SERIALIZABLE isolation level for modifying transactions. The main downside of using this strict isolation level is that a COMMIT may fail if a competing transaction committed first and there is no way to serialize the two transactions (typically happens when they modify the same data). This requires the transaction to be retried, but we already have retry logic in place everywhere, so this shouldn't be a problem. Moreover, it is actually extremely rare that we do have concurrent writers: the only tasks that frequently write are the event handler task and the merklized state task (which operates on separate data). Other tasks only run rarely (like the pruner and proactive fetcher) or in exceptional circumstances (like active fetch tasks when data is missing). Thus, I expect it will be very rare that a transaction actually reverts due to a serialization error.
The SERIALIZABLE READ ONLY DEFERRABLE mode for read-only transactions, explicitly wrapping all read-only queries in BEGIN/END so we get consistent results even for queries that include multiple statements (like the merklized state get_path queries). This has the same strong consistency guarantees as we have for writes, with the added benefit that the transaction will never fail with a serialization error (although it may block initially until a fully consistent snapshot is available to read from). Because writes are relatively infrequent (once every few seconds), I believe it will not be that often that one of these transactions actually has to block. A sketch of both recommended modes follows this list.
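As a concrete sketch of both recommended modes using tokio-postgres (which this codebase still targets pre-SQLx): the transaction-builder calls below are real tokio-postgres APIs, but the table name and the bare-bones retry loop are simplified illustrations, not the PR's actual code.

```rust
use tokio_postgres::{error::SqlState, Client, IsolationLevel};

// Read-write path: SERIALIZABLE, retried if the commit loses a serialization
// race against a competing writer. Real retry logic would also catch
// serialization failures raised by the statements themselves, with backoff.
async fn insert_serializable(client: &mut Client) -> Result<(), tokio_postgres::Error> {
    loop {
        let tx = client
            .build_transaction()
            .isolation_level(IsolationLevel::Serializable)
            .start()
            .await?;
        tx.execute("INSERT INTO example (id) VALUES ($1)", &[&1i64]).await?;
        match tx.commit().await {
            Ok(()) => return Ok(()),
            // The two transactions could not be serialized; retry from the top.
            Err(e) if e.code() == Some(&SqlState::T_R_SERIALIZATION_FAILURE) => continue,
            Err(e) => return Err(e),
        }
    }
}

// Read-only path: SERIALIZABLE READ ONLY DEFERRABLE, which never fails with a
// serialization error but may block until a consistent snapshot is available.
async fn read_consistent(client: &mut Client) -> Result<i64, tokio_postgres::Error> {
    let tx = client
        .build_transaction()
        .isolation_level(IsolationLevel::Serializable)
        .read_only(true)
        .deferrable(true)
        .start()
        .await?;
    let count: i64 = tx.query_one("SELECT count(*) FROM example", &[]).await?.get(0);
    tx.commit().await?; // ends the read-only transaction (equivalent to END)
    Ok(count)
}
```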
These isolation levels give us the strongest possible consistency guarantees for concurrent transactions: the results are always as if the transactions had each executed serially in some arbitrary order. Moreover, I believe our application meets all the performance considerations recommended for using this mode (see the last paragraph of section 13.2.3 of the Postgres docs).
For the sake of completeness, I think there is an argument to be made for using READ COMMITTED or REPEATABLE READ. Basically, our database modifications typically only add data that wasn't there before (like inserting a new block or state snapshot). We don't modify existing data in place. Thus, it is hard to think of a scenario where consecutive queries reading from two different consistent database snapshots would actually give an invalid result. At the end of the day, though, I just don't think it's worth the stress to try and prove this, when we can just rely on the strong consistency of SERIALIZABLE unless/until it presents performance problems for us. Also worth noting that this argument kind of falls apart when you introduce pruning.
Anyway, I am looking for feedback on whether SERIALIZABLE sounds like the way to go, and I am also open to alternative approaches.
Key places to review:
I would look at everything, but I will comment on the diff to point out where you can skip code that only moved locations but didn't change significantly. I recommend reviewing in the following "top-down" order:
Review the changes to NotifyStorage and Notifier, where the synchronization/locking is changed to support concurrent reads/writes and multiple writers as long as the underlying database does
Look at the rest of the fetching module (including sub-modules)
Finally, review the low-level implementation of the new transaction logic at the database layer
You can also review the transaction implementation for the file system backend, although it's less important because we should be getting rid of it very soon
Everything else should be either cut/pasted code or logical consequences of the above changes
See #271