frankmcsherry opened 1 year ago
I came across this blog post: https://dbmsmusings.blogspot.com/2019/06/correctness-anomalies-under.html?m=1, which refers to this type of isolation level/consistency as "Strong Session Serializable". That could be useful when we are trying to come up with a name.
Wanted to braindump a strawman UX design ahead of our meeting today. I'm going to redefine some terms so that they most clearly map to the concepts in my brain—which is often exactly at odds with what the codebase calls them today. Apologies for any confusion this causes.
A time domain indicates the semantic type of the timestamp. This is different from the Rust type used to store values in the time domain. Examples of time domains are "epoch milliseconds" and "PostgreSQL LSN" (in a hypothetical world where we don't force reclocking of all PostgreSQL sources). Multiple time domains may use the same Rust type. For example, `EpochMillis` and `PostgresLsn` both use `i64` as their Rust type, but represent different time domains. We'd also want to distinguish between PostgreSQL LSNs from different PostgreSQL servers.

This corresponds to what the codebase calls `enum Timeline` today, and what Frank calls "collection timelines" (CTs) in the issue description.
A timeline is a session timeline (ST) as described by Frank in the issue description.
A transaction read set is the set of collections upon which a transaction has a read hold.
The automatic transaction read set is the set that Materialize automatically selects when a `BEGIN` command is issued, if the user does not explicitly specify a read set. The automatic transaction read set varies based on the timeline in use.

This corresponds to what the codebase calls a "time domain" today.
```sql
-- Creates a new time domain backed by the specified type.
--
-- Time domains are global objects and do not belong to a particular database or schema.
-- There is a default time domain called `mz_epoch_ms` or somesuch.
CREATE TIME DOMAIN <ident> (TYPE <type>);
```
```sql
-- Creates a new timeline associated with the specified time domain.
--
-- `...` represents additional options that configure policies for the time domain, like
-- trading off read latency against write latency.
--
-- If `TEMPORARY` is specified, the timeline is only visible to the session that creates
-- it.
--
-- There is a default timeline called `mz_system`.
CREATE [TEMPORARY] TIMELINE <ident> (TIME DOMAIN <name>, ...);
```
```sql
-- Set the timeline in use for the session.
--
-- The default value is `mz_system`.
SET timeline = '<name>';
```
```sql
-- Starts a transaction.
--
-- If `ACQUIRE READ HOLDS` is specified, acquires read holds on the named objects.
-- Otherwise, acquires read holds on the automatic transaction read set for the timeline.
BEGIN [ACQUIRE READ HOLDS (<name>, <name>)];
```
Roughly:

- By default, sessions are in the `mz_system` timeline, which gives them strict serializability.
- A session that wants its own isolated timeline can create and switch to a temporary one:

  ```sql
  CREATE TEMPORARY TIMELINE my_timeline (TIME DOMAIN mz_epoch_millis);
  SET timeline = my_timeline;
  ```

- Sessions that want to coordinate with one another can all run `SET timeline = 'shared_timeline'`.
I had a slightly different UX design in mind when thinking about session timelines. Potentially worse than your design, but I thought I'd share anyway. I'll use the same definitions of terms that you gave above. The design currently assumes that `EpochMillis` is the only valid time domain, though I think it can be extended to remove this assumption.
```sql
-- Sets the isolation level to strict serializable. This behaves the same as it does today.
-- All operations executed in this isolation level are linearizable with respect to all other
-- operations (except for source writes).
SET transaction_isolation = 'strict serializable';
```
```sql
-- Sets the isolation level to strong session serializable. All operations executed in this
-- isolation level are linearizable with respect to other operations executed in the same
-- isolation level with the same timeline named <name>.
--
-- Omitting <name> will put the session in a temporary timeline, where operations are
-- linearizable with respect only to other operations executed within the same session.
SET transaction_isolation = 'strong session serializable <name>';
```
One of the benefits is that the user doesn't need to understand timelines or be introduced to a new concept.
I'm going to quickly define a simplified timestamp oracle (TO) with the following two functions:

- `ts()`: returns a timestamp greater than or equal to all previously returned timestamps.
- `finalize(t)`: guarantees that all timestamps returned in the future are greater than or equal to `t`.
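As a hedged sketch of the two-function oracle just defined (the struct and field names here are my own illustration, not the actual Materialize implementation):

```rust
// A minimal sketch of the simplified timestamp oracle (TO) described
// above. Illustrative only: names and types are assumptions, not the
// actual Materialize implementation.
struct SimpleOracle {
    // Lower bound on all timestamps the oracle may return in the future.
    lower_bound: u64,
}

impl SimpleOracle {
    fn new() -> Self {
        SimpleOracle { lower_bound: 0 }
    }

    // Returns a timestamp greater than or equal to all previously
    // returned timestamps.
    fn ts(&self) -> u64 {
        self.lower_bound
    }

    // Guarantees that all timestamps returned in the future are greater
    // than or equal to `t`.
    fn finalize(&mut self, t: u64) {
        self.lower_bound = self.lower_bound.max(t);
    }
}

fn main() {
    let mut oracle = SimpleOracle::new();
    oracle.finalize(42);
    assert!(oracle.ts() >= 42);
    oracle.finalize(10); // finalizing a smaller time never moves the bound backwards
    assert!(oracle.ts() >= 42);
}
```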
From an implementation point of view, I think the design above can be implemented as follows (here GTO is the global timestamp oracle and TTO is the timestamp oracle for the session's timeline):

For `strict serializable`:

```
ts = max(now(), GTO.ts())
GTO.finalize(ts)
```

For `strong session serializable <name>`:

```
ts = min(GTO.ts(), max(serializable_ts, TTO.ts()))
TTO.finalize(ts)
```

Here `serializable_ts` is the timestamp we would have picked in serializable mode, i.e. a timestamp that is as fresh as possible while minimizing timestamp-introduced blocking. The `min(GTO.ts(), ...)` is needed for now: it ensures that strict serializable operations remain linearizable with strong session serializable operations. Alternatively it can be replaced with a call to `GTO.finalize(ts)`.
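To make the arithmetic concrete, the two selection rules can be written as pure functions over candidate timestamps (function and variable names are mine, not from the codebase):

```rust
// Illustrative only: the two timestamp-selection rules above as pure
// functions. `gto_ts` and `tto_ts` stand in for the values returned by
// GTO.ts() and TTO.ts(); none of these names come from the codebase.
fn strict_serializable_ts(now: u64, gto_ts: u64) -> u64 {
    // ts = max(now(), GTO.ts())
    now.max(gto_ts)
}

fn strong_session_serializable_ts(gto_ts: u64, serializable_ts: u64, tto_ts: u64) -> u64 {
    // ts = min(GTO.ts(), max(serializable_ts, TTO.ts()))
    gto_ts.min(serializable_ts.max(tto_ts))
}

fn main() {
    // Strict serializable picks a time at least as large as both the
    // wall clock and the global oracle.
    assert_eq!(strict_serializable_ts(100, 90), 100);
    // Strong session serializable is capped by the global oracle, so it
    // can never get ahead of strict serializable operations.
    assert_eq!(strong_session_serializable_ts(100, 80, 95), 95);
    assert_eq!(strong_session_serializable_ts(90, 80, 95), 90);
}
```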
One easy way to mash up our designs is to say:

- The valid values for `transaction_isolation` are `serializable`, `strong session serializable`, and `strict serializable`. Default is `strict serializable`.
- The `timeline` variable (perhaps it should be called `session_timeline`) only applies when your transaction isolation level is `strong session serializable`. Default is `mz_temp`, which is a temporary timeline for your session only.

This feels more SQL-y to me than smushing the timeline name into the `transaction_isolation` variable, while still insulating users from the concept of timelines until the last moment.
Feature request
Materialize currently assigns timestamps by way of a `TimestampOracle` associated with what I'll call a "Collection Timeline" (CT). Each collection is associated with at most one CT, and potentially (afaict) no CT. The timestamp oracle is used if your query can identify a CT, we are using Strict Serializability, and you have not indicated an `AS OF`.

For the moment, there is essentially one CT, identified by `Timeline::EpochMilliseconds`. You can toggle participation mainly by the strictness of your serializability.

There are some other potentially unintended side-effects surrounding e.g. constant views (`SELECT 1`) and maybe weirder things (`create materialized view bar as select 1;` followed by the use of `bar`?). These are perhaps bugs that can be fixed, and aren't fundamentally issues with CTs, but they happen to be points of confusion at the moment.

I propose instead that we consider "Session Timelines" (STs), each corresponding to exactly one `TimestampOracle`, where each session participates in some set of STs. When a session needs to determine a timestamp, it consults the timestamp oracles of its STs, as well as other lower bounds, and picks a timestamp at least as large as that lower bound. Having picked a timestamp, it then `fast_forward`s each timestamp oracle to the chosen time.

I think this lines up more closely with the intended behaviors of the serializability levels. They are guarantees about collections of commands and the responses to those commands, and are not (afaict) properties of the data these commands reference.
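The timestamp-selection loop just described can be sketched as follows, assuming a `fast_forward` operation like the one named above (all other names are my own illustration, not the actual Materialize types):

```rust
// Illustrative sketch of the proposal: pick a timestamp at least as
// large as every participating ST oracle and any other lower bounds,
// then fast_forward each oracle to the chosen time. Not the actual
// Materialize implementation.
struct Oracle {
    lower_bound: u64,
}

impl Oracle {
    fn ts(&self) -> u64 {
        self.lower_bound
    }
    fn fast_forward(&mut self, t: u64) {
        self.lower_bound = self.lower_bound.max(t);
    }
}

fn pick_timestamp(session_timelines: &mut [Oracle], other_lower_bounds: &[u64]) -> u64 {
    // The chosen time must be at least as large as every lower bound.
    let from_oracles = session_timelines.iter().map(|o| o.ts()).max().unwrap_or(0);
    let from_others = other_lower_bounds.iter().copied().max().unwrap_or(0);
    let chosen = from_oracles.max(from_others);
    // Having picked a timestamp, fast-forward each participating oracle.
    for oracle in session_timelines.iter_mut() {
        oracle.fast_forward(chosen);
    }
    chosen
}

fn main() {
    let mut sts = vec![Oracle { lower_bound: 10 }, Oracle { lower_bound: 25 }];
    let chosen = pick_timestamp(&mut sts, &[17]);
    assert_eq!(chosen, 25);
    // Every participating oracle has been fast-forwarded to the choice.
    assert!(sts.iter().all(|o| o.ts() >= chosen));
}
```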
There is no large substantive distinction between CTs and STs when there is only one of them. Sessions would either participate in the `Timeline::EpochMilliseconds` ST (strict serializability) or not (serializability).

Although this is similar to how the system works at the moment, I think it is much less ambiguous when it comes to collections that have mysterious timeline membership. For example, I suspect a reason that `SELECT 1` has no timeline membership is "you should be able to query that from any timeline", which seems fair. A different take is "this shouldn't influence your timestamp selection", which would be more clearly the case when we use STs to determine timestamps (this view would exert no `since` constraints, and potentially have no `upper` opinions).

STs open up several doors for robustness going forward.
- While we have isolation between `computed` and `storaged` instances, to protect e.g. `prod` from `dev`, we do not have isolation when it comes to timestamp selection. If `prod` must be strictly serializable, then it is risky for `dev`, or any other independent group, to also use strict serializability, as they may interfere and introduce latency/freshness trade-offs to `prod` that it did not want. Independent STs would allow groups of users to self-identify and peel off into their own isolated timestamp selection regimes. The only interference would be that different STs might hold back compaction differently.
- STs are a natural place to land policy about timestamp selection. It would be sane to have two timelines, `fast` and `fresh`, where the first always picks timestamps using `since` (prioritizing latency over freshness), and the second uses `upper` (prioritizing freshness over latency). These two don't need to interfere, and one can opt in to either timeline as suits their needs (or create a new one).
- STs are a natural form of session strict serializability. New users could be defaulted into their own private ST, where they would not interfere with others. This wouldn't be strict serializability by default (put everyone in the same "global" timeline), but if we wanted to default folks into a more performant but still sane default, this checks out.
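The `fast`/`fresh` policy idea can be sketched as a per-timeline choice of read timestamp between `since` and `upper`. This is a toy model; the names and the exact "just below `upper`" rule are my assumptions:

```rust
// Toy model of per-timeline timestamp policy. A `fast` timeline reads
// at `since` (lowest latency), a `fresh` one just below `upper`
// (maximum freshness). Names and details are illustrative assumptions.
enum TimelinePolicy {
    Fast,
    Fresh,
}

fn pick_read_ts(policy: &TimelinePolicy, since: u64, upper: u64) -> u64 {
    match policy {
        // Reading at `since` is always immediately available.
        TimelinePolicy::Fast => since,
        // The largest complete time sits just below `upper`.
        TimelinePolicy::Fresh => upper.saturating_sub(1),
    }
}

fn main() {
    assert_eq!(pick_read_ts(&TimelinePolicy::Fast, 100, 200), 100);
    assert_eq!(pick_read_ts(&TimelinePolicy::Fresh, 100, 200), 199);
}
```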
There are some potential liabilities:
- We currently use CTs to maintain a `ReadHold` on all collections in the timeline. This ensures that all of these collections are valid at the current read timestamp (i.e. have not been compacted past it), and avoids self-inflicted latency due to over-eager compaction. We would probably need each ST to maintain a `ReadHold` on all collections as well, to provide the same effect. However, various STs could also opt out of this (as a function of their "policy").
- We use CTs as a basis for "what could you even hope to relate the times of". They allow us to stay sane when there might be multiple collections that should NOT be interacted with through the same ST, e.g. one collection whose timestamps are PostgreSQL LSNs and one whose timestamps are milliseconds. We should prevent users from transiting these numbers through the same ST. Ideally they would have "different types" and it shouldn't typecheck to do this, but CTs serve as that distinction. I think we still want this distinction, but I don't think it needs to be coupled to how we enforce levels of consistency. `TimeDomain` is close to this concept, but is instead "which relations will we check out when you start a transaction", which is meant to be more fine-grained than "which collections could you possibly consider without a hard break in consistency".

One potential migration path is to consider
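As a purely hypothetical sketch of the kind of variant structure being described here (every name below, including `SerializabilityMode` and `mz_system`, is a guess, not from the codebase):

```rust
// Hypothetical sketch only: a variant structure where `Strict` selects
// the shared timeline used today, while other variants opt out of it
// without dropping to fully relaxed serializable. All names are guesses.
enum SerializabilityMode {
    // Today's behavior: the shared strict-serializable timeline.
    Strict,
    // An independent, named session timeline.
    Session(String),
    // Totally relaxed mode: no timeline coordination at all.
    Serializable,
}

impl SerializabilityMode {
    // Which timeline, if any, drives timestamp selection for this mode.
    fn timeline(&self) -> Option<&str> {
        match self {
            SerializabilityMode::Strict => Some("mz_system"),
            SerializabilityMode::Session(name) => Some(name.as_str()),
            SerializabilityMode::Serializable => None,
        }
    }
}

fn main() {
    assert_eq!(SerializabilityMode::Strict.timeline(), Some("mz_system"));
    assert_eq!(SerializabilityMode::Session("dev".into()).timeline(), Some("dev"));
    assert_eq!(SerializabilityMode::Serializable.timeline(), None);
}
```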
Here the `Strict` variant does what we expect things to do today: if selected, it identifies a timeline which we use to create some timestamps. Other variants would allow folks to opt out of this `Strict` timeline without opting in to the totally relaxed `SERIALIZABLE` mode.

I think where we do use CTs for "timeline validation" we might want to factor that out. We might want to shake up the names a bit for things also, but that mostly reflects a personal bias towards wanting the names to be something else. In any event, I think there are three concepts:
I'm interested in any comments folks have! In particular, I could imagine that I am missing some uses of existing types, and I don't want to bluster my way through that so much as sort out whether the "timeline" is the right way to capture a concept. If I've missed important things, that's a great reason not to barge forward!
Candidate acceptance criteria
We introduce session timelines as a way for users to control their `STRICT SERIALIZABLE` behavior with more flexibility / precision. We might alternately conclude that session timelines are an anti-feature and either don't solve an actual problem, or lead to new classes of problems. We might alternately conclude that there are other idioms / mechanisms to address the same class of problems.

Tasks