MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com

Exchange write latency for read latency #13778

Open frankmcsherry opened 2 years ago

frankmcsherry commented 2 years ago

Feature request

Our approach to timestamp selection in ADAPTER presents an opportunity to trade off between write latency and read latency.

At the moment, we have a single timestamp that increases. When writes are performed, they are performed at the current timestamp, which is then sealed. When reads are performed, they too are performed at the current timestamp, and must await the presentation of sealed data up through (and including) that timestamp.

This approach is "write-optimal" in that a write can always be applied at the very next timestamp. One can perform that write, and as soon as it is durable report it as such, as all subsequent reads will see this write (as they are at timestamps at least as large).

This approach is "read-pessimal" in that all reads must wait for the writes processed before it to become durable. This should make sense for writes "serialized" before it, but just because a write was received and processed before a read doesn't mean that it needs to be serialized before it.

Consider a different approach, where there are two timestamps, a read timestamp and a write timestamp, with the write timestamp greater than the read timestamp (by, say, a second). Writes are applied at the write timestamp, but do not immediately return; they await the read timestamp advancing to their write timestamp before returning. Reads are performed at the read timestamp, roughly one second before the write timestamp.

The approach sacrifices write latency to benefit read latency. Reads occur at a timestamp that (in this example) was sealed a second ago. That gives the system a one-second head start to make durable the writes associated with those times. On the other hand, writes have to sit around for a while, until the read timestamp advances (or until we force it forward and durably record this, sacrificing the lead that reads would otherwise have had).
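To make the two modes concrete, here is a minimal Rust sketch of a timestamp oracle with a configurable gap between reads and writes. The names and structure are hypothetical, not Materialize's actual ADAPTER code; a lead of zero recovers the current write-optimized scheme, while a positive lead serves reads at an already-sealed timestamp and makes writes wait for the read frontier to catch up.

```rust
/// A minimal sketch of a timestamp oracle with a configurable gap
/// between the read and write timestamps. Hypothetical names; this is
/// not Materialize's actual ADAPTER code.
struct TwoTimestampOracle {
    /// Timestamp at which reads are served; data up through `read_ts`
    /// is already sealed.
    read_ts: u64,
    /// Gap maintained between reads and writes. A lead of zero recovers
    /// the current write-optimized scheme.
    lead: u64,
}

impl TwoTimestampOracle {
    fn new(lead: u64) -> Self {
        TwoTimestampOracle { read_ts: 0, lead }
    }

    /// Reads are served at `read_ts`, which is already sealed, so they
    /// never wait for in-flight writes to become durable.
    fn read_timestamp(&self) -> u64 {
        self.read_ts
    }

    /// Writes are assigned `read_ts + lead`. The write must not be
    /// acknowledged until `read_ts` has advanced past the returned
    /// timestamp, since reads at earlier timestamps must not see it.
    fn write_timestamp(&self) -> u64 {
        self.read_ts + self.lead
    }

    /// Driven by the wall clock, or forced forward to unblock a pending
    /// write (sacrificing the head start reads would otherwise enjoy).
    fn advance_to(&mut self, new_read_ts: u64) {
        assert!(new_read_ts >= self.read_ts);
        self.read_ts = new_read_ts;
    }
}

fn main() {
    let mut oracle = TwoTimestampOracle::new(1_000); // 1s lead, in ms
    let w = oracle.write_timestamp(); // write applied at read_ts + 1s
    assert!(oracle.read_timestamp() < w); // concurrent reads do not see it...
    oracle.advance_to(w); // ...until the read frontier catches up,
    assert!(oracle.read_timestamp() >= w); // at which point we may ack it.
}
```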

There are potentially other trade-offs, but it is worth intentionally choosing whether Materialize should be optimized for reads or for writes. I suspect it should be the former, if I had to choose, and the fact that our design instead prioritizes the latter is something we should perhaps reconsider!

Candidate acceptance criteria

We choose one of:

  1. Reads against "up-to-date" inputs can respond immediately in STRICT SERIALIZABLE mode. Writes into tables can take unbounded time to be applied; source data can be unboundedly stale. (The "up-to-date" caveat is meant to exclude dataflows that are in the process of hydration, in that their [since, upper) intervals do not track those of their inputs.)
  2. Writes into tables can always be applied "promptly", and source data are always relatively fresh. Reads in STRICT SERIALIZABLE mode may take unboundedly long, as a function of the propagation latencies of dataflow updates and the pattern of queries users make.
  3. A clearly articulated subset of reads can respond immediately in STRICT SERIALIZABLE mode. For example, at the moment we can immediately respond to reads from adapter-maintained tables; views over these tables, and reads from other sources, have no such guarantee. In exchange, writes into tables are relatively prompt (not indefinitely delayed).
jkosh44 commented 2 years ago

One potential optimization we could make is to start tracking the most recent write timestamp for every table. Then when selecting a read timestamp, any timestamp in the range [most_recent_write_ts, global_ts] would be valid without sacrificing strict serializability. We could prioritize selecting an already closed timestamp.
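A rough sketch of that selection logic (hypothetical names and structure, not the actual ADAPTER implementation): track the latest write timestamp per table, take the maximum over the tables a query reads, and prefer an already-closed timestamp in the valid range so the read can respond immediately.

```rust
use std::collections::HashMap;

/// Sketch of per-table write tracking for read-timestamp selection.
/// Hypothetical structure; not Materialize's actual code.
struct ReadTsSelector {
    /// Most recent write timestamp observed for each table.
    last_write_ts: HashMap<String, u64>,
    /// The global write timestamp.
    global_ts: u64,
    /// Greatest timestamp whose data is already closed/sealed.
    closed_ts: u64,
}

impl ReadTsSelector {
    fn record_write(&mut self, table: &str, ts: u64) {
        let entry = self.last_write_ts.entry(table.to_string()).or_insert(0);
        *entry = (*entry).max(ts);
    }

    /// Any timestamp in [lower, global_ts] preserves strict
    /// serializability: no write to these tables that a client could
    /// have observed is newer than `lower`. Prefer a closed timestamp
    /// so the read need not wait.
    fn select_read_ts(&self, tables: &[&str]) -> u64 {
        let lower = tables
            .iter()
            .map(|t| self.last_write_ts.get(*t).copied().unwrap_or(0))
            .max()
            .unwrap_or(0);
        if self.closed_ts >= lower {
            // A closed timestamp is in range: respond immediately.
            self.closed_ts.min(self.global_ts)
        } else {
            // Otherwise the read must wait for `lower` to close.
            lower
        }
    }
}

fn main() {
    let mut sel = ReadTsSelector {
        last_write_ts: HashMap::new(),
        global_ts: 20,
        closed_ts: 15,
    };
    sel.record_write("t1", 10);
    // 15 is closed and at least the last write to t1: read immediately.
    assert_eq!(sel.select_read_ts(&["t1"]), 15);
}
```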

maddyblue commented 2 years ago

Currently, to make the sales demo fast we need to 1) create an index and 2) disable SS (strict serializability). This issue might remove the need for us to tell demo users to downgrade their txn level, which is a pretty bad look for a database. I guess that means I'm for prioritizing this issue!

frankmcsherry commented 1 year ago

> One potential optimization we could make is to start tracking the most recent write timestamp for every table. Then when selecting a read timestamp, any timestamp in the range [most_recent_write_ts, global_ts] would be valid without sacrificing strict serializability. We could prioritize selecting an already closed timestamp.

I think this is true only for views over tables that do not include temporal filters. It certainly holds for common cases like SHOW TABLES, but we would want some care in the logic to be certain.
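A toy illustration of why temporal filters are the exception (a contrived model, not Materialize's actual representation): when a record's visibility is a function of the timestamp itself, two timestamps in the "valid" range can yield different answers even with no intervening writes, so the older one is not interchangeable with the newer.

```rust
/// A record visible while the timestamp lies in [valid_from, valid_until),
/// standing in for the effect of a temporal filter.
struct Record {
    valid_from: u64,
    valid_until: u64,
}

/// Count the records a temporal-filter view would contain at `ts`.
fn visible_at(records: &[Record], ts: u64) -> usize {
    records
        .iter()
        .filter(|r| r.valid_from <= ts && ts < r.valid_until)
        .count()
}

fn main() {
    // Suppose the last write is at ts = 10 and global_ts = 20, with no
    // writes in between; the view's contents still differ across the range.
    let records = vec![Record { valid_from: 0, valid_until: 15 }];
    assert_eq!(visible_at(&records, 10), 1); // still visible at ts = 10
    assert_eq!(visible_at(&records, 20), 0); // expired by ts = 20
    // Two "valid" read timestamps give different answers, so selecting
    // an older, already-closed timestamp is not safe for such views.
}
```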

frankmcsherry commented 1 year ago

Comments from discussion:

  1. @benesch observes that if you are using RTR (real-time recency), you want freshness optimized. If we collectively want to lean into that, then "fresh by default, speedy if toggled" perhaps tracks the strict serializable thinking here as well (guarantees first, then teach about performance).
  2. @frankmcsherry observes that in the read-optimized case, users of shallow views are penalized by the mere existence of deeper views, which they do not use. This feels bad compared with the freshness-optimized case, where only users of the deep views are penalized, and only when they use them, not merely because the views exist.
  3. A lot of these trade-offs might vanish with session timelines (https://github.com/MaterializeInc/materialize/issues/14052).