MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com

[Epic] persist: schema evolution #16625

Open danhhz opened 1 year ago

danhhz commented 1 year ago

Notion epic

Link: https://www.notion.so/persist-schema-evolution-fdb6696cbed04b1494d610611ca5538c
Product brief:
Status: In progress
Prioritization: 08 - August
Estimated delivery date: 2024-08-31

Feature request

In the storage hangout today, we discussed the general prompt of "where do schema changes live?" (e.g. above storage, between storage and persist, in persist, or some combination of those). Petros pointed out that, because of retractions and compaction, persist will need to know (at least) about adds/deletes of nullable or default-able columns. It's possible that other layers will need to understand schema changes as well.

For persist, I imagine that this roughly looks like:

This isn't something we're planning on doing right now, but it has implications for #12684, so I wanted to get ahead of it. In particular, the subset necessary for #12684 is probably that each shard gets a schema at startup, and then we (for now) don't allow it to ever change.
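
To make the "fixed schema per shard" subset concrete, here's a minimal sketch (all type and function names are hypothetical, not persist's actual interface) of a shard handle that is bound to a single schema at registration and rejects writes under any other schema:

```rust
// Hypothetical sketch: a shard is bound to one schema at registration and
// there is intentionally no way to change it afterwards.

#[derive(Clone, Debug, PartialEq)]
struct ColumnDesc {
    name: String,
    nullable: bool,
}

#[derive(Clone, Debug, PartialEq)]
struct ShardSchema {
    columns: Vec<ColumnDesc>,
}

struct ShardHandle {
    shard_id: String,
    // Fixed at registration; no setter exists.
    schema: ShardSchema,
}

impl ShardHandle {
    /// Registers a shard with the schema it will keep for its entire lifetime.
    fn register(shard_id: &str, schema: ShardSchema) -> Self {
        ShardHandle { shard_id: shard_id.to_string(), schema }
    }

    /// Writers must present exactly the schema the shard was registered with.
    fn validate_write(&self, write_schema: &ShardSchema) -> Result<(), String> {
        if *write_schema == self.schema {
            Ok(())
        } else {
            Err(format!("schema mismatch for shard {}", self.shard_id))
        }
    }
}

fn main() {
    let schema = ShardSchema {
        columns: vec![ColumnDesc { name: "id".to_string(), nullable: false }],
    };
    let shard = ShardHandle::register("s123", schema.clone());
    assert!(shard.validate_write(&schema).is_ok());
}
```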

### Bugs
- [ ] https://github.com/MaterializeInc/incidents-and-escalations/issues/117
petrosagg commented 1 year ago

Open question: what happens to e.g. a subscription that started at one schema and encounters schema changes as it goes? Do we have to be able to reverse them too?

We briefly touched on this in a call with @jkosh44 and I think the answer is no if we don't care about latency and yes if we do. A naive way of doing it is dropping all dataflows that depend on the to-be-altered object, altering the schema, and then rescheduling the dataflows with the additional knowledge of how to handle the new schema.

I think a much nicer way to handle it is to rely on the fact that we already have a way to describe computation in a serializable way. If each schema evolution were associated with two MFPs (which already have proto definitions), one for forward migrating a row and one for backward migrating a row, then we could set up the persist_source to listen to a metadata stream containing these definitions and create migration functions for the various versions on the fly by evaluating the MFPs.
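
To illustrate the shape of this (using plain closures in place of real MFPs, and hypothetical names throughout), a reader could keep a registry of per-version forward/backward migrations and chain them to move a row between the version it was written at and the version the dataflow expects:

```rust
// Hedged sketch of the idea, not the actual MFP machinery: each schema
// evolution carries a forward and a backward row migration, and a reader
// applies the chain of migrations between the written version and the
// version its dataflow expects. All types and names are hypothetical.

use std::collections::BTreeMap;

type Row = Vec<Option<String>>; // stand-in for a real Row

struct Migration {
    /// Rewrites a row written at version v into a row at version v + 1.
    forward: Box<dyn Fn(Row) -> Row>,
    /// Rewrites a row written at version v + 1 back into a row at version v.
    backward: Box<dyn Fn(Row) -> Row>,
}

/// Migrations keyed by the source version they upgrade from.
struct MigrationRegistry {
    migrations: BTreeMap<u64, Migration>,
}

impl MigrationRegistry {
    /// Migrates `row` from version `from` to version `to`, applying forward
    /// or backward migrations one step at a time.
    fn migrate(&self, mut row: Row, from: u64, to: u64) -> Row {
        if from < to {
            for v in from..to {
                row = (self.migrations[&v].forward)(row);
            }
        } else {
            for v in (to..from).rev() {
                row = (self.migrations[&v].backward)(row);
            }
        }
        row
    }
}

fn main() {
    // Example evolution: v0 -> v1 adds a nullable column at the end.
    let mut migrations = BTreeMap::new();
    migrations.insert(
        0,
        Migration {
            forward: Box::new(|mut row: Row| {
                row.push(None); // new column defaults to NULL
                row
            }),
            backward: Box::new(|mut row: Row| {
                row.pop(); // drop the column the old schema doesn't know about
                row
            }),
        },
    );
    let registry = MigrationRegistry { migrations };

    let old_row = vec![Some("a".to_string())];
    let upgraded = registry.migrate(old_row.clone(), 0, 1);
    let downgraded = registry.migrate(upgraded.clone(), 1, 0);
    assert_eq!(downgraded, old_row);
    println!("upgraded: {:?}", upgraded);
}
```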

sploiselle commented 1 year ago

Just wanted to double-check because I can't seem to find anything definitive in the parquet docs: does parquet support forward compatibility akin to protobuf? i.e. if I am on an old schema and am unaware of the existence of some new field, am I allowed to drop it on the floor?

If that's how parquet works, do we plan to use that behavior when we tackle this issue?

Asking because this would let us move closer to supporting upstream schema changes in PG: we could always store all of the information we're aware of and rely on the reader's schema definition to filter out unknown fields, i.e. maintain the schema for all persist sources.
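
For what it's worth, the reader-schema behavior I'm describing can be sketched independently of parquet itself (hypothetical types, not the parquet crate's API): the reader projects each written row down to the columns it knows about and silently ignores the rest:

```rust
// Hedged sketch of reader-side projection, not a real parquet API: a writer
// may persist columns the reader has never heard of, and the reader keeps
// only the columns named in its own schema, dropping the rest on the floor.

use std::collections::BTreeMap;

/// A row as written, keyed by column name.
type WrittenRow = BTreeMap<String, String>;

/// The reader's schema: the only columns it knows how to interpret.
struct ReaderSchema {
    columns: Vec<String>,
}

impl ReaderSchema {
    /// Projects a written row down to the reader's columns; unknown columns
    /// are silently ignored, missing columns come back as None.
    fn project(&self, row: &WrittenRow) -> Vec<Option<String>> {
        self.columns.iter().map(|c| row.get(c).cloned()).collect()
    }
}

fn main() {
    let mut row = WrittenRow::new();
    row.insert("id".to_string(), "1".to_string());
    row.insert("name".to_string(), "a".to_string());
    row.insert("new_upstream_col".to_string(), "x".to_string());

    // Old reader schema: unaware of the new upstream column.
    let reader = ReaderSchema {
        columns: vec!["id".to_string(), "name".to_string()],
    };
    assert_eq!(
        reader.project(&row),
        vec![Some("1".to_string()), Some("a".to_string())]
    );
}
```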

danhhz commented 1 year ago

I don't recall offhand whether that's something parquet has built in (and, if so, whether our library, parquet2, supports it), but it's certainly something we could do within persist on top of parquet. And yeah, that's how it would work.

Do note that persist isn't planning to do anything here for a while. It's not possible to support every possible schema evolution in a backward-compatible way; basically only "add/remove a column that is nullable/has a default" works. So, AFAIK, Nikhil has proposed that we build the first version of this entirely above persist by rewriting data (migrating to a new shard), and persist can later do work to turn the backward-compatible cases into a performance optimization.
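
As a rough illustration of where that line sits (hypothetical types, not persist's interface), the backward-compatible subset could be expressed as a check like the following; anything it rejects would go through the rewrite-into-a-new-shard path:

```rust
// Hedged sketch of the compatibility check described above: an evolution is
// backward compatible only if every added or removed column is nullable or
// has a default; renames, type changes, and required columns are not.
// All types here are hypothetical.

#[derive(Clone, Debug, PartialEq)]
struct Column {
    name: String,
    ty: String,
    nullable: bool,
    default: Option<String>,
}

fn is_backward_compatible(old: &[Column], new: &[Column]) -> bool {
    let add_or_drop_ok = |col: &Column| col.nullable || col.default.is_some();

    // Columns present in both schemas must be identical.
    for o in old {
        if let Some(n) = new.iter().find(|n| n.name == o.name) {
            if n != o {
                return false; // type or nullability change: not compatible
            }
        } else if !add_or_drop_ok(o) {
            return false; // dropped a non-nullable, non-defaulted column
        }
    }
    // Added columns must be nullable or have a default.
    new.iter()
        .filter(|n| !old.iter().any(|o| o.name == n.name))
        .all(|n| add_or_drop_ok(n))
}

fn main() {
    let old = vec![Column {
        name: "id".into(),
        ty: "int8".into(),
        nullable: false,
        default: None,
    }];
    let mut new = old.clone();
    new.push(Column {
        name: "note".into(),
        ty: "text".into(),
        nullable: true,
        default: None,
    });
    assert!(is_backward_compatible(&old, &new));
}
```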