estuary / flow

🌊 Continuously synchronize the systems where your data lives, to the systems where you _want_ it to live, with Estuary Flow. 🌊
https://estuary.dev

Phil/cpv2 #1489

Closed psFried closed 1 week ago

psFried commented 2 weeks ago

Description:

Rolls up a whole bunch of work from both @jgraettinger and myself related to:

I recommend reviewing this commit by commit, and I tried to write helpful commit messages. Here's a high-level summary:

Reframe validation to be an operation over a tables::LiveCatalog (a snapshot from the control plane) and a tables::DraftCatalog, which is able to represent deletions.

Updates the publications handler to:

Introduces controllers, which are background automations that are tied to specific live specs. Controllers are responsible for:

Workflow steps:

The control plane should work more or less as it has in terms of user-facing operations, with a few exceptions:

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

I'd appreciate any time you can spend testing this locally and reporting any issues. I'd like to continue testing myself, particularly the transition from the old version, and I expect I'll need to push some updates before this is ready to deploy. I'd like to keep this PR open until we feel it is in a deployable state.

Please reach out by any means with any questions.

Deployment plan

The basic steps for deploying this are the same as for any other control plane PR involving a database migration, except that there's an additional migration to be run after the deployment finishes. This second migration will actually need to be run multiple times, because it incrementally enables controllers for pre-existing live specs, in batches of 1000 at a time. This gives us a chance to bail out if things aren't working right, instead of just unleashing controllers on all of our precious live specs. The end-to-end high-level plan is thus:


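To illustrate the batched rollout described in the deployment plan, here is a minimal in-memory sketch. It is not the actual migration (which runs against the control plane database); `LiveSpec`, `controller_enabled`, `BATCH_SIZE`, and `enable_next_batch` are hypothetical names used only for illustration.

```rust
/// Hypothetical batch size, mirroring the "batches of 1000" in the plan.
pub const BATCH_SIZE: usize = 1000;

/// Hypothetical, simplified stand-in for a row of live specs.
#[derive(Debug)]
pub struct LiveSpec {
    pub catalog_name: String,
    /// Assumption: a flag marking whether a controller is enabled for this spec.
    pub controller_enabled: bool,
}

/// Enables controllers for at most `batch_size` specs that don't have one yet.
/// Returns how many were enabled in this run; zero means nothing is left to do.
pub fn enable_next_batch(specs: &mut [LiveSpec], batch_size: usize) -> usize {
    let mut enabled = 0;
    for spec in specs
        .iter_mut()
        .filter(|s| !s.controller_enabled)
        .take(batch_size)
    {
        spec.controller_enabled = true;
        enabled += 1;
    }
    enabled
}
```

Each call corresponds to one run of the second migration. Because every run touches at most one batch, an operator can check on the newly enabled controllers between runs and stop early if something looks wrong.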

jgraettinger commented 2 weeks ago

This means that you can technically now overwrite the published inferred schema with flowctl. This might be easy to do, since it's hard to know whether the inferred schema has changed and your local spec is now stale.

flowctl now adds expectPubId to any live specs it pulls down from the control plane. expectPubId of zero means "must not exist".

This is to prevent a user from inadvertently overwriting an updated live spec by publishing a stale spec from their checkout. They'll get an error because expectPubId will disagree.

They'd have to explicitly remove the expectPubId field to denote that they want their local version to take precedence.
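The check described above can be sketched roughly as follows. This is a simplified illustration rather than Flow's actual validation code: publication ids are modeled as plain `u64` values, and `check_expect_pub_id` and `CheckError` are hypothetical names.

```rust
/// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq)]
pub enum CheckError {
    /// The draft's expectPubId disagrees with the live spec's last publication id.
    Mismatch { expected: u64, actual: u64 },
}

/// Compares a draft spec's optional `expectPubId` against the live spec.
/// `live_pub_id` is `None` when no live spec exists.
pub fn check_expect_pub_id(
    expect: Option<u64>,
    live_pub_id: Option<u64>,
) -> Result<(), CheckError> {
    // Assumption: a missing live spec is treated as publication id zero,
    // so an expectPubId of zero means "must not exist".
    let actual = live_pub_id.unwrap_or(0);
    match expect {
        // The user removed the field: the local version takes precedence.
        None => Ok(()),
        // The checkout is up to date with the live spec.
        Some(expected) if expected == actual => Ok(()),
        // The checkout is stale (or the spec unexpectedly exists): reject.
        Some(expected) => Err(CheckError::Mismatch { expected, actual }),
    }
}
```

For example, a spec pulled at publication 5 and published after the live spec moved to publication 7 would yield `Mismatch { expected: 5, actual: 7 }`, while the same draft with the field removed would pass.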

flowctl tries not to publish a spec which isn't actually changing, so they should be good to leave unmodified specs as-is in a local checkout while publishing a managed spec over and over.