estuary / flow


control: Design Catchall #344

Closed. saterus closed this issue 2 years ago.

saterus commented 2 years ago

This is a big system we're working on and there are a lot of specific bits we still need to work out. We've been having regular design discussions to hash out the details, but there's quite a breadth of topics. Some of them will get their own dedicated Issue (like #341) if they warrant a lot of extended discussion.

I'm going to post the notes from our discussions here as a way to keep everyone else in the loop.

saterus commented 2 years ago

2022-01-13:

Indexing Entities in Postgres: As a general rule, we still want to rely on the Build SQLite databases for the details of Entity Specs. In order to supply the UI with cross-cutting info, we'll need to maintain a few secondary indexes in Postgres.

Specifically:

Resource Resolution: Discussed a desire for UI & CLI file structure compatibility. This allows Catalogs created in the UI to be usefully represented in a CLI workflow. We'd like to avoid forcing a user who transitions to a GitOps workflow to sift through a 2000-line YAML file.

We need to resolve references on the client side (UI or CLI). The Catalog payloads will include related Resources as bundled ResourceDefs. This allows the UI to handle them separately and the CLI to manage them as individual files. This should avoid the user needing to sift through unnecessary duplication.
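For illustration, a bundled payload could be shaped something like the sketch below. This is purely illustrative, assuming serde; the field names are hypothetical, not a settled schema:

```rust
use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

/// Hypothetical shape of a Catalog payload with bundled resources.
/// The UI can handle `resources` entries individually, while the
/// CLI can write each entry back out as its own file.
#[derive(Serialize, Deserialize)]
struct CatalogPayload {
    /// The catalog spec itself, which references entries in `resources`.
    catalog: serde_json::Value,
    /// Related resources, keyed by the URL the catalog references them by.
    resources: BTreeMap<String, ResourceDef>,
}

#[derive(Serialize, Deserialize)]
struct ResourceDef {
    /// e.g. "application/schema+yaml"
    content_type: String,
    /// Raw resource content, to be split into a file by the CLI.
    content: String,
}
```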

saterus commented 2 years ago

2022-01-14:

Brainstormed Questions that the Control Plane will need to be able to answer:

Build/Test/Activate:

Users/Permissions:

Events:

saterus commented 2 years ago

2022-01-18:

General Principle:

Creation/Editing Workflows:

Changesets:

Builds:

Diffs:

Schema Followup:

Status/Metrics:


We're going to pause the everyday discussions for a minute. We've got 3 things we want to work on in the meantime:

  1. Schema modeling. Johnny and Alex are braindumping in preparation for comparing notes.
  2. Idealized Mocks. Dave is working on mocks for the UI we hope to be working towards as we build out the full platform. This is our guiding light for where we want to be.
  3. Mocks for Q1. Travis and Alex are going to work through mocks of what we're actually shooting for this quarter. This will let us discuss concretely what UI we're building immediately.

We'll reconvene as these things progress.

saterus commented 2 years ago

From Slack: https://estuary-dev.slack.com/archives/C01G7CFNA8K/p1642544683030000

What does it mean to "delete" an Entity? How do I include that in a Build to be applied? I know I can edit a ShardSpec with delete: true, but what does it mean at the Flow-abstraction level?

Deletion is a separate "deactivation" rpc, the inverse of an activation.

If you deactivate a collection we'll remove its journals. If you deactivate a catalog task we'll remove its shards and their recovery logs, etc.

It's dangerous enough to be special, not represented as part of a Catalog or a Build.

To recap, I'm now thinking of deletions as an explicit DELETE /entity/:id endpoint, which from my end is very simple. We can add as many safeguards and checks as we like, but it isn't part of the build/activate workflow.
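As a rough sketch, that endpoint could look like the following. This assumes an axum-style handler purely for illustration; the framework and names are not the actual control-plane code:

```rust
use axum::{extract::Path, http::StatusCode, routing::delete, Router};

/// Hypothetical deactivation endpoint: deletion is its own RPC,
/// deliberately outside the build/activate workflow.
async fn delete_entity(Path(id): Path<i64>) -> StatusCode {
    // Safeguards and checks (ownership, confirmation, etc.) would go
    // here, followed by the dataplane "deactivation" that removes the
    // entity's journals, or its shards and their recovery logs.
    let _ = id;
    StatusCode::ACCEPTED
}

fn routes() -> Router {
    Router::new().route("/entity/:id", delete(delete_entity))
}
```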

saterus commented 2 years ago

2022-01-24: Schema Discussion

Primary Keys:

Prefixes & Validation:

Status:

Extended details can be found on #341.

saterus commented 2 years ago

2022-01-25:

Activation => Deployment

Which Database to Build into?

Resolving Entity References:

Complexity of Partial ~Activations~ Deployments

Deploy an atomic Build

Strong and Weak Imports

Access Enforcement

jgraettinger commented 2 years ago

Thank you @saterus, great write-up of the conversation!

Minor comment on GET /entity/:id/build/:build_id: I'd pictured this as GET /build/:build_id/entity/:id instead, i.e. there would be various APIs that share the common context of a specific build ID and extract information from it.
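In route terms, that nesting could look like the sketch below (again axum-flavored and purely illustrative):

```rust
use axum::{extract::Path, routing::get, Router};

/// Hypothetical build-scoped API: every route under /build/:build_id
/// shares the context of one specific build.
async fn entity_in_build(Path((build_id, id)): Path<(String, i64)>) -> String {
    format!("entity {id} as of build {build_id}")
}

fn routes() -> Router {
    Router::new().route("/build/:build_id/entity/:id", get(entity_in_build))
}
```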

saterus commented 2 years ago

2022-02-10: Access Control

Questions we want to be able to answer:

Terminology:

Provisioning Accounts:

Unsettled Questions:

Diagrams

Exported Excalidraw File

access-control-v3

saterus commented 2 years ago

2022-02-15: Control Plane Design Discussion

Johnny, Dave, Phil, and I got together to chat about what separate streams of work we could start on for the control plane. The conversation immediately went farther afield, and we started talking through the ramifications of grouping entities in the UI (akin to files with the CLI).

Builds

API Discover Output

encrypted-config

Catalog Entity Export

Builds Root Service

Sops Service

Foreign Entity Resolution

saterus commented 2 years ago

Expanded Discovery Response - Followup

As a bit of a followup to our previous discussion, I wanted to document the current behavior of flowctl discover so we're all on the same page. We've been talking about the mechanisms to group things into files, but unless you've run flowctl discover lately, it may seem a bit abstract.

This is going to be relevant because we want the Control Plane's discovery endpoint to return more than just the connector's raw output (which is what I'm returning today). We may or may not want to target output identical to what flowctl discover produces today, but it's at least a good starting point for the discussion.

Let's run through an example using source-postgres on the Control Plane database.

Example

I ran flowctl discover --image=ghcr.io/estuary/source-postgres:fb353df --prefix planetExpress with a recent version of flowctl.

This created a config file template inside a new planetExpress directory with the name source-postgres.flow.yaml. I edited this file to point to my local control-plane-database (a bit recursive, but it's what I have handy).

Then I re-ran flowctl discover --image=ghcr.io/estuary/source-postgres:fb353df --prefix planetExpress. This performs the discovery of the tables and outputs the following files:

```
$ tree planetExpress/
planetExpress
├── _sqlx_migrations.schema.yaml
├── connector_images.schema.yaml
├── connectors.schema.yaml
├── source-postgres.config.yaml
└── source-postgres.flow.yaml
```

Breaking these down a bit:

It has a single top-level file for the Capture itself, source-postgres.flow.yaml:

```yaml
collections:
  planetExpress/_sqlx_migrations:
    schema: _sqlx_migrations.schema.yaml
    key: [/version]
  planetExpress/connector_images:
    schema: connector_images.schema.yaml
    key: [/id]
  planetExpress/connectors:
    schema: connectors.schema.yaml
    key: [/id]
captures:
  planetExpress/source-postgres:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-postgres:fb353df
        config: source-postgres.config.yaml
    bindings:
      - resource:
          namespace: public
          stream: _sqlx_migrations
          syncMode: incremental
        target: planetExpress/_sqlx_migrations
      - resource:
          namespace: public
          stream: connector_images
          syncMode: incremental
        target: planetExpress/connector_images
      - resource:
          namespace: public
          stream: connectors
          syncMode: incremental
        target: planetExpress/connectors
```

Exploring from the top, we have a Collection entry for each table it discovered. It uses the (prefix + table name) to generate a name for the Collection. It infers the Collection key from the table's primary key.
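In pseudocode terms, the derivation is roughly the following. This is a sketch of the behavior described above, not flowctl's actual implementation:

```rust
/// Sketch: derive a Collection name from the discover prefix and table name.
fn collection_name(prefix: &str, table: &str) -> String {
    format!("{}/{}", prefix.trim_end_matches('/'), table)
}

/// Sketch: each primary-key column becomes a JSON pointer in the key.
fn collection_key(primary_key_columns: &[&str]) -> Vec<String> {
    primary_key_columns.iter().map(|c| format!("/{c}")).collect()
}

fn main() {
    assert_eq!(
        collection_name("planetExpress", "_sqlx_migrations"),
        "planetExpress/_sqlx_migrations"
    );
    assert_eq!(collection_key(&["version"]), vec!["/version"]);
}
```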

It generates an accompanying schema file for each Collection/discovered-table:

_sqlx_migrations.schema.yaml:

```yaml
properties:
  checksum:
    contentEncoding: base64
    type: string
  description:
    type: string
  execution_time:
    type: integer
  installed_on:
    format: date-time
    type: string
  success:
    type: boolean
  version:
    type: integer
required:
  - version
type: object
```

connector_images.schema.yaml:

```yaml
properties:
  connector_id:
    type: integer
  created_at:
    format: date-time
    type: string
  digest:
    type: string
  id:
    type: integer
  name:
    type: string
  tag:
    type: string
  updated_at:
    format: date-time
    type: string
required:
  - id
type: object
```

connectors.schema.yaml:

```yaml
properties:
  created_at:
    format: date-time
    type: string
  description:
    type: string
  id:
    type: integer
  maintainer:
    type: string
  name:
    type: string
  type:
    type: string
  updated_at:
    format: date-time
    type: string
required:
  - id
type: object
```

Next, we have the Capture definition itself. It references the config file, but this config is not yet sops encrypted (referencing it after encryption works the same way though).

source-postgres.config.yaml:

```yaml
connectionURI: postgres://flow:flow@localhost:5432/control_development
# Connection parameters, as a libpq-compatible connection string
# [string] (required)

max_lifespan_seconds: 0
# When nonzero, imposes a maximum runtime after which to unconditionally shut down
# [number]

poll_timeout_seconds: 10
# When tail=false, controls how long to sit idle before shutting down
# [number]

publication_name: flow_publication
# The name of the PostgreSQL publication to replicate from
# [string]

slot_name: flow_slot
# The name of the PostgreSQL replication slot to replicate from
# [string]

watermarks_table: public.flow_watermarks
# The name of the table used for watermark writes during backfills
# [string]
```

One point I'd note from all of this, in contrast to our previous discussion, is that there just isn't that much "grouping" going on. Mostly these entities are placed in their own files and referenced where necessary.

Control Plane Discovery

We still want the Control Plane to call flowctl api discover, which currently only returns the raw "bindings" output from the connector. The wrapper flowctl discover is what takes these bindings, along with the config and Capture metadata, to craft these files.

I think we want to craft response payloads that could be used by the UI or the CLI:

In either case, it seems like we can model the response payload off the current output of flowctl discover. We'll want to change it some due to the format (we can't use comments on fields), but I don't foresee any major changes. I think this transformation from "bindings" to specs/schemas is probably best for the Control Plane to handle, to avoid needing to write this on both the CLI and UI side. Our goal would be to return top level "file-spec/schemas" that could be easily turned into a Catalog spec for use with the upcoming build endpoint.
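Concretely, the response could carry one entry per would-be file. The sketch below is a hypothetical shape, not a settled API; the field names are placeholders:

```rust
use std::collections::BTreeMap;

use serde::Serialize;

/// Hypothetical discovery response: one entry per "file" that
/// `flowctl discover` would have written, so the CLI can persist them
/// as files and the UI can render them individually.
#[derive(Serialize)]
struct DiscoveryResponse {
    /// The capture spec plus its collections (the *.flow.yaml content).
    catalog_spec: serde_json::Value,
    /// One JSON schema per discovered binding, keyed by suggested file name.
    schemas: BTreeMap<String, serde_json::Value>,
    /// The endpoint config (sops-encrypted or not), keyed the same way.
    config: serde_json::Value,
}
```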

saterus commented 2 years ago

Control Plane / UX Meeting:

We talked a lot about the desired workflow for registration and login. We want to avoid doing extra work for the local login, as we know we want to use OpenID Connect in production. Ideally, the only difference will be in this login workflow, and all subsequent requests will act identically.

We ended up working up some sequence diagrams to describe these flows.

OpenID Registration

openid-registration-workflow

OpenID Login

openid-login-workflow

Local Login

As you can see, the local login flow is just an abbreviated IDP login. There is no password for local logins; it is completely insecure. The user simply provides an account name they wish to log in as. This gets passed as the auth_token, and we use it to find_or_create an Account and Credential for them. A sketch of that lookup follows the diagram below.

local-login-workflow
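Here's a minimal sketch of that find_or_create lookup, assuming sqlx against Postgres; the table and column names are illustrative, not the actual schema:

```rust
use sqlx::PgPool;

/// Hypothetical local-login lookup: the provided account name arrives
/// as the auth_token, and an Account row is created on first use.
/// (Credential creation would follow the same pattern.)
async fn find_or_create_account(pool: &PgPool, name: &str) -> sqlx::Result<i64> {
    // Upsert: insert the account if missing; the no-op UPDATE lets
    // RETURNING yield the existing row's id on conflict.
    let (id,): (i64,) = sqlx::query_as(
        "INSERT INTO accounts (name) VALUES ($1)
         ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name
         RETURNING id",
    )
    .bind(name)
    .fetch_one(pool)
    .await?;
    Ok(id)
}
```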

Sequence Diagram Source

In case it's useful for others, I've included the source for generating the diagrams.

###### OpenID Registration Workflow

```sequence
participant IDP
participant User
participant UI
participant API
participant DB

// Authenticate with IDP
User->UI: Load Page
UI->User: Redirect to /login
User->UI: Click Login w/ Provider
UI->User: Redirect to /provider/login
User->IDP: Provides username/password/2fa
IDP->UI: Redirect to callback uri

// Login Fail (no account)
UI->API: POST /sessions/:issuer w/ auth_code
API->IDP: POST /token
IDP->API: id_token { issuer, subject, ... }
API->DB: Find Credential by (issuer, subject)
DB->API: None
API->UI: Redirect to /registration
UI->User: Registration Form
User->UI: Fill in Account's catalog name, TOS, etc.

// Register
UI->API: POST /registration (w/ auth_code)
API->IDP: POST /token
IDP->API: Full jwt
API->IDP: GET /userinfo
IDP->API: { email, name, ... }
API->DB: Create Account
DB->API: Account { email, name, ... }
API->DB: Create Credential
DB->API: Credential { account_id, issuer, subject, ... }

// Login
API->UI: Signed Session Token
UI->UI: Add Signed Session Token to local storage
UI->User: Logged In!
```

###### OpenID Login Workflow

```sequence
participant IDP
participant User
participant UI
participant API
participant DB

// Authenticate with IDP
User->UI: Load Page
UI->User: Redirect to /login
User->UI: Click Login w/ Provider
UI->User: Redirect to /provider/login
User->IDP: Provides username/password/2fa
IDP->UI: Redirect to callback uri

// Login Success
UI->API: POST /sessions/:issuer w/ auth_code
API->IDP: POST /token
IDP->API: id_token { issuer, subject, ... }
API->DB: Find Credential by (issuer, subject)
DB->API: Credential { account_id, ... }
API->DB: Find Account by id
DB->API: Account { ... }

// Login
API->UI: Signed Session Token
UI->UI: Add Signed Session Token to local storage
UI->User: Logged In!
```

###### Local Workflow

```sequence
participant IDP
participant User
participant UI
participant API
participant DB

// Authenticate with IDP
User->UI: Load Page
UI->User: Redirect to /login
User->UI: Click Login w/ Local Provider
UI->User: Redirect to /provider/login
User->IDP: Provides username
IDP->UI: Redirect to callback uri

// Login Success
UI->API: POST /sessions/:issuer w/ auth_code
API->DB: Find Account by name
DB->API: Account { ... }
API->DB: Find Credential by account_id
DB->API: Credential { account_id, ... }

// Login
API->UI: Signed Session Token
UI->UI: Add Signed Session Token to local storage
UI->User: Logged In!
```