authzed / spicedb

Open Source, Google Zanzibar-inspired permissions database to enable fine-grained authorization for customer applications
https://authzed.com/docs
Apache License 2.0
4.94k stars 266 forks source link

Proposal: Enhanced migration framework for SpiceDB #688

Open josephschorr opened 2 years ago

josephschorr commented 2 years ago

Problem: The SpiceDB datastores occasionally require migrations of the schema and data for the underlying services or engines. For example, adding a new index into Postgres, or writing a default transaction into MySQL.

Currently, this is handled by the spicedb migrate command, which executes all migrations up until the specified revision.

However, this design has a number of issues:

  1. The IsReady() call on datastores (except MySQL) checks for only the current migration revision: if a revision has moved onward, then it will return false, and if there is a health check based on it (which we do not currently do), it will break
    1. In MySQL, it only checks one revision backwards
  2. The migrate command mixes both DDL and DML into a single set of migrations, which can be a problem for customers using SpiceDB with migration frameworks that don’t support a mix
  3. The migrate command encapsulates the DDL changes completely, preventing external migration frameworks from easily being used (they have to run the migration into an empty DB, then copy its schema)
  4. Each datastore implementation defines its own IsReady, which leads to some duplication
  5. We currently have no handling of the scenario where different code needs to run as part of a four phase migration

Proposal

Update the migration revision system to handle backwards and forwards compatibility, break out DDL and DML migrations and define a means of testing.

DDL vs DML

DDL Revision

const createUniqueIDTable = `CREATE TABLE metadata (
    unique_id VARCHAR PRIMARY KEY
);`

func init() {
  DatabaseRevisions.RegisterDDL(
    "6.8.0",
    []string{createUniqueIDTable},
    emptyNonTxMigration,
    Compatibility: {AtLeast: "6"}
  )
}

DML revision

func init() {
  DatabaseRevisions.RegisterDML[PostgresMigrationDriver](
    "7.0.0",
    addUniqueId,
    Compatibility: {Only: "6.8.0"}
  )
}

const insertUniqueID = `INSERT INTO metadata (unique_id) VALUES ($1);`

func addUniqueId(driver PostgresMigrationDriver) error {
   … execute update here…
}

Tracking Revision

Updating a DDL revision

Rename the column from its format ID to its current ID

ALTER TABLE current_revision RENAME COLUMN v6_7_0 TO v6_8_0

Updating a DML revision

Place the current revision into the dml_revision column

UPDATE current_revision SET dml_revision='7.0.0'

Checking the current revision

Select the single row from the table, which grabs both the DML revision and the column name containing the DDL revision

SELECT * from current_revision

{
  "dml_revision": "7.0.0",
  "v6_8_0": 1,
}

latest("6.8.0", "7.0.0") -> 7.0.0

Compatibility

Defining compatibility

func init() {
  DatabaseRevisions.RegisterDDL(
    "6.8.0",
    []string{createUniqueIDTable},
    emptyNonTxMigration,
    Compatibility: {AtLeast: "6"}
  )
}

Encoding compatibility

Compatibility will be encoded into a “migration revision and compatibility” string that will be stored for DDL as a column name and DML as the value in the primary key:

Compatibility Struct Description Encoded Column Name
{ AtLeast: “6" } Revision is compatible with any other revision of major version 6, which can continue to run against it 6.8.0-a-6 v6_8_0__a__6
{ AtLeast: “6.1" } Revision is compatible with any other revision of 6.1 or later for major revision 6 6.8.0-a-6.1 v6_8_0__a__6_1
{ Only: “6.8.0" } Revision is only compatible with version 6.8.0 and only 6.8.0 (or 7.0.0) can run against it 7.0.0-o-6.8.0 v7_0_0__o__6_8_0

Note Only revisions will also define a gate: a version of SpiceDB that must be run and deployed before the next revision can be run. In the above example, 6.8.0 would have to be deployed before 7.0.0 can be deployed, otherwise older instances would go unhealthy.

Checking compatibility

If SpiceDB sees a revision different from its latest, it will parse the encoded form and determine whether it can continue running against that revision

If it cannot, it will return IsReady false and go unhealthy with an informative error

Note If the compatibility string format is not known by the current code (such as if we add a new form of compatibility, it will be treated as a breaking change

Testing

Provide a means for defining test data for each step of a migration, to ensure it can be end-to-end tested in CI

const createUniqueIDTable = `CREATE TABLE metadata (
    unique_id VARCHAR PRIMARY KEY
);`

func init() {
  DatabaseRevisions.RegisterDDL(
    "000007-create-metadata-table",
    []string{createUniqueIDTable},
    …
    SampleDataForTesting{
      Statements: []string{"INSERT INTO …"},
    }
  )
}

CLI changes

jzelinskie commented 2 years ago

cc @palacerteupgrade @jhalleeupgrade

jhalleeupgrade commented 2 years ago

Great proposal! Thanks for taking the time to write this; it's very thorough.

Based on this, my understanding is that ddl & dml changes aren't separated as of now so it wouldn't be possible to run or output them separately. As a first step, do you think it would be acceptable to only add a CLI command/flag that allows to output the SQL? (ddl + dml) from a certain revision? (mvp) e.g: spicedb migrate --dry-run (code would already detect which revisions need to be applied)

Perhaps I'm missing something and it isn't possible to narrow this the way I'm proposing? Please let me know.

Side question: Any reason why you guys decided on implementing the migration process yourselves instead of using something like https://github.com/golang-migrate/migrate ?

josephschorr commented 2 years ago

As a first step, do you think it would be acceptable to only add a CLI command/flag that allows to output the SQL? (ddl + dml) from a certain revision? (mvp)

We don't have that distinction right now, which would mean changing some of the existing migrations; if we are going to do so, that's about half of the work for the proposal anyway, so it would likely be possible to start with that, but with the caveat that properly handling versions would require re-versioning the migrations yet again

Any reason why you guys decided on implementing the migration process yourselves instead of using something like https://github.com/golang-migrate/migrate ?

It appears to not support migrations that are purely code based, which is a requirement we have

jhalleeupgrade commented 2 years ago

As a first step, do you think it would be acceptable to only add a CLI command/flag that allows to output the SQL? (ddl + dml) from a certain revision? (mvp)

We don't have that distinction right now, which would mean changing some of the existing migrations; if we are going to do so, that's about half of the work for the proposal anyway, so it would likely be possible to start with that, but with the caveat that properly handling versions would require re-versioning the migrations yet again

Any reason why you guys decided on implementing the migration process yourselves instead of using something like https://github.com/golang-migrate/migrate ?

It appears to not support migrations that are purely code based, which is a requirement we have

That makes sense. Thanks for the detailed answer.

jhalleeupgrade commented 1 year ago

A few remaining questions:

It seems that it isn't planned to include DML changes. However, it seems that there are DML changes which are SQL statements. (e.g backfill_xid_add_indices migration). Is excluding this from the proposal on purpose?