etcd migrations system - GitHub Issues

erikh commented 8 years ago

Research if there's something like this first, but to appropriately convert databases between versions we should ensure that we have a system like this in place. It will become more critical as more people start using the product.

dseevr commented 8 years ago

Just to make sure we're on the same page, as I understand it, the point of this task is so that we have something similar to Rails' database migration system.

We'll need some things like the following:

A specific etcd key or directory will hold all the database migrations which have been run (the latest entry is the current version we're on)
The ability to define schema versions. These will be a list of all the changes since the last version (add directory, remove directory/key, rename directory/key, create key with some contents, etc.).
volcli commands to run all pending migrations, run a single migration, dump the current schema, load a target schema, etc.

We'll also need to tie this into our test suite since it blows away our keyspace and relies on the etcd client constructor to repopulate it in certain parts.

erikh commented 8 years ago

yes, but we don't need a continuous migration system; we just need one that's per-release. So your migrations will be application version-based (make the unversioned dev build a version too). We should provide a separate utility for upgrading the cluster so it's clear from a UX perspective that this should never be done more than once per upgrade. volcli should not be involved.

dseevr commented 8 years ago

Got it.

It seems like it would be sufficient to have each release schema just be a list of things which must be ensured, e.g.,

ensure some directory exists
ensure some file exists with some contents
ensure some other file is deleted

rather than having a series of diffs going back to the very first schema. What do you think?
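
For illustration, a minimal Go sketch of the "list of things to ensure" idea. Everything here is an assumption made for the example: the `Store` map stands in for the keyspace (it is not the etcd client API), the key names are made up, and directories are left out to keep it short.

```go
package main

import "fmt"

// Store is a hypothetical in-memory stand-in for the keyspace.
type Store map[string]string

// Ensure is one idempotent statement about the keyspace.
type Ensure func(s Store) error

// EnsureKey makes sure key exists with exactly the given contents.
func EnsureKey(key, value string) Ensure {
	return func(s Store) error {
		s[key] = value
		return nil
	}
}

// EnsureAbsent makes sure key does not exist.
func EnsureAbsent(key string) Ensure {
	return func(s Store) error {
		delete(s, key)
		return nil
	}
}

// Apply runs every statement in order and stops at the first error.
func Apply(s Store, schema []Ensure) error {
	for i, e := range schema {
		if err := e(s); err != nil {
			return fmt.Errorf("ensure #%d: %w", i, err)
		}
	}
	return nil
}

func main() {
	s := Store{"/volplugin/old": "x"} // made-up starting state
	schema := []Ensure{
		EnsureKey("/volplugin/policies/default", "{}"),
		EnsureAbsent("/volplugin/old"),
	}
	// Applying the schema twice gives the same result as applying it once.
	for i := 0; i < 2; i++ {
		if err := Apply(s, schema); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(s)
}
```

Because each statement is idempotent, a release schema written this way can be re-applied safely, which is the main difference from a chain of diffs.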

erikh commented 8 years ago

Hmm. I think what should happen here is similar to rails migrations (step through a version, execute its code, and move on to the next one if it succeeds). Outside of that feel free to use your own discretion.

I don't know if rollback support is really necessary -- we aren't talking about terabyte-sized keyspaces, restoring from backup would be fine probably for now. We'll worry about live stuff later.
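
A rough sketch of that stepping behaviour in Go, with no rollback. The `Migration` type, the integer version numbers, and the map standing in for the keyspace are all assumptions for the example; real release versions would need their own ordering.

```go
package main

import (
	"errors"
	"fmt"
)

// Migration is a hypothetical type: one step per release.
type Migration struct {
	Version int
	Run     func(kv map[string]string) error
}

// errNotReady exists only to simulate a failing step in the demo below.
var errNotReady = errors.New("not ready")

// RunPending executes, in order, every migration newer than current.
// It stops at the first failure and returns the last version that succeeded.
// ms is assumed to be sorted by ascending Version.
func RunPending(kv map[string]string, current int, ms []Migration) (int, error) {
	for _, m := range ms {
		if m.Version <= current {
			continue // already applied
		}
		if err := m.Run(kv); err != nil {
			return current, fmt.Errorf("migration %d: %w", m.Version, err)
		}
		current = m.Version
	}
	return current, nil
}

func main() {
	kv := map[string]string{}
	migrations := []Migration{
		{Version: 1, Run: func(kv map[string]string) error {
			kv["/volplugin/policies/default"] = "{}" // made-up key
			return nil
		}},
		{Version: 2, Run: func(kv map[string]string) error {
			return errNotReady // simulated failure
		}},
		{Version: 3, Run: func(kv map[string]string) error {
			kv["/volplugin/never-reached"] = "1"
			return nil
		}},
	}
	v, err := RunPending(kv, 0, migrations)
	fmt.Println("now at version", v, "error:", err)
}
```

In this sketch a failed step leaves the keyspace at the last good version, so recovery would be a restore from backup followed by another run.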

dseevr commented 8 years ago

Cool, I think I have enough to get started now. Thanks!

dseevr commented 8 years ago

I don't see any existing software which does anything like this (as expected), so I think we're on our own.

Are you thinking that each migration is a compiled binary which is executed, or should we be parsing some kind of migration file and executing the alterations line by line? The advantages of running code are that you can use loops and other constructs and it would require less code overall. If we used something like YAML or JSON, we'd need a way to ensure that commands are run in order, so they'd likely end up just being glorified arrays of commands to execute.

Once we have a more generic interface to the store (etcd2/3, consul, etc.), we can have the migrations run in transactions if the store supports them.

mapuri commented 8 years ago

Found this thread interesting, so I thought I'd chime in with my 2 cents :)

Maybe we should also use a more widely used data encoding scheme than plain JSON/YAML, like protobufs? We'd get the benefit of existing practices for handling data model updates. The chances of us hitting update scenarios that someone else has already hit (and solved) would improve too.

erikh commented 8 years ago

@mapuri I appreciate the idea of saving code, but...

etcd uses strings so byte payloads are going to be interesting with protobuf/capnproto/grpc; at least with json we can be guaranteed utf-8 so strings are not a problem.
The rails migrations are a pretty well-established pattern from the web world.

erikh commented 8 years ago

@dseevr this is what I was thinking:

[ ] binary for migrating, takes an arg to select which version to migrate to (default is 'latest').
[ ] Confirm before performing.
[ ] a --list argument which will list all the available versions to migrate.

As for parsing yaml etc I don't really want the extra overhead.
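
A minimal sketch of that command line using only Go's standard `flag` package. The binary name `migrate`, the version strings, and the helper names are placeholders invented for the example.

```go
package main

import (
	"bufio"
	"flag"
	"fmt"
	"io"
	"os"
	"strings"
)

// options holds the parsed command line for this sketch.
type options struct {
	list   bool
	target string
}

// versions is placeholder data; a real binary would have these compiled in.
var versions = []string{"0.1.0", "0.2.0"}

// parseArgs reads --list and an optional positional target version.
// The target defaults to "latest".
func parseArgs(args []string) (options, error) {
	fs := flag.NewFlagSet("migrate", flag.ContinueOnError)
	list := fs.Bool("list", false, "list available versions and exit")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	opts := options{list: *list, target: "latest"}
	if fs.NArg() > 0 {
		opts.target = fs.Arg(0)
	}
	return opts, nil
}

// confirm asks the operator before anything is changed; only "y" proceeds.
func confirm(in io.Reader, out io.Writer, target string) bool {
	fmt.Fprintf(out, "Migrate the cluster to version %q? [y/N] ", target)
	line, _ := bufio.NewReader(in).ReadString('\n')
	return strings.EqualFold(strings.TrimSpace(line), "y")
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if opts.list {
		for _, v := range versions {
			fmt.Println(v)
		}
		return
	}
	if !confirm(os.Stdin, os.Stdout, opts.target) {
		fmt.Println("aborted")
		return
	}
	fmt.Println("would migrate to", opts.target) // the actual migration is omitted
}
```

Go's `flag` package accepts both `-list` and `--list`, so no extra parsing library is needed for this shape of interface.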

dseevr commented 8 years ago

@mapuri Protobufs are more for the wire format of a protocol than defining schema changes. (side note: the author of Protobufs says you shouldn't even use them anymore! He wrote Cap'n Proto as a replacement: https://capnproto.org/ )

Erik had something more like this in mind: http://edgeguides.rubyonrails.org/active_record_migrations.html

Basically, you use a high-level DSL to define DDL operations on your schema, and they get turned into the proper DDL queries for your database. A table tracks which migrations have been applied (each has a unique number based on the time of creation) and you can run a command to apply any pending migrations to your local (or production) db.

In our case, there's only one target (etcd), and thanks to Go interfaces we can easily support different stores in the future.
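
To make the interface point concrete, here is a small sketch. The `Store` interface, its method set, and `RenamePrefix` are hypothetical names for the example, and the in-memory implementation only stands in for an etcd- or Consul-backed one.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Store is a hypothetical abstraction over the backing key/value store.
type Store interface {
	Get(key string) (value string, found bool, err error)
	Set(key, value string) error
	Delete(key string) error
	List(prefix string) ([]string, error)
}

// memStore is an in-memory Store used only to exercise the sketch.
type memStore struct{ data map[string]string }

func newMemStore() *memStore { return &memStore{data: map[string]string{}} }

func (m *memStore) Get(key string) (string, bool, error) {
	v, ok := m.data[key]
	return v, ok, nil
}

func (m *memStore) Set(key, value string) error { m.data[key] = value; return nil }

func (m *memStore) Delete(key string) error { delete(m.data, key); return nil }

func (m *memStore) List(prefix string) ([]string, error) {
	var keys []string
	for k := range m.data {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys, nil
}

// RenamePrefix moves every key under from to the same path under to.
// It is written against Store only, so it does not care which backend is used.
func RenamePrefix(s Store, from, to string) error {
	keys, err := s.List(from)
	if err != nil {
		return err
	}
	for _, k := range keys {
		v, ok, err := s.Get(k)
		if err != nil {
			return err
		}
		if !ok {
			continue
		}
		if err := s.Set(to+strings.TrimPrefix(k, from), v); err != nil {
			return err
		}
		if err := s.Delete(k); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var s Store = newMemStore()
	s.Set("/volplugin/old/a", "1") // made-up keys
	s.Set("/volplugin/old/b", "2")
	if err := RenamePrefix(s, "/volplugin/old/", "/volplugin/new/"); err != nil {
		fmt.Println("error:", err)
		return
	}
	keys, _ := s.List("/volplugin/")
	fmt.Println(keys)
}
```

Migration helpers such as "rename directory" would then be written once against the interface rather than once per store.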

erikh commented 8 years ago

Protobufs are still the standard; sorry to be that guy in an issue

dseevr commented 8 years ago

@erikh So you're thinking we just hard-code all the migrations into this one binary? That'd make things easy.

dseevr commented 8 years ago

@erikh LOL, yup I am aware 👯

erikh commented 8 years ago

@dseevr re: binary, yes, but please find an appropriate abstraction to make writing these easier.

dseevr commented 8 years ago

@erikh Great, exactly what I was planning to do

mapuri commented 8 years ago

> etcd uses strings so byte payloads are going to be interesting

I think etcd clients can read/write byte arrays, but I might be wrong. I remember that for netplugin we encoded our data into a binary format very early on, but then reverted to JSON strings to make them readable and easy to debug :-)

> The rails migrations are a pretty well-established pattern from the web world

got it. Yes, my argument was mostly around using something well supported and standard. Thanks for clarifying Erik.

erikh commented 8 years ago

@mapuri https://github.com/coreos/etcd/blob/master/client/keys.go#L103

Consul uses []byte, which is why you might be confused.
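
To show why the value type matters, a small Go sketch: raw bytes are not necessarily valid UTF-8, while a JSON document is, because `encoding/json` base64-encodes `[]byte` fields. The `Volume` type and function names are made up for the example, and no etcd or Consul client is involved.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// Volume is a hypothetical record with an arbitrary binary field.
type Volume struct {
	Name string `json:"name"`
	Blob []byte `json:"blob"` // encoding/json writes this as base64 text
}

// encodeForStringStore produces a value that is safe for a string-typed API.
func encodeForStringStore(v Volume) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeFromString reverses encodeForStringStore.
func decodeFromString(s string) (Volume, error) {
	var v Volume
	err := json.Unmarshal([]byte(s), &v)
	return v, err
}

func main() {
	v := Volume{Name: "vol1", Blob: []byte{0xff, 0x00, 0xfe}}

	// The raw payload is not valid UTF-8, so it is awkward as a string value.
	fmt.Println("raw bytes valid UTF-8:", utf8.ValidString(string(v.Blob)))

	s, err := encodeForStringStore(v)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The JSON form is plain text and round-trips the bytes.
	fmt.Println("json valid UTF-8:", utf8.ValidString(s))
	back, err := decodeFromString(s)
	fmt.Println(back.Name, len(back.Blob), err)
}
```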

mapuri commented 8 years ago

Yes, seems like it :) Thanks for the pointer, Erik!

contiv-experimental / volplugin

etcd migrations system #353