cloudfoundry / diego-notes

Diego Notes
Apache License 2.0

Add a proposal document for doing locket schema migrations #34

Closed: jvshahid closed this pull request 7 years ago

jvshahid commented 7 years ago

[#142933783]

cfdreddbot commented 7 years ago

Hey jvshahid!

Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA.

jfmyers9 commented 7 years ago

@anoop2811 and I added a proposal 3 which, looking back on it, is very similar to John's proposal 2.

Proposal 2 can definitely work. I share @ematpl's concern about how to keep data access consistent while performing data migrations. I see this comment in Tracker:

> What I had in mind is to make the version checking participate in all transactions that read/write from the database. That way we can guarantee that database access conforms with the data version, i.e. all db operations see a consistent version from start to finish.

A few open questions:

- Does a migration lock an entire table while it runs (and aren't table locks unsupported in Galera)?
- When exactly is the data version updated: before, during, or after the migration? Updating it after the migration seems to make the most sense, but then I don't understand how you prevent an older locket server from affecting already-migrated data.
- Is it acceptable for all locket servers to have a brief moment of unavailability while migrations are running?

Also, does a locket server have to maintain code branches for every version that you could possibly migrate to? If I have a locket server at v5, does it have v4, v3, v2, and v1 code branches that it can still operate in (until we cut a new major version)? This seems like an expensive maintenance cost.

crhino commented 7 years ago

I don't think a two-step migration process is a great operator experience. The BOSH post-deploy script could potentially help there, but the BOSH team seems strongly opposed to using that feature at all.

Conceptually I like the idea of separating the updating of code and deciding what version to use, but in reality that feels like an overly complex way of doing rolling deploys. What is the problem with having the latest version of locket be responsible for any backwards-compatibility issues and not having to coordinate schema versions?

One thought experiment I am using is how to handle migrations that modify existing column data, e.g. changing JSON-formatted data to YAML-formatted data. In all the proposals, the migration would create a new column and reformat the data into that column. However, after the migration has finished but before the servers have been switched over to L2, the L1 code is still inserting into and updating the JSON column. How do we ensure that we do not lose data from those subsequent updates to existing rows?

emalm commented 7 years ago

Merged as a record of our thoughts for future implementations.