Snowflake-Labs / schemachange

A Database Change Management tool for Snowflake
Apache License 2.0

Best way of handling hotfixes? #117

Open henwev opened 2 years ago

henwev commented 2 years ago

First of all a big thank you to everyone involved in developing schemachange. It really is an awesome tool to work with!

In our environment we sometimes need to wait a bit for tests to be completed before we can promote our changes from the dev/test database to production. In the meantime the need for a hotfix in production might arise, and I wanted to ask about best practices for handling that.

I saw that the author of #21 had pretty much the same question, but the answer has a flaw in my opinion (unless I am missing something):

in my experience the best way to handle the situation you're describing is with version control and branching strategies. You would create/use a hotfix branch, add a change script with the next version number, deploy. Then merge that back to your dev branches and rename the scripts in development appropriately.

If I merged the hotfix back to dev and then renamed the script in dev to a higher version number, the script would be re-applied in production once I merged the dev branch into the prod branch some time later for the next release, wouldn't it?

One possible solution I could think of was to apply every hotfix as a repeatable R-script and never touch the script again afterwards. This would ensure that the hotfix is only applied once in every environment. The only drawback I see is that it more or less abuses the idea behind R-scripts, because a hotfix script might have non-repeatable content. This might confuse a developer, but I think if hotfixes are stored in a dedicated "hotfix" folder it should be clear.
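As a rough sketch of that idea (folder and object names are made up), such a hotfix would be committed once as a repeatable script and then never edited, so its checksum never changes and schemachange applies it exactly once per environment:

```sql
-- hotfix/R__2023_02_orders_status_backfill.sql (hypothetical)
-- Committed once and never modified afterwards; the non-repeatable content
-- relies on the "never touch it again" convention rather than on idempotence.
UPDATE SALES.ORDERS
SET STATUS = 'CLOSED'
WHERE STATUS IS NULL
  AND ORDER_DATE < '2023-01-01';
```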

I also had a look at Flyway and found in their FAQ about hotfixes that there is an "outOfOrder" option to also apply version numbers that are lower than the currently highest deployed version number (source: https://flywaydb.org/documentation/learnmore/faq.html#hot-fixes). But from browsing other issues of schemachange I already noticed that the maintainers are not the biggest fans of the outOfOrder idea, and I definitely get the arguments against it.

To conclude I have these specific questions:

Thank you!

UberDoodles commented 1 year ago

I haven't had to deal with hotfixes yet, so I'm speaking entirely theoretically.

I think the R-script approach could work. To protect against the possibility of the script being re-run, you might be able to make the hotfix code itself repeatable (i.e. idempotent), but that could be a headache sometimes; you might have to create a temporary stored procedure as part of your hotfix script that works out whether the hotfix has already been applied.
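Purely as a hypothetical sketch (table and column names are invented), an anonymous Snowflake Scripting block, rather than a temporary stored procedure, could do that "has this already been applied?" check inside the repeatable script:

```sql
-- Hypothetical R__hotfix_orders_add_status.sql
-- Only applies the change if it isn't already in place, so re-running the
-- repeatable script is harmless.
EXECUTE IMMEDIATE $$
DECLARE
    col_count INTEGER;
BEGIN
    SELECT COUNT(*) INTO :col_count
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'SALES'
      AND TABLE_NAME   = 'ORDERS'
      AND COLUMN_NAME  = 'STATUS';

    IF (col_count = 0) THEN
        ALTER TABLE SALES.ORDERS ADD COLUMN STATUS VARCHAR;
    END IF;

    RETURN 'hotfix checked/applied';
END;
$$;
```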

A slightly different alternative I can think of would be to introduce a new script type to schemachange (maybe H-script, for Hotfix). This would be essentially the same as an R-script, except it only executes once (it wouldn't be re-run if the hash changed), so it would protect against the hotfix being re-run.

UberDoodles commented 1 year ago

One thing that was still bugging me after I wrote that response was "but when would the H-scripts run?". You really need them to run in the correct order; scripts that are added later might depend on the hotfix having been applied. If the H-script ran at the same time as the R-scripts, that wouldn't be the case.

I came up with a potential solution for that this morning. I said in my original response that H-scripts would be a lot like R-scripts, but actually they should be more like V-scripts, and they should have a version number which indicates where they fit in. The difference between an H-script and a V-script would simply be that H-scripts are run retrospectively; a bit like having the outOfOrder option specifically for these scripts.

I think if used correctly, this might be a good solution. I just have two concerns with it.

First, any scripts which are already in your branches and have a higher version number than the H-script will not have been written with the hotfix in mind, and might not be compatible; you'd need to verify this.

Secondly, the H-scripts would be open to abuse/misuse. Whilst they are intended for applying hotfixes, there's nothing to stop someone using them to try to rewrite history. E.g. "Oh, back when I put that change in, I should have also made this other change. I know! I'll put an H-script in!".

tsceurman-ara commented 1 year ago

I realize this is an older item, but thought I would weigh in, if only to see if anyone else has other ideas. We ran into issues with hot fixes and other versioning issues (e.g. code is in DEV or QA waiting for sign-off while other code is ready to go). Here's how we are currently handling this:

What the above process allows us to do is avoid deploying changes to an environment before they are really ready/signed off. Even if a dev is working on something, we can create a hot fix in dev, test it, deploy to QA, and then to PROD, and we don't have to worry about renumbering scripts. It's not always pretty, but it works for us.

Having come from the SQL Server world, with DACPAC deployments, I struggle with having to do migration scripts. I wrote migration scripts in the past too, but once we got to state-based DACPAC deployments it was way better. We have been exploring keeping everything but table scripts in R scripts, so that we can see the current state of them at any time. It's usually not a big deal to drop and recreate a view, procedure, function, or even a task or stream (assuming you can handle data missed when the stream is dropped). That limits your overall exposure to the versioning stuff.
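For example, a repeatable script for a view (illustrative names) just carries the full current definition, so the deployment is a drop-and-recreate and the file always reflects the object's current state:

```sql
-- Hypothetical R__sales_v_open_orders.sql
-- schemachange redeploys this whenever the file's checksum changes.
CREATE OR REPLACE VIEW SALES.V_OPEN_ORDERS AS
SELECT ORDER_ID,
       CUSTOMER_ID,
       ORDER_DATE
FROM SALES.ORDERS
WHERE STATUS = 'OPEN';
```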

sfc-gh-tmathew commented 9 months ago

Hello @henwev, @UberDoodles, @tsceurman-ara

Thank you for the ideas. We are reviewing open issues currently, and would like to evaluate Issue #110 as a potential answer to controlling the order of execution without losing the version number of a batch of scripts.

Please do evaluate #110 and let us know if it helps with your use cases.

mw243270 commented 7 months ago

Hi @tsceurman-ara, I am dealing with a similar situation and your process sounds like it would be doable. Wondering if I could get more detail on the 4 bullet points you provided on your setup? Maybe an example structure? Do the script version numbers in SCHEMACHANGE.CHANGE_HISTORY end up being the same in DEV, QA and PROD? https://github.com/Snowflake-Labs/schemachange/issues/117#issuecomment-1404232759

Thank you

tsceurman-ara commented 6 months ago

@mw243270 I'll do my best. It may be easiest to walk through a development cycle and point out where we may do things... "weird".

To start, we use Azure DevOps for Change & Source Control, as well as handling the deployments. We have standardized our database creation process, so that all the databases have the same basic schemas, with the same set of roles and permissions for each. We have automated processes to handle standing up a new repo in DevOps, including creating the release pipeline.

We have a repository for each database, named the same as the database; the reason for this is to make it easier to set context, plus we always know where things are going. The folder structure in each repo is standardized:

Each Major and Minor version gets its own folder. The ONLY reason we did this was to make it easier to figure out what number we were on. Only schema-bound objects and integrations go in these Version folders, and we have decided on one object per file; that means a lot of files, but if someone misses a comma in a 30-column table CREATE statement, it's somewhat easier to track down where the issue is when you only have to look at ONE thing. The script files are all named with the Major, Minor, and "Order", and then the name of the object affected. For example:
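(Illustrative names only; the pattern is one folder per Major.Minor version, one object per file, named V<Major>.<Minor>.<Order>__<object>.sql in schemachange's versioned-script style, plus the WIP area described below.)

```
3.4/
    V3.4.1__SALES_ORDERS_TABLE.sql
    V3.4.2__SALES_V_OPEN_ORDERS_VIEW.sql
    V3.4.3__SALES_LOAD_ORDERS_TASK.sql
WIP/
    customer_feed/
        1__SALES_CUSTOMER_FEED_TABLE.sql
```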

That's the basics of how we have our repos set up.

For the actual workflow:

  1. Dev syncs up with the remote repo (we use VSCode for most dev work here). Devs only ever work in the dev branch of a repo.
  2. If there's a subfolder under WIP that matches what is being worked on, files are created there. If not, Dev creates a new folder under WIP. Note: We don't always have subfolders, it sort of depends on how many devs work in a repo/db at the same time.
  3. Dev has two choices to do the initial work:
    • Do all the work in Snowflake directly, then, when ready to push, copy/paste the code into files in VSCode.
    • Create the script files and edit as needed (one for each object) in VSCode, and use the Snowflake extension to execute the scripts.
    • At this stage, the Dev doesn't know what the version will be, so they create the scripts simply with the Script #/Ordering value and object name.
  4. Dev works with requestor to do as much testing/validation as possible in DEV environment.
    • This part is tricky, and probably not standard.
    • From a testing perspective, we are mostly concerned with making sure we get the data flowing through properly, and we have a general sense of what the data structures are.
    • What we DON'T worry about is if the data we receive has all the elements other people want; we have had stuff sit in DEV for over a year as we fight with data providers to get alignment on things.
  5. Once we verify everything is flowing properly and we have a good base for the data itself, Dev commits and pushes changes to remote repo, then tells the deployment team (me) changes are ready.
    • We do our best to link every commit to a Work Item in our Board with the Commit comments, and use the Work Item to tell people the state of changes (i.e. "Code is ready for promotion.").
  6. I create new Version folder(s), move the required script files to the new Minor Version folder, and update all the file names with the Major & Minor versions (see the sketch after this list). Once done, I commit and push to the remote repo.
  7. Once the remote repo is synced up, I push to the qa branch and run our deployment pipeline for that branch (we use the YAML-based Pipelines in ADO), then update the appropriate Work Item with the results.
    • We generally make the Dev verify the deployment in our QA environment, if only because we have occasional issues with something not getting deployed properly, or skipped, or whatever.
  8. Once changes are ready for PROD, Work Item is updated and assigned back to me, at which time I merge to our main/prod branch and run the pipeline from that branch to deploy changes to our PROD environment.
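To illustrate step 6 with made-up names: a script in WIP carries only its ordering number, and picks up its full version when it is moved into the release folder at promotion time:

```
WIP/customer_feed/2__SALES_CUSTOMER_FEED_VIEW.sql
    ->  3.5/V3.5.2__SALES_CUSTOMER_FEED_VIEW.sql
```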

So, to answer that last question, QA & PROD should always sync, but DEV probably won't. We debated internally as a team if that mattered and in the end decided that DEV is DEV, who cares. Many of the people on our team have had to go through multiple internal & external audits, especially SOX-related ones, so we have a good sense of what auditors might look for, and the DEV environment lining up with others isn't usually an issue. (Note that our Snowflake environments/data are NOT currently under audit controls, but we like to be prepared to go down that road, just in case).

I recognize that we don't use Git "properly"; it's because I came from a TFS background (and other similar tools), and most of the rest of the team hadn't really used source control before, so they got stuck with how I know/understand how to handle things. I'm sure the commit graphs are real ugly, but eh.

We also use schemachange to manage account-level objects, like Warehouses, Users, Roles and Grants, etc. It's a slightly different setup (only use R__ scripts, scripts get deployed to all environments in one release, and only select people use the repo). It's not the best way to do it, but it works for now.
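As a sketch of that account-level setup (warehouse and role names are invented), the repeatable scripts simply restate the desired objects and grants, so the current state is always visible in the repo:

```sql
-- Hypothetical R__warehouses_and_grants.sql
CREATE WAREHOUSE IF NOT EXISTS ANALYTICS_WH
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND   = 60
    AUTO_RESUME    = TRUE;

GRANT USAGE ON WAREHOUSE ANALYTICS_WH TO ROLE ANALYST_ROLE;
```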

That was a lot, sorry for the extended answer, but it's hard to skinny it down. Hopefully that gives enough information to better understand our workflow, and why we do what we do. It's not perfect, and I am trying to figure out how to reduce the overall touches we have to do with it, but even with that I think the structure would be fairly similar.