Improve experience deploying databases created by Migrations

ajcvickers commented 4 years ago

This is a grouping of related issues. Feel free to vote (👍) for this issue to indicate that this is an area that you think we should spend time on, but consider also voting for individual issues for things you consider especially important.

Currently, many developers migrate their databases at application startup time. This is easy but is not recommended because:

Multiple threads/processes/servers may attempt to migrate the database concurrently
Applications may try to access inconsistent state while this is happening
Usually the database permissions to modify the schema should not be granted for application execution
Its hard to revert back to a clean state if something goes wrong

We want to deliver a better experience here that allows an easy way to migrate the database at deployment time. This should:

Work on Linux, Mac, and Windows
Be a good experience on the command line
Support scenarios with containers
Work with commonly used real-world deployment tools/flows
Integrate into at least Visual Studio

The result is likely to be many small improvements in EF Core (for example, better Migrations on SQLite), together with guidance and longer-term collaborations with other teams to improve end-to-end experiences that go beyond just EF.

Done in 6.0

[x] #19693

Planned for 9.0

[x] #22616
[x] #17578
[ ] #17568
[x] #33731
[x] #33732

Backlog

[ ] #2167
[ ] #3053
[ ] #8695
[ ] #14531
[ ] #15174
[ ] #15762
[ ] #16406
[ ] #18624
[ ] #18840
[ ] #20402
[ ] #22613
[ ] #24866
[ ] #25872
[ ] #31999
[ ] #26155
[ ] #19694
[ ] #24710

roji commented 3 years ago

@roji - The approach we took was that zero-downtime deployments were fine as long as the expand-contract pattern was used, whereas any other migrations would require instances to be brought down. Since, in practice, most migrations tend to fall within the former category, I still think there's a strong argument for doing the changes at application startup.

That's an interesting point - it's indeed a good idea to manage migrations in a way which doesn't require downtime. But then I'm still unclear on the concrete advantages of applying at startup as opposed to at deployment... At the very least, the latter saves you from having to deal with the distributed locking issue (more discussion is available in our docs).

zejji commented 3 years ago

@roji - Running the migrations at application startup wasn't actually our first choice and our original intention was to have one or more deployment scripts which handled all the infrastructure initialization tasks.

However, because our application was distributed to clients for deployment (on any cloud), this initialization logic couldn't be kept in, e.g. Azure DevOps pipelines running for each service (although we did use these internally), and was instead initially encapsulated in a single Ansible deployment script. Another option might have been to use Kubernetes init containers. In either case, we found that the complexity of the infrastructure-as-code was spiraling out of control. This made it difficult to understand and maintain for our (small) team.

Having a simple library (as described above) which was used in each service made things much easier. Since all initialization logic was now encapsulated in each service, our dev docker-compose file could be as simple as just a list of the services themselves (+ message bus and monitoring tools). In other words, this now represented the entirety of the knowledge required to deploy the application. Suddenly, the IaC became an order of magnitude simpler and we were able to do away with the Ansible script entirely.

This may be less relevant for a monolithic application but, in any case, there is certainly some appeal for me in having an application able to safely initialize its own infrastructure in a few lines of code.

roji commented 3 years ago

Thanks for the additional details @zejji.

joaopgrassi commented 3 years ago

Just to say something, not sure it's going to help anybody, definitely not Jon and the idea of a library but: We started rolling the migrations at startup as well in our k8s cluster. We were "lucky" that we have in our infrastructure always one single pod that did some background work, so we just made that pod run the migrations during startup on each new deployment.

But, we started to have cases where we didn't have this "single" pod anymore and we were at this point where couldn't run the migrations as part of the service's startup since we have many instances of it in k8s.

The approach I used and it's working so far for us is use k8s jobs + helm pre-install hooks. Basically the idea is that a service declares a pre-install Helm hook and via a "simple" console app container image, we run the migrations and all happens before Helm upgrades the charts.

Like @roji said, this also has downsides as currently running pods will use a new version of the db for a fraction of time, but in our team so far we are being diligent and managed to not introduce any backwards incompatible changes, so all works.

Another point which we didn't do yet is to handle cases where the migration fails.. but AFAIK we could also leverage hooks for that, and rollback it using maybe post-rollback hooks.

dazinator commented 3 years ago

Even if you have a distributed locking mechanism, you must still likely ensure that application instances are down or waiting for the migration to be applied, otherwise you get different versions of your application trying to access a different database schema...

@roji - The approach we took was that zero-downtime deployments were fine as long as the expand-contract pattern was used, whereas any other migrations would require instances to be brought down. Since, in practice, most migrations tend to fall within the former category, I still think there's a strong argument for doing the changes at application startup.

I've been using the expand contract principle for years but never knew that was it's formal term! Thank you for this - now I can sound more educated when talking to my fellow developers :-) On a related note, I wonder if there is anything EF can do to highlight:

Potentially breaking migrations. For example - adding a new non nullable column with no default value is potentially breaking when there are existing records.
Aid with expand contract pattern. For example, removing a column is a contraction that may break existing instances and is therefore more "unsafe" than an expansion of the model. Imagine if by default "contractions" could be prevented by EF migrations unless you pass in a special "-allow-contraction" argument when generating the model. This would make it safer to use the expand contract pattern without accidental premature contractions. P.s "accidental premature contractions" is not a phrase I'd ever imagine writing in relation to software dev :-)

roji commented 3 years ago

I've been using the expand contract principle for years but never knew that was it's formal term! Thank you for this - now I can sound more educated when talking to my fellow developers :-)

Oh I just made up the term zero-downtime migrations, definitely nothing formal about it :rofl:

adding a new non nullable column with no default value is potentially breaking when there are existing records.

When you do this with EF Core, it will create a non-nullable column with the the default value for existing columns (e.g. 0 for ints - but you can go change the default in the scaffolded migration if you want). So this isn't a breaking migration.

For example, removing a column is a contraction that may break existing instances and is therefore more "unsafe" than an expansion of the model

EF does already issue a warning about potentially destructive migrations (e.g. removing a column). That sounds like what you're describing here?

dazinator commented 3 years ago

Oh I just made up the term zero-downtime migrations, definitely nothing formal about it

I was referring to the term "expand-contract pattern" mentioned by @zejji :-) it's nice to know this formal pattern name at last!

EF does already issue a warning about potentially destructive migrations (e.g. removing a column). That sounds like what you're describing here?

Yes but I think to support the expand-contraxt pattern more formally, you'd want ef core migration generation to error for any contraction of the model. This could be removal of a column but it could also be a reduction in column size or the change to a datatype of a column etc - basically anything that isn't purely additive (an expansion of the model) can be viewed as dangerous and so it would be great for safety to reasons to have ef migration generation "fail by default" in such a case as opposed to just generating warnings which can easily be missed. Then for times where you don't want this safety net- i.e because you want to do a legitimate "contraction" of the model, you could perhaps generate the EF migration with an optional command line parameter that turns this safety net off and allows the migration to be created. Just an idea - but off topic so I apologise:-)

roji commented 3 years ago

@dazinator it sounds like you're asking for a way to turn the destructive migration warning into an error?

dazinator commented 3 years ago

@roji yes if we could make dotnet ef migration add error by default for destructive migrations that would be safer for us. With a way to override when needed, for example suppose we pass in an additional command line arg, something like --allow-destructive. Finally if the destructive migrations themselves were generated with an informational attribute added to the top of the migration class, they could be more easily seen and discussed during code / peer review phase

[Destructive()]
public partial class SomeMigration
{

}

A Destructive attribute (or similar) on the migration could also allow for deployment time checks. If you can detect that the migrations you are about about apply to a database are marked as destructive then you might want to do things differently than when there are only non destructive migrations to apply. For example, when there are destructive changes you might want to:

Take all application instances down whilst you upgrade the database and then all the application instances.
Generate the migration sql scripts at the start of the upgrade to capture against the deployment for audit or approval purposes.

If there are no destructive migrations however then you might do a zero downtime deployment and care less about the above.

dotnet ef database upgrade -connection [conn string] --has-destructive
True

As a final safety net, dotnet ef database upgrade could fail by default if migrations to be applied were destructive unless --allow-destructive was specified. This way someone can't apply destructive changes to the database without being explicit in this intent.

dotnet ef database upgrade -connection [conn string] --allow-destructive

JonPSmith commented 2 years ago

Hi @zejji and @ajcvickers,

I have finally created a library that uses @zejji DistributedLock approach. You can see the code at RunStartupMethodsSequentially and I have also written an article called "How to safely apply an EF Core migrate on ASP.NET Core startup" which has a lot of info in it.

@zejji, you will see I overcame the "database not created yet" by having two parts to the locking:

Check if the resource exists. If doesn’t exist, then try another lock approach
1. If the resource exists, then lock that resource and run the startup services

This means I can try to lock the database, but if its not there i have a second go using a FileStore Directory. I have tried it on Azure and it works (see the article). @zejji, I really appreciate you describing this approach.

@ajcvickers and @roji: I know you don't like the "migrate on startup" approach but I have made it very clear that you need to make sure your migration does not contain a breaking change migration.

zejji commented 2 years ago

@JonPSmith - Thanks for the heads-up regarding your library implementation. Looks good! 😃

GeorgeTTD commented 2 years ago

[Destructive()]
public partial class SomeMigration
{

}
If there are no destructive migrations however then you might do a zero downtime deployment and care less about the above.
dotnet ef database upgrade -connection [conn string] --has-destructive
True

@dazinator @roji I would like to add another angle to this. We are currently looking at custom implementations to do a zero downtime deploy for apps with EF. Our prefered deploy strategy would be:

Spin up new deployment 2
Apply non destructive migrations
Switch traffic to route to deployment 2
remove deployment 1
Apply destructive migrations

It would be nice if you could split Up and Down migrations into destructive and non-destructive. You could do this several ways so I will leave the implementation up to the powers that be. However here is an idea which would work for our scenarios.

dotnet ef migrations add MyMigration --split-destructive

Outputs
# <timestamp>_MyMigration.cs
# <timestamp>_destructive_MyMigration.cs

dotnet ef database upgrde ...

# Only applies <timestamp>_MyMigration.cs

and then once deploy is done and deployment 1 is remove

dotnet ef database upgrade ... --allow-destructive

# applies <timestamp>_destructive_MyMigration.cs

Bonus points would be great if we could apply the steps in reverse for rollbacks.

roji commented 2 years ago

@GeorgeTTD how would the proposed --split-destructive switch work if there are more than one migrations pending, each with its own non-destructive and destructive components?

More generally, you are of course free to structure your migrations in this way, and that's indeed what zero-downtime migrations usually entail; but this isn't really something EF Core can automate for you without your own planning and inspection of migrations.

dazinator commented 2 years ago

@GeorgeTTD @roji If EF did mark destructive migrations with an attribute e.g

[Destructive()]
public partial class SomeMigration
{

}

Then @GeorgeTTD you could atleast get part way there with the following:-

Use two seperate migrations assemblies, for destructive vs non destructive.
Add a test to assert this convention - uses reflection to discover migrations with or without this attribute and makes sure they haven't accidentally been added to the wrong project.
You can now run each migrations assembly independently as required..

However I'm not sure this makes sense as could you get scenarios where a non destructive migration depends on a destructive one if that was the order they were generated. For example a migration to do a column rename (destructive) could be generated before a migration to add an index on that column (non destructive).. For this reason it might be the best you can do is just to know whether the release does or doesn't contain non destructive migrations and optimise your deployment accordingly - they still need to be applied in the same collective order they were generated in.

dazinator commented 2 years ago

Just to be clear, my proposition with the [Destructive] stuff was for EF to support a workflow that prevents teams from accidentally including Destructive migrations in a release. The value is that by deterring this it allows deployment flows where services need not be brought down whilst the database is being upgraded. It allows support of workflows where releases containing destructive migrations can be done, but behind a safety net that lends itself to better team communication and planning around it, as it can no longer be done "accidentally".

roji commented 2 years ago

@dazinator as you write above, migrations are a linear, ordered list, with dependencies between them. It isn't possible to execute only non-destructive migrations, deferring destructive ones that are interspersed between them (because of the dependencies). So splitting migrations into separate assemblies (or otherwise cherry-pick/separating them) doesn't seem possible.

However, in your second comment you seem to be proposing a cmdline switch where the EF Core tooling would simply refuse to apply migrations (e.g. when running a migrations bundle or dotnet ef database update) if any of them contain (or come after) a destructive migration. That does seem possible, and could possibly help prevent accidental errors, though note that any migrating would be blocked once a destructive migration is pending; I'm not sure exactly how that helps the workflow.

At the end of the day, zero-downtime migrations are something which require careful planning and a specific workflow; EF can certainly try to help, but nothing will obviate a manual checking and migration planning here.

Regardless, if you want to continue discussing this, please open a new issue - let's try to keep this general issue clear of detailed specific discussions.

stijnherreman commented 1 year ago

In our latest project we integrated the DB migrations in the deployment pipeline simply by generating an idempotent migration script (for all migration steps) via dotnet-ef and leveraging sqlcmd to execute it against the appropriate sql server instance.

I ended up with the same solution. The idempotent script is an artifact published by the build stage and is executed with sqlcmd.

- task: PowerShell@2
  inputs:
    targetType: inline
    script: |
      Import-Module SqlServer
      Invoke-Sqlcmd -ServerInstance <...> -Database <...> -InputFile <...>

One thing I'd still like to add is the ability to review the generated script in a pull request. GitHub has done this and open sourced their code, so I'm hoping to use their work as inspiration.

alrz commented 11 months ago

Is there any issue regarding something like "fluent migrations", that is, instead of inferring the changes from the model, you explicitly call into AddColumn(e => e.NewColumn), run sql, etc, to make a versioned migration script. I think that would address most of the pain points I've faced in the past,

ajcvickers commented 11 months ago

@alrz If you want to write a migration where you explicitly call the builder methods instead of scaffolding the starting point from the model, then you are free to do that. You can then create a script from this in the normal way.

dotnet / efcore