dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License

Squash migrations #2174

Open rowanmiller opened 9 years ago

rowanmiller commented 9 years ago

It would be good to have the ability to squash several migrations into a single file to help reduce the number of files in a project.

We probably want to keep track of the original list of migration names so that we can reason about this when targeting an existing database that the original migrations were applied to in their un-squashed form.

popcatalin81 commented 9 years ago

@rowanmiller how will squash work? Will it merge the migrations into one, or simply place them in a single file?

I think merging will be problematic when a target database has had some, but not all, of the squashed migrations applied.

bricelam commented 9 years ago

@popcatalin81 I suspect that, at first, it will simply concatenate all the operations together into one migration. In the future, it may try to simplify the operations (e.g. renaming A -> B -> C will become just A -> C).
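
As a rough illustration of the kind of simplification mentioned above, here is a minimal sketch that collapses chained renames. It is not EF Core code: `RenameSquasher` and the tuple-based operation list are hypothetical stand-ins for real migration operations, and only renames are handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical operation list: A is renamed twice, X once.
var renames = new List<(string From, string To)> { ("A", "B"), ("B", "C"), ("X", "Y") };

foreach (var (oldName, newName) in RenameSquasher.Collapse(renames))
    Console.WriteLine($"{oldName} -> {newName}");

public static class RenameSquasher
{
    // Collapses chained renames so that A -> B followed by B -> C becomes A -> C.
    // A chain that ends where it started (A -> B -> A) is dropped entirely.
    public static List<(string From, string To)> Collapse(
        IEnumerable<(string From, string To)> renames)
    {
        var result = new List<(string From, string To)>();
        foreach (var (oldName, newName) in renames)
        {
            // Look for an earlier rename whose target is the source of this one.
            var index = result.FindIndex(r => r.To == oldName);
            if (index >= 0)
                result[index] = (result[index].From, newName);
            else
                result.Add((oldName, newName));
        }

        result.RemoveAll(r => r.From == r.To);
        return result;
    }
}
```

A real implementation would also have to account for operations in between the renames (for example, a column being dropped and a different one created under the same name), which this sketch ignores.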

Correct, "rewriting history" is always a bad idea. Before squashing, you'll have to revert all the migrations you want to squash, squash them, then re-apply the new one. You shouldn't do it if the migrations have been applied on any database other than your local one.

This operation would be useful while developing a new feature. You could add all the migrations locally you want, but before merging your feature, you could squash them all down into a single migration.

TheMadKow commented 8 years ago

:+1: For this idea

markusvt commented 8 years ago

I just wanted to suggest that idea too. The migrations folder gets quite large quite fast as a project develops over time.

dario-hd commented 7 years ago

I was wondering if removing them all and creating an "initial migration" would be a better approach. In the end this is what I did recently. Of course the "initial migration" should be executed only on database creation. This will not only reduce the number of files in the project but it will also speed up the initial database creation if you recreate it multiple times e.g. for development and testing purposes. What do you guys think? @rowanmiller @bricelam

pgrm commented 6 years ago

I'm puzzled that this issue is so inactive. How are others solving the issue of the ever growing Migrations folder?

Could there be at least some best practice described in the documentation @AndriySvyryd ?

bricelam commented 6 years ago

@pgrm I briefly mentioned a strategy in the Migrations docs I'm adding...

replaysMike commented 6 years ago

This is a common problem we run into every once in a while. It's quite simple to accomplish. If your database is already up to date just delete all the migration files and truncate the dbo.__EFMigrationsHistory table. Generate a new initial create migration and you have now squashed all your migrations. You lose any comments but that's minor if you're needing to do it.

PMExtra commented 5 years ago

@replaysMike I will lose my custom migration operations. (For example, I set a custom default value for a new field.)

replaysMike commented 5 years ago

@PMExtra that’s surprising since you’re basically creating a migration based on the current state of the database. Is the default value being applied at the db level, or code level when the entity is created?

deanvr commented 4 years ago

Is this still open or closed? We often end up with way too many migration files in the VS Solution Explorer, and resolving this is a pain in the... We manually run the migrations on a clean database, extract the schema and data inserts, and literally create a new migration that becomes the new seed. This way we don't lose history and don't have to truncate the migrations table. What would be really cool is a command that would reverse engineer a seed migration from a given database; this would save us a couple of hours every month or so. Thank you - let's make this happen!

Btw, aren't database snapshots a little overkill? They might have some interesting applications, though. I'm thinking integration testing?

snebjorn commented 3 years ago

I usually manually "squash" migrations into a new InitialCreate when a new system goes into production, removing all the development-time migrations, as they usually aren't very helpful for reasoning about model changes.

But changes to the model after the system is in production are very important.

I don't think the squashed migration should just be a concatenation of the existing migrations: one might add a column/table and another might remove it again, which is just noise. It should be a fresh InitialCreate migration. Nice, clean and ready to run on the production environment without any development-time noise.

But this is already possible today, with possible data loss on the existing DEV environments.

I'd welcome a squash feature to make this easier and avoid data loss.

roji commented 3 years ago

@snebjorn if your goal is to squash development-time migrations which haven't yet been applied to production, it's pretty easy to simply reset your Migrations folder and then generate a single new one, which would contain all the changes - you can do that before merging your change.

yakeer commented 3 years ago

Any plans to automate this in any release? The manual squashing is really frustrating. And at some point, Azure DevOps is failing due to huge migrations dll.

ajcvickers commented 3 years ago

@yakeer This issue is open and on the backlog, which means that we do intend to implement this in a future release. See release planning for more details.

clement911 commented 3 years ago

The latest VS release is extremely slow to compile migrations, for example, so squashing might help. See https://developercommunity.visualstudio.com/content/problem/1253845/visual-studio-2019-version-1680-very-slow-to-build.html

roji commented 3 years ago

@clement911 that issue should hopefully get resolved really soon on the VS side, so shouldn't require waiting for migration squashing in EF.

RCTycooner commented 3 years ago

@clement911 same issue here... going to attempt to manually squash my migrations now.

clement911 commented 3 years ago

@RCTycooner this was fixed in the latest VS release 16.8.2

RCTycooner commented 3 years ago

@clement911 what's the status of this issue for build pipelines? We're using hosted agents that should be getting the latest versions of everything, but they still have this issue. I will update my VS to 16.8.2 and try a local build anyway.

domagojmedo commented 3 years ago

Would squashing preserve any custom code that was added to the migration?

MikaelEliasson commented 3 years ago

I recently squashed a lot of migrations in our project (20+ person dev team). I hope this summary can help you or someone else with this need. See https://www.bokio.se/engineering-blog/how-to-squash-ef-core-migrations/ for more details + scripts we used.

The largest hurdle I found doing this is that the migrations snapshots are actually "corrupt" in EF Core under certain circumstances. If you need me to dive more deeply into this just ping me.

Extract from the blog post:

This can happen when your team creates migrations in parallel on different branches. A basic flow is something like this:

Version Parent is on the dev branch.
Version B is branched from Parent and adds a migration.
Version C is branched from Parent and adds a migration.
Version D is B & C merged back into dev.

If we look at the global snapshot:
B contains Parent + B
C contains Parent + C
D contains Parent + B + C

This is great. The problem is that no migration snapshot contains Parent + B + C.

So if you pick B or C as your snapshot, you will lose some information. Most of the time this is not a huge issue, but EF uses these migration snapshots to decide the specificity of columns and whether they need to be updated. So I suddenly saw issues where one column had varchar(MAX) in the new version of the database but varchar(50) in the old one.
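
To make the gap concrete, here is a small sketch that models each per-migration snapshot as the set of migrations it reflects. This is only a simplified model under that assumption; real snapshots hold a full EF model, and `SnapshotCheck` is a hypothetical helper, not an EF Core API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// What each migration's own snapshot reflects, following the flow described above.
var perMigrationSnapshots = new Dictionary<string, HashSet<string>>
{
    ["Parent"] = new() { "Parent" },
    ["B"] = new() { "Parent", "B" },
    ["C"] = new() { "Parent", "C" },
};

// State of the dev branch after B and C are merged (version D).
var merged = new HashSet<string> { "Parent", "B", "C" };

var complete = SnapshotCheck.FindCompleteSnapshots(perMigrationSnapshots, merged);
Console.WriteLine(complete.Count == 0
    ? "No per-migration snapshot matches the merged state."
    : string.Join(", ", complete));

public static class SnapshotCheck
{
    // Returns the migrations whose snapshot reflects exactly the given set of migrations.
    public static List<string> FindCompleteSnapshots(
        IReadOnlyDictionary<string, HashSet<string>> snapshots, ISet<string> applied)
        => snapshots.Where(s => s.Value.SetEquals(applied)).Select(s => s.Key).ToList();
}
```

Only the global snapshot on dev reflects all three; none of the per-migration snapshots does, which is the situation described above.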

The issue this caused for me when squashing migrations was that I never could find a migration with a good migration snapshot to base it on.

So instead I had to take one of two approaches:

If you find a solid solution for this issue squashing would actually be decently easy to script.

Dasein732 commented 3 years ago

What I end up doing most of the time is deleting the migrations, re-adding them against a fresh DB, and then regenerating a single 'initial migration' script. (This is almost identical to one of the articles mentioned in this thread; seeing how multiple people have a similar approach, it might be a useful path to pursue.) One thing that would be extremely valuable here is a script that does this for me (maybe optionally targeting an in-memory provider for simplicity's sake, since in 99% of cases we're targeting a fresh DB instance) and inserts a sort of history breakpoint between the different migrations in the Up()/Down() methods.

For example if we had 2 migrations before the squash, InitialMigration and AddCustomEntity, we'd have a migrationBuilder.InsertHistory("AddCustomEntity") in between those 2 migrations in the newly merged file and this would(optionally) trigger a transaction commit for previous migration.

This would be extremely useful, since the various deployments we cover usually have partial migrations applied due to various factors, and we'll prolly never be able to bring them all to the same version.

worthy7 commented 3 years ago

Just an idea but would this work?

  1. Delete all migrations in the project, essentially a reset
  2. Create just the init migration
  3. Generate the idempotent SQL script
  4. In the database, remove everything in the __EFMigrationsHistory table
  5. Run just the last part of the SQL script, which should add the one and only migration entry to the __EFMigrationsHistory table

If my understanding is correct, this should just work, right?

gojanpaolo commented 3 years ago

@worthy7 That probably won't work as expected if there were custom logic added to the migrations.

https://github.com/dotnet/efcore/issues/2174#issuecomment-482414336 It will lost my custom migration operations. (For example, I set a custom default value for a new field.)

worthy7 commented 3 years ago

Ah, of course. I knew I was missing something.

MikaelEliasson commented 2 years ago

Just to leave a small update on my answer above. https://github.com/dotnet/efcore/issues/2174#issuecomment-760022208

We ran this for a second time. And saved 1.8 million LOC (73MB of source code). This time it took about 60 min work to squash all the migrations, validate that the db was identical + add the test below.

We did run into a tiny issue though. The prep migration we created last time was included in the new initial migration. And it doesn't work in that context (duplicate key). We just commented it out.

To prevent any team mate merging an earlier migration from a long lived branch we also added a small test that will block that in the PR tests.


        [Fact]
        public void NoMigrationShouldBeOlderThanSnapshot()
        {
            // When we squash migrations, we need to make sure that no migration created
            // before the squash gets merged in afterwards.
            //
            // If you have a migration like that, remove it on your branch and re-run
            // Add-Migration so that it gets a newer timestamp. (And please merge newer
            // changes into your branch first, too.)
            var currentActiveSnapshot = "20210826073610_Squash2_prep";

            TC.New(); // helper from our own test code, not part of EF Core
            using (var context = DB.NewUnsafe()) // our own DbContext factory
            {
                // IMigrationsAssembly lives in Microsoft.EntityFrameworkCore.Migrations;
                // GetService<T>() is an extension from Microsoft.EntityFrameworkCore.Infrastructure.
                var migrationsAssembly = context.GetService<IMigrationsAssembly>();

                // Migration IDs start with a timestamp, so comparing them ordinally
                // orders them by creation time.
                var earlierMigrations = migrationsAssembly.Migrations
                    .Where(x => string.CompareOrdinal(x.Key, currentActiveSnapshot) < 0)
                    .Select(x => x.Key)
                    .ToList();
                earlierMigrations.ShouldBeEmpty("There are migrations that are older than the latest squash.");
            }
        }

voroninp commented 1 year ago

[image]

begerard commented 1 year ago

Even partial support, which would cover only the simple cases where no custom code is required for the initialization, would be useful. Something like an automation of the workflow described by worthy7. Having no automated solution even for simple cases makes the process error prone every time, which is in itself a defect in EF Core IMO.

roji commented 1 year ago

Just to be sure everyone is on the same page, especially for the simple cases with no custom migration code, there's a very easy workaround documented here; this is obviously not as perfect as a built-in squash feature, but the fact that a good workaround exists makes this less urgent.

The main problem IMHO is what happens when there is custom migration code, which gets lost if you simply delete the Migrations folder. In other words, I think there's very little value in implementing a "partial" squash feature which doesn't handle the custom code scenario.

In any case, this is a feature which we definitely intend to implement at some point; it's simply a matter of competing priorities compared to other issues, which may not have a workaround as above. It's also pretty highly-voted, so there's a good chance we'll get to it sooner rather than later.

worthy7 commented 1 year ago

I think, tbh, this is probably something for the CLI. In that case, it would just be a single wrapper function around the 5 steps I wrote above - with a warning that it doesn't care about custom code migrations 😂

begerard commented 1 year ago

Just to be sure everyone is on the same page, especially for the simple cases with no custom migration code, there's a very easy workaround documented here; this is obviously not as perfect as a built-in squash feature, but the fact that a good workaround exists makes this less urgent.

The step where we delete from and insert into the history table ourselves is not great. It's time consuming and error prone (there are many ways to do it)... Having a simple command, "dotnet ef migrations reset", that handles removing the files and generating the new migration (if that's a good way to do it) would be a very useful first step.

MikaelEliasson commented 1 year ago

Deleting all migrations is not a good workaround, it will be painful in any larger team/codebase to try that. Handling the merging of migrations coming from different branches will be really time consuming and error prone.

The approach I've written about is not perfect or super quick. But it handles both custom code migrations and a team environment. So there are workarounds, even if none is perfect.

If people go with the "simple" solution they should really artificially decrease the timestamp on the new migration a lot. That way you can at least avoid any merged migrations getting a lower timestamp and run before the initial migration as that will break things in a majority of cases. (And the documentation should probably include this warning too)
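
A minimal sketch of why the timestamp matters, assuming migrations are applied in order of their IDs and that IDs start with a 14-digit yyyyMMddHHmmss timestamp. The migration names and the `MigrationOrder` helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A migration created on a long-lived branch before the squash, but merged after it.
var lateBranchMigration = "20210301120000_AddCustomerEmail";
// The new initial migration with its real creation timestamp, and a backdated variant.
var squashed = "20210826073610_Initial";
var backdated = "20000101000000_Initial";

// With the real timestamp, the late migration sorts (and would run) before the initial one.
Console.WriteLine(string.Join(", ", MigrationOrder.InApplyOrder(new[] { squashed, lateBranchMigration })));

// With the backdated timestamp, the initial migration stays first.
Console.WriteLine(string.Join(", ", MigrationOrder.InApplyOrder(new[] { backdated, lateBranchMigration })));

public static class MigrationOrder
{
    // Because the IDs start with a fixed-width timestamp, an ordinal string sort
    // puts them in creation order.
    public static List<string> InApplyOrder(IEnumerable<string> migrationIds)
        => migrationIds.OrderBy(id => id, StringComparer.Ordinal).ToList();
}
```

Backdating only fixes the ordering. A late migration still has to be checked against the squashed schema before it is merged.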

roji commented 1 year ago

One more thing to consider for this is mobile devices, where there's a (Sqlite) database on each device. Squashing is problematic, since databases are at various migration states, and you can only squash once none of your databases is in a migration state that's going to get squashed.

So squashing may be irrelevant for these scenarios. Another possible solution is to have a squash "archive" of some sort, which is somehow used only when a migration state in the database isn't found in source code. That would allow the main project to squash and be clean, while still allowing devices on old migration states to move forward. But this isn't trivial.

stap123 commented 1 year ago

@roji Is that a common scenario for people?

If the "normal" scenario would benefit from this, as it seems it would, is it a suitable option to add something and just advise that it's not suitable for all workloads/scenarios? (Not sure what the team's general feeling on things like that is.) 😄

roji commented 1 year ago

@stap123 there certainly are usages of EF on mobile devices out there. The point of my comment was mainly to keep these in mind when designing for this feature - we may end up saying it's not supported/recommended for mobile devices at first, possibly provide a solution later, etc.

jeff-pang commented 1 year ago

Besides the snapshot issue explained above by @MikaelEliasson, using migrations remove to preserve valid snapshots and then squashing migrations runs the risk of data loss when renaming and dropping columns in the same table. Example:

  1. Initial Migration - migrations add InitialCreate (applied to database)

    public class Customer
    {
        public string Id { get; set; } = null!;
        public string? FirstName { get; set; }
        public string? LastName { get; set; }
    }

  2. Drop LastName - migrations add DropLastName (pending)

  3. Rename FirstName to FullName - migrations add RenameFirstName (pending)

At this point we have InitialCreate applied to the database, and DropLastName and RenameFirstName pending, which will be correct when applied to the database.

But if we squash DropLastName and RenameFirstName by running migrations remove and then recreating the migration (e.g. migrations add RefactorCustomer), we end up with a migration that drops and renames the wrong columns:

        protected override void Up(MigrationBuilder migrationBuilder)
        {
            migrationBuilder.DropColumn(
                name: "FirstName",
                schema: "sample",
                table: "Customers");

            migrationBuilder.RenameColumn(
                name: "LastName",
                schema: "sample",
                table: "Customers",
                newName: "FullName");
        }
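
The ambiguity can be shown with a small sketch that compares only column names before and after the two changes. This is not how EF Core's model differ is implemented; `ColumnDiff` is a hypothetical helper that just shows what information is left once the intermediate step is gone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Column names before and after the two model changes from the example above.
var before = new[] { "Id", "FirstName", "LastName" };
var after = new[] { "Id", "FullName" };

var (removed, added) = ColumnDiff.Compare(before, after);
foreach (var (oldName, newName) in ColumnDiff.RenameCandidates(removed, added))
    Console.WriteLine($"possible rename: {oldName} -> {newName}");

public static class ColumnDiff
{
    // Returns the columns that disappeared and the columns that appeared.
    public static (List<string> Removed, List<string> Added) Compare(
        IEnumerable<string> before, IEnumerable<string> after)
    {
        var beforeSet = new HashSet<string>(before);
        var afterSet = new HashSet<string>(after);
        return (beforeSet.Where(c => !afterSet.Contains(c)).ToList(),
                afterSet.Where(c => !beforeSet.Contains(c)).ToList());
    }

    // From names alone, every removed column is a plausible source for every added one.
    public static List<(string From, string To)> RenameCandidates(
        IEnumerable<string> removed, IEnumerable<string> added)
        => removed.SelectMany(r => added, (r, a) => (From: r, To: a)).ToList();
}
```

Both FirstName and LastName come out as candidates for FullName, so a tool working from the two end states has to guess, and the generated migration should be reviewed by hand.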

jasekiw commented 1 year ago

@ajcvickers I believe this task needs a higher priority. Many of us have a very large number of migrations, which makes our build times quite unreasonable and slows down development. Simply deleting all migrations and recreating them breaks several things: column order, custom migration adjustments, and more, as mentioned by others above. Can this be brought up for discussion for 8.0 or even 9.0, with a commitment to it? It has been punted several times.

aradalvand commented 1 year ago

@ajcvickers @roji Hey guys, would you please consider this for 8.0? It's one of the oldest and also most upvoted feature requests in the entire repo.

Migration bloat tends to become an increasingly big pain point the larger a project gets.

MikaelEliasson commented 1 year ago

One possible idea that doesn't solve the problem for old code bases but might make it easier to handle in the future.

What if we could make Add-Migration also write a list of the migrations that were there when you ran the command? It's a very small change, and even something you can implement yourself as a script that wraps Add-Migration. The result could basically be a plain text file next to the migration file with all the migrations listed row by row.

So why would that help? It would make it a lot easier to rebuild the actual migration graph and to understand what we might be missing in the snapshot for MigrationX. Being able to understand the whole migration graph was the biggest challenge I had when trying to make my approach fully automated.
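
A sketch of how such lists could be used, assuming each migration has a (hypothetical) sidecar list of the migration IDs that existed when it was created. `MigrationSidecar` and the IDs below are invented for illustration; nothing like this exists in EF Core today.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sidecar data: migration ID -> migrations present when it was added.
var knownAtCreation = new Dictionary<string, IReadOnlyCollection<string>>
{
    ["20230101000000_Parent"] = Array.Empty<string>(),
    ["20230201000000_B"] = new[] { "20230101000000_Parent" },
    // C was created on a branch that did not yet contain B.
    ["20230202000000_C"] = new[] { "20230101000000_Parent" },
};

foreach (var (id, missing) in MigrationSidecar.MissingFromSnapshot(knownAtCreation))
    Console.WriteLine($"{id}: missing [{string.Join(", ", missing)}]");

public static class MigrationSidecar
{
    // For each migration, returns the migrations with a lower ID that were not present
    // when it was created, i.e. changes its snapshot cannot contain.
    public static Dictionary<string, List<string>> MissingFromSnapshot(
        IReadOnlyDictionary<string, IReadOnlyCollection<string>> knownAtCreation)
    {
        var all = knownAtCreation.Keys.OrderBy(id => id, StringComparer.Ordinal).ToList();
        return knownAtCreation.ToDictionary(
            entry => entry.Key,
            entry => all
                .Where(id => string.CompareOrdinal(id, entry.Key) < 0 && !entry.Value.Contains(id))
                .ToList());
    }
}
```

For the sample data, only C reports a gap (B), which is the information needed to patch C's snapshot before using it as the base for a squash.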

If you know what's missing, it seems quite manageable to rebuild a correct first snapshot.

The same files/data could also likely be used to avoid the fact that Remove-Migration is completely broken in a team environment and must never be used today (because of the out-of-sync migration snapshots, it creates a corrupted main snapshot 90% of the time). With that data it would likely be possible to figure out whether some migration is missing, and maybe even how to handle it.

roji commented 1 year ago

@MikaelEliasson there's already dotnet ef migrations list for listing migrations. Note that there's no migration "graph" - it's just a list.

inlineHamed commented 1 year ago

This is my routine to keep build times short: once in a while I create an empty database and run the migrations up to a desired point, then use SSMS to generate a SQL file (schema and data) from the database and put it in the migrations folder. I then move those migration files into a zip file in the migrations folder. Any new team member can just run the SQL file instead of the migrations in the zip file. This is in addition to keeping the migrations in a separate project that stays unloaded in VS. Whenever I need to add a migration or update the database, I use the CLI to build the migrations project and do so.

ajcvickers commented 1 year ago

@jasekiw @aradalvand This is high up on our priority list, but unfortunately it is not trivial and requires expertise in the Migrations code. There are also other areas that for internal and strategic reasons need resources at the current time--see Release planning process for a general discussion of this. This means that it's unfortunately not something we can work on in the 8.0 timeframe.

dragnilar commented 1 year ago

@ajcvickers That's unfortunate, but understandable. I know it's too early to make any promises, but here's hoping we can see it for 9.0. 🥺🙏😇

MoishyS commented 1 year ago

I think most of the bloat is from the Designer files, and I don't understand why it's needed.

Is it only needed so that removing migrations can recreate the main snapshot? Can we safely remove those Designer files if we're not using migrations remove? Can we have a flag to not create them?

ajcvickers commented 1 year ago

@MoishyS I discussed this with the team. The designer files are frequently needed when executing migrations to obtain information from the underlying EF model. While in some cases it could be safe to remove them, this won't be the common case. It's also the case that a designer file not used by a migration in a given version of EF may later make use of the designer file as new features are implemented and bugs are fixed. Therefore, it doesn't seem like removing the designer files is a good way to go here.

ajcvickers commented 1 year ago

@AndriySvyryd @bricelam Do we have an existing issue for using the relational model here? I couldn't find one.

AndriySvyryd commented 1 year ago

@ajcvickers https://github.com/dotnet/efcore/issues/18620 😄

roji commented 1 year ago

That point in a project's life where everything has already been discussed...

MoishyS commented 1 year ago

@ajcvickers Thanks for your reply.

The designer files are frequently needed when executing migrations to obtain information from the underlying EF model.

I am trying to understand the case where you would need the snapshot for the migration. Coming from Laravel, all you have is the migration, without any snapshot. This behind-the-scenes magic seems wrong to me; the migration file should include everything needed to create the SQL without relying on anything else.

I understand that you need the snapshot history to remove migrations, but if someone does not use migrations remove (as we have everything tracked in git, and can reverse the main snapshot) we shouldn't need these huge designer files, and we shouldn't rely on it when executing migrations.