golang-migrate / migrate

Database migrations. CLI and Golang library.

Tracking which migrations have been applied in the schema_migrations table #510

Open mgdelacroix opened 3 years ago

mgdelacroix commented 3 years ago

Recently we started integrating our platform with go-migrate and discovered that the only state the tool tracks is the last version applied. This can be problematic for big teams like ours, in which two migrations will not necessarily be merged in sequence (chronological order).

To illustrate this, say two feature teams work in parallel with one migration each, and the team that merges first has the migration with the later timestamp. Once that migration is applied, the other team merges theirs, and since the only check go-migrate performs is against the last version stored in the database, the migration with the earlier timestamp will never be applied.

To fix this, we've looked at other systems like Rails' Active Record, and what they do is track every migration they apply. We are in a position now where we could spend some time working on a patch to add this to go-migrate in a backwards-compatible way, so would this be something the project would like to include and support? If so, we're happy to come up with a pull request illustrating this and discuss the approach further.

We also considered creating a CI check to avoid adding migrations that are older than the last version, but we believe that adding this to the tool can be beneficial not only for our use case but for other teams out there.

//cc @dhui and @nronas for visibility

mgdelacroix commented 3 years ago

To expand on this, we've checked the proposal made in #470, but it will not fix our use case: a file is local to a given system, so it will not support clustered scenarios with multiple servers using the same database.

The idea we have is basically to reuse the existing migrations table without changing it. Instead of tracking only the last version applied, we would have one row per applied migration, and go-migrate would check which migrations it should apply based on those.

This should make it easy to maintain backwards compatibility for several drivers: in the case of relational databases like mysql or postgres, migrating from the old system is just a matter of reading the last migration applied and marking all migrations up to that point as applied, creating a row for each.

dhui commented 3 years ago

Thanks for reaching out!

Let's talk about the manifest file in #470

We are in a position now where we could spend some time working on a patch to add this to go-migrate in a backwards-compatible way, so would this be something the project would like to include and support? If so, we're happy to come up with a pull request illustrating this and discuss the approach further.

🎉 Any help is greatly appreciated!! Let's settle on a design before starting any implementation.

nronas commented 3 years ago

@dhui thanks for the quick response on this. I've brushed up on the implementation in the manifest proposal, but basically the design we were thinking of is what @mgdelacroix described:

To fix this, we've looked at other systems like Rails' Active Record, and what they do is track every migration they apply. We are in a position now where we could spend some time working on a patch to add this to go-migrate in a backwards-compatible way, so would this be something the project would like to include and support? If so, we're happy to come up with a pull request illustrating this and discuss the approach further.

In a nutshell, instead of using one row to determine which migrations have been applied, we would write many rows in the schema_migrations table, one for each applied migration. Below, I demonstrate an execution flow:

User A creates, in their dev cycle, a migration t0_migration_name.up.sql. Another user B also creates, in their dev cycle, a migration t1_migration_name2.up.sql, where t1 > t0. Let's assume the schema_migrations table starts out empty.

User B gets their PR approved and merged to master, the migration is applied to prod, and the schema_migrations table has:

    version     | dirty
----------------+-------
 t1             | f

User A updates their branch from master without changing anything on their side, merges their PR, and applies the migrations to prod. Now the table looks like:

    version     | dirty
----------------+-------
 t1             | f
 t0             | f

Every time we want to apply migrations, we compute the difference between the migration files from the source and what is in schema_migrations (essentially file names from the driver) and proceed to apply them in sorted order.
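The diff-and-apply step could be sketched roughly like this; all names here are hypothetical illustrations, not part of migrate's actual API:

```go
package main

import (
	"fmt"
	"sort"
)

// pendingMigrations returns the source versions that have no row in
// schema_migrations, in ascending order. This sketches only the
// set-difference step; actually running migrations is out of scope.
func pendingMigrations(source []uint, applied map[uint]bool) []uint {
	var pending []uint
	for _, v := range source {
		if !applied[v] {
			pending = append(pending, v)
		}
	}
	sort.Slice(pending, func(i, j int) bool { return pending[i] < pending[j] })
	return pending
}

func main() {
	source := []uint{100, 200, 300}     // versions found in the migration source
	applied := map[uint]bool{200: true} // rows already in schema_migrations
	fmt.Println(pendingMigrations(source, applied)) // [100 300]
}
```

Note how the earlier version 100 is still applied even though 200 already ran, which is exactly the out-of-order case the single-version scheme misses.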

What needs to change

The Driver interface needs to change to allow retrieving the already-applied migrations; we were thinking of something along the lines of functions like AppliedMigrations(), LastMigration(), and SetMigrationApplied().

The interface changes will not break any compatibility, since the proposed design is a superset of the existing implementation.
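A rough sketch of what such an interface extension might look like, with a toy in-memory implementation to show the contract. The method names and signatures are assumptions taken from the comment above, not migrate's actual Driver interface:

```go
package main

import "fmt"

// MigrationTracker is a hypothetical extension to the Driver interface.
type MigrationTracker interface {
	AppliedMigrations() ([]uint, error)     // all versions recorded in schema_migrations
	LastMigration() (uint, error)           // highest recorded version
	SetMigrationApplied(version uint) error // record one applied version
}

// memTracker is a toy in-memory implementation for illustration only;
// a real driver would read and write the schema_migrations table.
type memTracker struct{ versions []uint }

func (m *memTracker) AppliedMigrations() ([]uint, error) { return m.versions, nil }

func (m *memTracker) LastMigration() (uint, error) {
	var last uint
	for _, v := range m.versions {
		if v > last {
			last = v
		}
	}
	return last, nil
}

func (m *memTracker) SetMigrationApplied(v uint) error {
	m.versions = append(m.versions, v)
	return nil
}

func main() {
	var t MigrationTracker = &memTracker{}
	t.SetMigrationApplied(100)
	t.SetMigrationApplied(300)
	last, _ := t.LastMigration()
	fmt.Println(last) // 300
}
```

The existing single-version behavior falls out of LastMigration(), which is why the design can be a compatible superset.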

Data Backwards Compatibility

That should be the easiest step. We can add a piece of logic for when someone is Up-ing from an old schema version using the new client: check the schema_migrations table, and if it has only one row, read the version from that row and insert version rows for every migration up to that version.

Existing project with the single row mechanism

// Source

t1_name.up.sql
t2_name.up.sql
t3_name.up.sql

// SchemaMigrations

    version     | dirty
----------------+-------
 t3             | f

Updated project with the multi-row mechanism

After Up is called, SchemaMigrations will look like the following, without actually performing t1 and t2:

// SchemaMigrations

    version     | dirty
----------------+-------
 t3             | f
 t1             | f
 t2             | f
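The backfill described above, recording earlier versions as applied without executing them, might look roughly like this; the function name and shape are hypothetical:

```go
package main

import "fmt"

// backfill returns the version rows to insert when migrating from the
// old single-row scheme: every source migration up to and including the
// current version is marked as applied without being executed.
func backfill(source []uint, current uint) []uint {
	var rows []uint
	for _, v := range source {
		if v <= current {
			rows = append(rows, v) // recorded as applied, not run
		}
	}
	return rows
}

func main() {
	source := []uint{1, 2, 3} // t1, t2, t3 in the example above
	fmt.Println(backfill(source, 3)) // [1 2 3]
}
```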

We would ❤️ to hear your thoughts on this.

dhui commented 3 years ago

Your example allows for dangerous behaviors, e.g. applying migrations earlier than the last successfully applied migration. What if user A's migration dropped a column from a table and user B's migration renamed the same column in the same table?

jeremieweldin commented 3 years ago

It works beautifully in ActiveRecord and Liquibase. The situation you describe comes up rarely, but when it does, the migration fails in a verbose way so that you know what happened, can roll back, correct the migrations, and re-run them. This type of functionality is absolutely crucial for team development.

dhui commented 3 years ago

The situation you describe comes up rarely, but when it does, the migration fails in a verbose way so that you know what happened, can roll back, correct the migrations, and re-run them.

Rarity isn't the concern, safety is. You can't roll back dropping a column, since the data is lost.

This type of functionality is absolutely crucial for team development.

What functionality are you referring to? e.g. detecting and preventing branching/reordered migrations?

dhui commented 3 years ago

This issue duplicates https://github.com/golang-migrate/migrate/issues/179

Leaving this issue open for now until open questions have been answered.

New questions/discussions should be held in #179

dhui commented 3 years ago

Also duplicates https://github.com/golang-migrate/migrate/issues/65

nronas commented 3 years ago

Your example allows for dangerous behaviors, e.g. applying migrations earlier than the last successfully applied migration. What if user A's migration dropped a column from a table and user B's migration renamed the same column in the same table?

@dhui I do not see this as a migrations issue, TBH; let me elaborate. The possible permutations that I can see are the following:

Scenario 1) A merges before B

Scenario 2) B merges before A

Flipping X < Y, the above scenarios still stand.

In all scenarios, I think the problem is around "logical" schema development, which relies almost entirely on the team that implements the logic for their features/services.

NOTE: From the above analysis we can infer that migration timestamps do not affect migration validity, since the crucial component for a migration tool is the time when a migration is applied.

Am I missing any scenarios here?

lucasdss commented 3 years ago

Your example allows for dangerous behaviors, e.g. applying migrations earlier than the last successfully applied migration. What if user A's migration dropped a column from a table and user B's migration renamed the same column in the same table?

The same issue can happen when the previous migration deletes a column and the next one wants to rename it. If it's a merge from two different branches, everyone could have tested it and it was working as intended. As I see it, it's not the migration tool's responsibility to fix this problem.

The team I'm working on is having issues merging files from different branches. When the first merge brings a file with the newer timestamp, the migrations are applied perfectly, but the second merge, with a file with the older timestamp, is not applied.

I believe the proposed solution is a common pattern applied by other migration tools/libraries in other languages.

Thank you very much for the amazing tool/library, but I would love to see this feature implemented.

dhui commented 3 years ago

@nronas I follow the scenarios you've laid out. Any fix will need to be done by developers. But why wait until integration testing to catch migration ordering issues when these issues can be caught earlier? Schema changes should always be made against the latest "version" of the schema to prevent issues, and detecting and enforcing this can be done early in the software development lifecycle. You referenced Django, which also linearizes migrations.

@lucasdss

If it's a merge from two different branches, everyone could have tested it, and it was working as intended. As I see, it's not a migration tool's responsibility to fix this problem.

Correct, but migrate can detect and prevent this from happening using a manifest file. The actual fix is the responsibility of the developers.

I believe the proposed solution is a common pattern applied by other migration tools/libraries in other languages.

Yes, but that doesn't mean it's the best approach for migrate, or that migrate can't improve the developer experience and prevent issues from occurring.

agnivade commented 3 years ago

IIUC, there are two distinct problems that we are trying to solve here.

  1. If two commits merge to master with different timestamps and the one with the later timestamp gets merged first, the commit with the earlier timestamp will silently fail to get migrated.
  2. What happens when two concurrent commits merge conflicting migrations into master.

The proposed solution solves #1 by tracking all applied migrations, instead of only the last one, and applying the delta, but it leaves #2 as a responsibility for the user to fix.

Now if I am reading the manifest proposal correctly, is it an attempt to follow Django's model of modelling the migration history as a digraph and then throwing an error if the graph has conflicts? I am wondering if we actually need a manifest file for this. Will it be possible to store the digraph along with all the node hashes in the DB itself? I think something like that was proposed here. We could then build up the graph from the filesystem, compare it with what's in the DB, and the migrator can choose to do whatever it wants: throw an error if the graph is not linear, apply a topological sort and run migrations for a power user, or anything else for that matter.

Just to clarify, I have no opinions on either solution. I am only trying to understand the problems and the proposed solutions.

dhui commented 3 years ago

@agnivade

Now if I am reading the manifest proposal correctly, is it an attempt to follow Django's model of modelling the migration history as a digraph and then throwing an error if the graph has conflicts?

Sort of. Django is quite different in:

  1. Django is an ORM, so the migrations don't need to be raw SQL and it can be aware of the migration operations, whereas migrate treats migrations as opaque blobs.
  2. Django's migration format allows it to track the parent of a migration, whereas migrate relies on the lexicographical ordering of the migration names, so tracking a tree structure is non-trivial and provides no value.

But at a high level, the manifest file proposal aims to detect and prevent migration ordering issues as early in the development lifecycle as possible.

Will it be possible to store the digraph along with all the node hashes in the DB itself?

What's the purpose of storing the tree structure in the DB? The migration source is the source of truth for what changes need to be applied to the DB, so any tree structure would need to be stored in the migration source if we were to go down that path. The current manifest file proposal would prevent any branching from occurring, so it stores a "linked list" instead of a "tree" and provides tooling to "rebase" (i.e. reparent) your migrations.
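As an illustration of the "linked list" idea, here is a minimal sketch of how a manifest validator might detect branching. The manifestEntry shape, field names, and error wording are assumptions for illustration, not the actual proposal in #470:

```go
package main

import "fmt"

// manifestEntry is a hypothetical record in the manifest: each
// migration names its parent, so a linear history forms a linked list.
type manifestEntry struct {
	Version uint
	Parent  uint // 0 means "first migration"
}

// checkLinear returns an error if two entries claim the same parent,
// i.e. the history has branched and one migration must be "rebased".
func checkLinear(entries []manifestEntry) error {
	seen := map[uint]uint{} // parent -> child that claimed it
	for _, e := range entries {
		if prev, ok := seen[e.Parent]; ok {
			return fmt.Errorf("migrations %d and %d both follow %d: rebase one of them", prev, e.Version, e.Parent)
		}
		seen[e.Parent] = e.Version
	}
	return nil
}

func main() {
	linear := []manifestEntry{{1, 0}, {2, 1}, {3, 2}}
	branched := []manifestEntry{{1, 0}, {2, 1}, {3, 1}} // 2 and 3 both follow 1
	fmt.Println(checkLinear(linear))          // <nil>
	fmt.Println(checkLinear(branched) != nil) // true
}
```

Because the check runs against the source alone, it can fail a PR in CI before any migration touches the database, which is the "catch it early" point being made here.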

agnivade commented 3 years ago

Understood. So essentially, you are enforcing a linked list structure in the migration source, and whenever you detect a divergence you throw an error. This would be before even applying the migrations, so the DB can still maintain the last applied sequence number as its only state?

dhui commented 3 years ago

@agnivade correct

nronas commented 3 years ago

But why wait until integration testing to catch migration ordering issues when these issues can be caught earlier?

@dhui :100:, I get the point/concern raised, but let me elaborate more on the rationale and background context of the proposal, because I think we weren't super clear on that, and we apologize for it.

"Customer" analysis/ "Market" research

We believe that a tool for managing (developing, applying, etc.) migrations on a project across many team members needs to be accurate, safe, and as frictionless as possible. We identified two different types of migrations:

1) Migrations that do not depend on anything: non-linearizable migrations (they can be performed in any order).
2) Migrations that depend on some other migration: these require serialization in order to achieve accuracy and safety.

Based on experience and a bit of research on issues raised on this project and others, category 1) accounts for 99% of migration types out there. You can defo call this semi-anecdotal, but that's all we have at the moment.

The proposed solution here solves category 1) and puts the burden of fixing category 2) on the user. We definitely believe in the manifest solution, but based on our understanding of the proposal we think of it more as a feature than as the norm, since if it's established as the norm, migrations of category 1) will become super noisy for no added value.

Consider a scenario where two people are adding two migrations that are completely independent of each other, say the creation of two different tables. The first migration, with its digest, gets merged to master and all is well. Now the other person pulls the new manifest and gets a conflict; they have to re-compute and commit, even though those migrations are perfectly fine to apply as is. So for the "canonical" use case, a manifest as an enforcement function sounds a bit too high-friction to me.

The idea, from our point of view, is to start by solving category 1) and then, in a future iteration, introduce "dependent migrations" through a conflict-detection mechanism as an optional feature: you could annotate your migration files (with a comment or something) and the tool would build a dependency graph and perform conflict detection.

We believe the proposal fixes 99% of the cases we know about, based on issues raised by many people, without breaking compatibility and without "vendor-locking" us into an approach that we cannot easily change in the future. Also, for the remaining 1% the proposal does not reduce accuracy; it just puts the burden on the devs. In a sense it is low-hanging fruit with a huge impact.

We would love to find common ground here, since we believe a hybrid approach of multi-row tracking plus a dependent-migrations feature would serve us for a long time without introducing unnecessary friction.

You referenced Django, which also linearizes migrations.

Totally, but I think that wasn't in the initial implementation of Django's migration tool; see here. The annotation on the migration file that allows the user to declare dependencies was introduced later, if my memory serves me well, and is also optional. Excuse me if my context is not up to date; it has been a long time since I last worked with Django.

hermanbanken commented 1 year ago

Based on experience and a bit of research on issues raised on this project and others, category 1) accounts for 99% of migration types out there. You can defo call this semi-anecdotal, but that's all we have at the moment.

Falling in that 1% (which I do believe to be too low an estimate) is sad.

I used to love working with Play Framework migrations, which have always stored the SQL of the applied migration (up + down) in the database itself. This allows you to have a "tree" (as in the data structure) of many serialised change paths in progress by different developers, while having the database correctly roll back/forward with each push to the development environment.

Note that this is for the development kind of workflows, not production, where you often do not want to apply downs. Downs are mostly a thing for disaster recovery rollback and development only.

The "manifest file" solution has the problem of being state attributed to a single version. The database can be in an entirely different state (or branch of the migration tree). In order to backtrack or roll forward, you need each applied migration's down available at all times.

abezzub commented 1 year ago

As a data point, not tracking all database migrations is a dealbreaker for our team and we will have to use another tool instead of migrate or write our own. Ease of the development process > an issue with out-of-order migrations that is unlikely to happen and would be caught by CI anyway.

fatum commented 1 year ago

As a data point, not tracking all database migrations is a dealbreaker for our team and we will have to use another tool instead of migrate or write our own.

the same feelings

sourcec0de commented 11 months ago

@abezzub what tool did your team switch to?

abezzub commented 11 months ago

@sourcec0de we are using goose and that works well enough for our use case