Electron100 / butane

An ORM for Rust with a focus on simplicity and on writing Rust, not SQL
Apache License 2.0
84 stars 12 forks source link

Rebasing migrations #91

Open jayvdb opened 1 year ago

jayvdb commented 1 year ago

One of the items in https://github.com/Electron100/butane/blob/877b65f/notes.org is "Remove timestamp from migration names". This refers to the directory name for the migration which appears in .butane/migrations/, and are named like YYMMDD_<seconds>_<migration_name>.

I don't know all of what might be intended by that org entry, but one reason for removing timestamp from the migration name is that it problematic when it comes to changing the order of pending migrations.

i.e. two PRs are submitted that both include a migration. The date is in the migration names. For the purposes of simplifying the explanation, assume PR "A" is created on Monday, PR "B" is created and merged on Tuesday to main without modification, and the PR "A" is updated and merged to main on Wednesday.

The first obvious problem is the date of the PR "A" will be before PR "B", despite the real order being the other way around. The date a migration was created is not helpful from the perspective of a migration. The order that matters is which order the migration needs to be applied in.

The more difficult problem is that updating PR "A" is hard. When PR "B" is merged, it will cause conflicts in PR "A". This is good at present, because it prevents accidentally merging PR "A" without first rebasing it on top of PR "B".

And all of this is a very basic situation - it gets much harder if there are multiple branches, such as supporting multiple versions, and various patches being back-ported to older versions in all unpredictable order.

The approach I am using is to

  1. delete the migration in PR "A" and then
  2. git checkout main .butane/migrations, then
  3. do the git rebase, and
  4. rebuild project which will recreate the migration state on top of PR "B".
  5. butane makemigration <name> to recreate the migration

This process means the date of the PR "A" migration will be after the date of PR "B", which makes the date prefix sensible again.

Can we improve this? At the very least we can create a butane_cli command that "unmake" the last migration (delete it and rollback .butane/migrations/state.json to prior migration name), which is basically step (1). The clean command deletes all of .butane/migrations/current/ , but not the existing migrations, which is step (2).

The prior migration name is available in the from_name in the info.json in each migration.

I've spent a lot of time with Django, and it is seen as quite good. We should seriously consider their migration approach unless it can be improved upon. https://docs.djangoproject.com/en/4.2/topics/migrations/

Django default a serial number as the prefix for migration names, but does - by default - include the date in the middle of the migration name. This reduces the importance of the date. The serial number is just as bad as a date, as it is common to have migrations merged out of order.

There are Django apps which help with rebasing, including renumbering migrations. IMO we can learn from those and we should try to get that type of functionality built in.

There are other migration-specific tools that do better than typical ORMs, so we can learn at lot from them. c.f. https://github.com/Electron100/butane/issues/67

Electron100 commented 1 year ago

You make good points.

  1. I agree that an unmake command in the cli would be low-hanging fruit to improve the situation.
  2. You're definitely right that there are best practices elsewhere to learn from. I need to spend a little more time digging before commenting intelligently in that regard.

Butane's migration system is loosely inspired by Django's, although the latter's is obviously far more mature.

jayvdb commented 1 year ago

The "unmake" approach is probably short-sighted. In production scenarios, often the DDL/DML to migrate needs to be hand-coded. The current butane system allows for this, as it has up/down.sql which can be edited.

IMO, we don't want to be deleting hand-coded migrations, as that is a feature needed for butane to be production-ready. reshape (https://github.com/Electron100/butane/issues/67) provides a way to avoid hand-coding DML, but that would be an optional butane feature, and I am not 100% sure reshape can entirely avoid the need to add custom DDL/DML to migrations. The Django migration system allows for custom Python code to be used for each migration, and I have needed to use that so frequently that I doubt a purely declarative migration system is sufficient.

Another approach might be give the user the ability to perform the following sequence

  1. "detach" a specified migration, so that it is no longer going to conflict. The actual migration would be kept in place, with any customisations they have done.
    1. Check the migration has not been applied with error telling user to rollback,
    2. Restore .butane/migrations/state.json to the prior state.
    3. Undo it from the embed
  2. The user is now ready to do a git rebase.
  3. "rejoin" a specified migration after they have rebased their branch. This would:

    1. ideally determine if it is feasible/sensible to put the specified migration on top of the latest migration, informing the user of logic errors that a rebase would cause. e.g. emitting errors like "The specified migration modifies table Foo which has been deleted". But this step can be ignored initially under buyer beware, and would grow over time - the developer is still in charge and should be responsible for doing this themselves. And step (4) below should catch 90% of problems.
    2. rebuild the .table files in the migration, and report any which were modified - this helps the developer see what has changed since the migration was written, which gives them a huge clue bat as to whether the migration being rebased still makes sense.
    3. Reconstruct the up/down sql and again check if they differ from the ones on disk. By default IMO the "rejoin" command should fail and have made no disk changes if they differ. Here the failure message can be friendly for some simple cases, like the new up/down SQL is empty, which means the table changes in the specified migration have already occurred on this branch. Others will be ugly, as there is only SQL to use to determine if the migration is still the same as it was prior to the rebase.
    4. Assuming (4) passed, or --force was given, rename the specified migration to use the current timestamp, write the new files, etc, etc.

When there is an existing detached migration, all commands would be disabled except for the "rejoin", with an error message informing the user how to delete the detached migration if they want to abandon it.