contao / core-bundle

[READ-ONLY] Contao Core Bundle
GNU Lesser General Public License v3.0
122 stars 57 forks source link

The Future of runonce.php #1223

Closed loilo closed 4 years ago

loilo commented 6 years ago

The Future of runonce.php

In the process of transitioning away from classical Contao Modules to Contao Manager Plugins, there should be a discussion about what will happen to the good old module installation hook — the runonce.php.

So this is my take on the topic.

What is the gist of the runonce.php?

Having a runonce.php is — in Composer terms — a post-package-install / post-package-update script that is executed on a per-package basis. It's all about lifecycle handling.

Do we want to extend the support of lifecycle hooks?

I've personally been missing a feature contrary to the runonce.php: a script that will be executed after uninstallation, to do cleanup work for changes the runonce.php may have introduced.

I've briefly discussed this option with @ausi but we didn't quite agree on the question if this is actually a thing Contao should have. For me that would've been great have from time to time, but I've used Contao modules for a bunch of rather "meta" work (for example managing git hooks) with side effects beyond the things Contao modules usually do.

Since one of the Managed Edition's goals is to take away a bunch of Symfony's very explicit nature — and thus to make plugin management more user friendly — I think a way to do cleanup on package uninstallation would be useful.

Module updates from one version to another have always been achievable by some clever code in the runonce.php. However, that's not the greatest of developer ergonomics.

Solution

Design

I've put some thought into how lifecycle hooks could be implemented and came to the conclusion that an incremental migration approach (as used e.g. in Laravel) may be the best bet.

It could be achieved by a new PluginInterface — let's call it MigrationPluginInterface.

namespace Contao\ManagerPlugin\Migration;

interface MigrationPluginInterface
{
    /**
     * Allows a plugin to perform migration tasks
     *
     * @return array    $migrations
     */
    public function getMigrations();
}

The interface has one method, getMigrations(). It should return an associative array mapping semantic versions to classes implementing the MigrationInterface:

[
    '1.0.0' => DoSomeTask::class,
    '1.2.0' => DoAnotherTask::class
]

The MigrationInterface itself is inspired by Laravel's Database Migrations and requires implementation of the up() and down() methods, performing the migration and a corresponding rollback:

namespace Contao\ManagerPlugin\Migration;

interface MigrationInterface
{
    /**
     * Perform a migration task
     */
    public function up();

    /**
     * Roll back the migration task
     */
    public function down();
}

Naturally, to work as expected, all semver => MigrationInterface mappings would have to be immutable once published. It would be on the developers to follow that rule since I don't see a way to enforce this, but I think that's okay.

That immutability could see exceptions, for example in x.x.x-alpha and other prereleases, but that's not a thing to decide in the initial draft.

Execution

After installation of a Contao Plugin implementing the MigrationPluginInterface, all its migrations will be executed in order of their versions.

After updating a Contao Plugin, only its migrations for versions greater than than the previously installed one are executed.

When uninstalling a Contao Plugin, all its migrations are rolled back.

Feedback

Of course I'm curious to hear your opinions regarding the topic and the suggested solution.

I'd especially be interested to hear how your modules use the runonce.php and if you'd ever been in need of a cleanup hook.

aschempp commented 6 years ago

I like the idea in general, binding it to the Manager plugin. Three thoughts:

  1. we should use Doctrine Migrations, not our own implementation
  2. There's no need for versioning keys in the array, we just need a list of migrations
  3. How do you want to achive downgrades? If a package is deleted, it's files will be gone?
loilo commented 6 years ago

The terms "source version" and "target version" used below refer to "the package in the version that is currently installed" vs "the package in the version that we want to migrate to".

  1. Sure. To be honest, I'm not proficient enough in the ecosystem to be aware of existing migration solutions.

  2. (Immutably) versioning migrations would allow us — with knowledge of source and target version numbers — to only execute the migrations necessary to get from one version to the other.

    Without that mapping, I'm not sure how we'd detect which migrations to execute:

    1. Roll back all migrations of the source version and then execute all migrations of the target version.

      I don't think this would be a good approach since (depending on the kind of migrations performed) it can create a loss of data.

    2. Diff the migration lists of source and target versions and execute what's needed.

      I would be all-in for that approach if it's technically viable. I haven't considered it so far because It would require access to the target version's migrations list before the source version is removed.

    Or did I miss another approach to achieve this?

  3. I'm not sure I'm understanding your question correctly. Assuming it's about "how to roll back migrations if the list of migrations has just been erased from the harddisk": We should be able to execute lifecycle hooks just before deletion via Composer's pre-package-uninstall hook, shouldn't we?

aschempp commented 6 years ago

(Immutably) versioning migrations would allow us — with knowledge of source and target version numbers — to only execute the migrations necessary to get from one version to the other.

Doctrine Migrations takes care of that by storing the already executed migrations in the database. So the database knows in which state it is. Doctrine Migrations does not care about version numbers or anything similar, it just stores the class name (sort of). By default it generates a class name with the current timestamp, anything as long as it is unique (which is more of a problem when using bundles).

We should be able to execute lifecycle hooks just before deletion via Composer's pre-package-uninstall hook,

Good point, maybe we can do that.

loilo commented 6 years ago

Okay, if we can get Doctrine Migrations to handle rollbacks/updates on a per-bundle basis, that sounds very reasonable.

aschempp commented 6 years ago

The only problem I currently see is that Doctrine Migrations are run inside the Contao application. And composer events (package install/update/remove) are not. But I guess we'll find a solution for that.

Toflar commented 6 years ago

BTW, it was not integrated in Doctrine for many reasons. Try searching for "bundle doctrine migrations" :-) But I guess most of the downsides do not apply to us. For regular doctrine migrations, you have to execute doctrine:migrations:migrate so I think it would be a good idea to just use the same process. Like contao:migrate that internally asks all the bundles in our order and then at the end the application level ones. I don't think it's a good idea to try to hook into the doctrine process. I think it's better to provide a standalone alternative that uses doctrine migrations under the hood.

backbone87 commented 6 years ago

i dont think doctrine migrations is suited as a runonce replacement.

loilo commented 6 years ago

However, both of these points are why I explicitely asked for feedback on the current usage of developers' runonce.php files. It will be important to know the use cases before making a move into any direction.

backbone87 commented 6 years ago

Idempotency is a debateable point. Browsing GitHub, I've seen a non-zero amount of runonce files that essentially are just migration files from one version to another. They include a check for the current version or state of the module and a resulting action (or no action). In that regard, idempotency of runonce files is only as good as their implementation. That's why I don't think there's a significant difference in robustness in the way module authors implement migrations. But we may have more robustness by completely discouraging developers to create any such irreversable changes by just not providing a way for reversion — that's a very valid point.

idempotency has nothing to do with reversibility. it just means that i can apply the same runonce over and over again, without changing the (initial result). when we use a little bit of math to describe this, where f is our runonce and X is the initial state and Y the final state: Y = f(X) = f(f(X)) = f(f(f(X))) = f^n(X); for all n of N That is an implicit requirement of the current runonce implementation and it eases runonce handling, because you can just drop in the runonce again, if the previous process ran into some race conditions or did abort, or if you reinstall the extension. also there is no need to track which runonces are already executed, since you can just re-execute all runonce in the correct order and get the same final state as when you would just execute the "new" ones.

loilo commented 6 years ago

I know what idempotency is, I just communicated my point poorly – sorry for that.

What I tried to say is: idempotency is given for runonce scripts because of the way they are implemented.

A theoretical example: a runonce script should add an item to a list.

The algorithm will be like:

  1. Check if item is already in the list.
  2. If not, add it to the list.

The script is idempotent, but it only is because we do a check in step 1. In other words, idempotency depends on the quality of the implementation.

What I wanted to point out is that, vice versa, if a module's migration script in a properly designed migration system is not idempotent, it's because of poor implementation on the module author's side — a factor that should apply to runonce files and migration scripts in the same way.

Or am I completely missing something important?

backbone87 commented 6 years ago

i agree that it is an error to not fulfill the (implicitly) required idempotency, when writing runonce scripts.

also if an extension has 2 runonces f (first) and g (second) with X being the initial state and Y the final state, then to fulfill idempotency the following MAY NOT hold: Y = g(f(X)) <-/-> Y = g(X). that means you still need to apply f at least once before applying g. this is ensured by all composer driven runonce implementation ("ER driven" runonce implementation dont support multiple runonce), because they use the definition order of runonce files (in the composer.json) and execute them in order. with other words: all runonce files are treated as one runonce that is the result of concatenating the single runonce files in definition order: a = f ◦ g

but i still dont get your point? bad code is bad, like said, but that doesnt mean we shouldnt require properties that improve UX (UX != DX).

btw. the point of using database migrations utilities like doctrine migrations (there are better migration tools) is that you dont need idempotency, because you version your db schema and persist the schema version inside the db itself, so the "should i run this part of the migration or not?" is answered by the stored version: a "single" number (actually a set of numbers) that reflects the whole db state in a sense.

loilo commented 6 years ago

btw. the point of using database migrations utilities like doctrine migrations (there are better migration tools) is that you dont need idempotency, because you version your db schema and persist the schema version inside the db itself

Right, but the resulting execution run by the migration framework still is idempotent, that's the framework's job.

That's basically my question boiled donw: How is it qualitatively different whether idempotency is enforced through a runonce script itself or through the underlying migration system?

backbone87 commented 6 years ago

probably no difference, but this is just a naive guess and i wouldnt count on that. but why does it matter? its 1 point on my list, that, when i did put it on, wasnt sure (and still arent), if it is a property fulfilled by doctrine migrations.

aschempp commented 6 years ago

I agree that Doctrine Migrations has it's flaws and was not made for this. However, DM is idempotent in the sense that it tracks what has already run, which eliminates the issue of a developer correctly tracking what has already been done.

In regards to the database field updates, I never had any issues with that by just using preUp, up and postUp. Running out of memory is not really a concern for me, because that will happen in whatever implementation one would use.

loilo commented 6 years ago

@backbone87 It matters because to discuss the point I needed to understand it first, that's why I wanted to make sure I'm getting your argument right.

I haven't commented on your other points because I don't disagree or need clarification on them. Using Doctrine Migrations has neither been part of my proposal nor do I have the experience to defend or disqualify it as a tool.

leofeyer commented 6 years ago

As discussed in Mumble on September 27th, we want to add a migration interface and a command to run the migrations. This is not to be mistaken with Doctrine migrations.

interface ContaoMigration
{
    public function shouldBeRun(): bool {}

    public function getDescription(): string {}

    public function run(): void {}

    public function getRunAfter(): array {}
}
loilo commented 6 years ago

Sounds nice. Some questions though:

backbone87 commented 6 years ago

Just a few more questions, for specification:

backbone87 commented 6 years ago

And some more:

Toflar commented 6 years ago

Looks like you guys have thought about this topic already, why don‘t you work on something? That would be great!

m-vo commented 6 years ago

According to our discussion on Mumble (no definitive answers):

I'd be curious what getRunAfter() should return/when it would be executed?

The classes this migration should run after.

So if I'm getting this right, no mechanism for downgrading / removal is planned?

Yes. Downgrade is hard to impossible to achieve if packages get removed the way they are removed now (composer). Earlier versions would have to know how to deal with future changes.

Are Migrations Services? (aka do i can inject services into them) Can there be multiple Migrations per Module/Bundle?

Yes. Just use multiple classes with that interface.

Do Migrations run before or after the DCA-driven DB-schema update? Maybe an option to run before and after?

The contao install tool updates would use the same mechanism. So you could e.g. specify if yours should run after them.

Are there other preconditions except shouldBeRun? Like Contao tracking already executed migrations?

No.

Is the description displayed to the user before the migration is executed?

This would allow a dry-run option, where shouldBeRun is executed and a list of those returning true + their descriptions is shown.

Is the user able to skip specific or all migrations? (My opinion: he should NOT be able to)

Yeah, I don't think so.

What if running migration A makes migration B's shouldBeRun true? Does the user have restart/reconfirm the migration process?

Yes. Alternatively we could execute the whole thing until all return false ;-).

What if this results in loops?

Probably we shouldn't do what I just said then. :-) This won't happen that often, I guess or do you have an example?

loilo commented 6 years ago

Yes. Downgrade is hard to impossible to achieve if packages get removed the way they are removed now (composer). Earlier versions would have to know how to deal with future changes.

Aw, snap. Well, I had thought out a way for that in the original proposal (opening post of this issue), I have to admit though that it's probably uncannily fragile.

backbone87 commented 6 years ago

I'd be curious what getRunAfter() should return/when it would be executed?

The classes this migration should run after.

What if, i have a generic MigrateFieldMigration, that is added as a service multiple times (but with different params)? Wouldnt it be better to return the service IDs insteadof class name? If you use autowiring and service defaults, this is then also the class name. (because services from service defaults become class name as service name). I guess the services are tagged with contao.migration anyway. Wouldnt it make more sense to pass the runAfter, via tag property? (similar to various priority props used with already existing container tags)

backbone87 commented 6 years ago

Yes. Alternatively we could execute the whole thing until all return false ;-).

But then you can not display the description beforehand, because you dont know the migrations the should be run after you executed a set of migrations.

Probably we shouldn't do what I just said then. :-)

A solution would be to remember all service IDs of the migrations already ran and only execute them once in a migrate step. But that still does not fix the description problem.

This won't happen that often, I guess or do you have an example?

In most cases it would be a dev mistake or out-of-scope usage, but imagine 2 different extension trying to modify a core field (slightly, e.g. changing collation or length). The shouldBeRun checks for current field length/collection matching the updated one.

m-vo commented 6 years ago

I guess the services are tagged with contao.migration anyway. Wouldnt it make more sense to pass the runAfter, via tag property? (similar to various priority props used with already existing container tags)

Hmm. Right now I think I'd rather not clutter the container with that and have a solution analogous to the manager plugins.

In most cases it would be a dev mistake or out-of-scope usage, but imagine 2 different extension trying to modify a core field (slightly, e.g. changing collation or length). The shouldBeRun checks for current field length/collection matching the updated one.

True, but then these two extensions would likely be incompatible anyways (or should enforce a more relaxed constraint like >= / <= instead of ==).

Alternatively we could execute the whole thing until all return false ;-).

But then you can not display the description beforehand, because you dont know the migrations the should be run after you executed a set of migrations.

Yeah, that's why it would probably be better to only run it once (or use a command flag if we want to run more than once, like -e or -e=4 or whatever).

Most importantly this IMHO should be a slim solution + interface that does not try to cover all migration cases (and is not overengineered) but provides a sufficient replacement for the runonce.php files. We can still improve the command later on and e.g. add options or allow more detailed config.

loilo commented 6 years ago

Most importantly this IMHO should be a slim solution + interface that does not try to cover all migration cases (and is not overengineered) but provides a sufficient replacement for the runonce.php files.

This would probably require the initial iteration to not offer shouldBeRun() at all, but again track executed migrations on Contao's side (which we wanted to avoid IIRC?).

Otherwise, we'd either have to think out shouldBeRun()'s behaviour from the start or introduce breaking changes later.

m-vo commented 6 years ago

I'm not saying we should not discuss the behaviour? I just want the whole thing to stay managable which means: small interfaces, few features to begin with. Small doesn't mean empty 😉.

Without shouldBeRun() an event would probably be suited better. This was discussed as well. But then e.g. everyone would need to replicate the logic / there could not be a dry run (btw. something that is unfortunately missing in the install tool). I don't think the application should track installs - there is way too much that can happen (from manual DB migration, installing - uinstalling - installing another version - ... + side effects from other extensions that operate on the same fields).

Plus: The core's migrations actually use a similar interface right now and could easily be adapted.

backbone87 commented 6 years ago

Hmm. Right now I think I'd rather not clutter the container with that and have a solution analogous to the manager plugins.

What do you mean with clutter? The tag should be there anyway? And the container is responsible for providing a collection of migrations (via tag). A single migration wouldnt know about other migrations, only the container.

ausi commented 5 years ago

As discussed at the developer meeting, we want to use tagged services (contao.migration) with the following interface:

namespace Contao\CoreBundle\Migration;

interface MigrationInterface
{
    public function getName(): string {}
    public function shouldRun(): bool {}
    public function run(): void {}
}

Instead of getRunAfter we want to use priorities to manage the order of migrations.

The migrations can be executed with a contao:migrate command which runs in three steps:

  1. Display all migrations that shouldRun() and execute them after confirmation (yes/no).
  2. Show the schema diff and execute it after confirmation (yes/complete/no) where yes doesn’t delete columns and tables.
  3. Display all new migrations same as step 1.

An AbstractContaoMigration class can default getDescription() to the name of the class like “installation-bundle Version480Update”.

The install tool should work the same as the command. Once the install-tool gets removed the manager can use the contao:migrate command.

loilo commented 4 years ago

So, since contao/contao#709 has landed, we can close this one I guess. 🚀