botpress / messaging

Botpress messaging server
42 stars 37 forks source link

fix(migrations): allow to auto down migrate when there is no migration required #511

Closed laurentlp closed 2 years ago

laurentlp commented 2 years ago

This PR allows for a user to let's say, start Messaging using v.1.2.3 and then go back to v.1.2.0 without having to do a down migration first since there is no migration to run between these two versions.

samuelmasse commented 2 years ago

What's an auto down migration? When the server starts, the first thing it does is try to get a DB that has the corresponding schema to its server version. If the DB doesn't exists it creates one. If it finds that the DB version is behind, it can automatically upgrade it if it's given the permission to do so (with --auto-migrate). However if it finds that the DB version is ahead, then there is no automatic decision that the server can take besides refusing to start and displaying an error, because the outdated server can't downgrade the DB because it doesn't have the migrations to do so (that code exists in future versions of the server)

I'm suspecting that this PR tries to make this one liner work as a all around handler for migrations : bp start --auto-migrate. The thing is that this can't work unless migrations are guaranteed to be backward compatible (in that case any earlier server version could be started without first down migrating). So you really have 3 options :

Notice that the concept of automatic down migration does not exist in any of these cases. In the first option, down migrations aren't a thing at all because everything is backward compatible, so you would not ever need to down migrate. In the second option you don't "automatically down migrate", you just keep running the server with a DB version that is ahead (unless it's a major version difference). In the third option you just refuse to run the server if the DB version is ahead.

So the only way to make the auto migrate one liner work is with the first option. With all other options you'll have to run bp migrate down -t a.b.c on the current server version before downgrading it. I assume many long time botpress users might not even be aware that they need to run down migrations, and this could be causing problems with the messaging server refusing to start if they downgrade their botpress instance without running the down migration.

The botpress server will actually accept to run when the DB version is ahead, but it doesn't do so in a justified way, because the botpress server does not fit in the first option of "always backward-compatible migrations". It is stuck forever in the third option, due to the history of migrations that change DB/JSON schemas in between random versions. So for on-prem users, I would say the solution is to put a safeguard in the botpress migration system that prints out a message that down migrations have not been run and need to be run on the approriate server version. Then on-prem users need to be made aware that before downgrading their botpress server, they need to run the down migration command. For the cloud, I suggest implementing the second option on master.

laurentlp commented 2 years ago

What's an auto down migration? When the server starts, the first thing it does is try to get a DB that has the corresponding schema to its server version. If the DB doesn't exists it creates one. If it finds that the DB version is behind, it can automatically upgrade it if it's given the permission to do so (with --auto-migrate). However if it finds that the DB version is ahead, then there is no automatic decision that the server can take besides refusing to start and displaying an error, because the outdated server can't downgrade the DB because it doesn't have the migrations to do so (that code exists in future versions of the server)

I'm suspecting that this PR tries to make this one liner work as a all around handler for migrations : bp start --auto-migrate. The thing is that this can't work unless migrations are guaranteed to be backward compatible (in that case any earlier server version could be started without first down migrating). So you really have 3 options :

  • Adjust the system to accept running on versions of the DB that are ahead and make sure all migrations are backward compatible. With this approach you don't update the DB version in msg_meta to your version if it's ahead, you just continue running the server as normal. This would be a good option for the cloud as it allows having no downtime when migrating
  • Second options is the same as the first option but only accept running the server if the DB version is not more than a major version ahead. This is a nice in-between that allows you to have minimal downtime but also fix mistakes that would otherwise be breaking changes in the DB schema at every major version release (for some down time). In my opinion this is the optimal approach.
  • Third option is the current way it's implemented, which is to assume that any version change in the DB can be a breaking change. This requires downtime at every version change, since your previous versions of the server are assumed to be incompatible with future versions of the DB

Notice that the concept of automatic down migration does not exist in any of these cases. In the first option, down migrations aren't a thing at all because everything is backward compatible, so you would not ever need to down migrate. In the second option you don't "automatically down migrate", you just keep running the server with a DB version that is ahead (unless it's a major version difference). In the third option you just refuse to run the server if the DB version is ahead.

So the only way to make the auto migrate one liner work is with the first option. With all other options you'll have to run bp migrate down -t a.b.c on the current server version before downgrading it. I assume many long time botpress users might not even be aware that they need to run down migrations, and this could be causing problems with the messaging server refusing to start if they downgrade their botpress instance without running the down migration.

The botpress server will actually accept to run when the DB version is ahead, but it doesn't do so in a justified way, because the botpress server does not fit in the first option of "always backward-compatible migrations". It is stuck forever in the third option, due to the history of migrations that change DB/JSON schemas in between random versions. So for on-prem users, I would say the solution is to put a safeguard in the botpress migration system that prints out a message that down migrations have not been run and need to be run on the approriate server version. Then on-prem users need to be made aware that before downgrading their botpress server, they need to run the down migration command. For the cloud, I suggest implementing the second option on master.

@samuelmasse the issue I'm trying to fix here is the case where you downgrade from one version to another that doesn't have any database migration.

Let's just take the case of v1.2.0 and v.1.2.3. There is no migration between those two versions. If you were to start v.1.2.0 and then migrate up to v.1.2.3 (with or without the --auto-migrate command), it would set the server version to v.1.2.3 in the msg_meta table. Now, if you want to downgrade the server version back to v1.2.0 (let's say there is a bug in 1.2.3), you need to first run ./messaging migrate down --target 1.2.0 even though there are no down migration to run. The reason being that we check if the version in the msg_meta table is higher than the binary version when starting the server normally.

What I'm adding here is a check, if you don't provide a command (up/down), to see if the bin version is lower than the database version. If this is true (like in the example above) and there is no migration to down migrate from, we update the database version and let the server start. We do the exact same thing for the up migrations. If there is no migration to run and the bin version is higher than the db version, we simply update the entry in the msg_meta table. For the binary, running using a database schema from v.1.2.3 or v.1.2.0 is the same.

TL;DR; I simply re-used the same logic we have for the "auto" up migrate when the is no migration to run. If there are migrations to run when downgrading from one version to another, the server still will print an error and shut down the server!

samuelmasse commented 2 years ago

@laurentlp if the bin version is lower than the database version and there is no migration to down migrate from -> we update the database version and let the server start. But if the bin version is lower than the database version, the outdated binaries doesn't have the code for the down migrations of the more up to date DB. In your example it works because there are no migrations to run. What if there are migrations to run, how will the old binary know that?

laurentlp commented 2 years ago

@laurentlp if the bin version is lower than the database version and there is no migration to down migrate from -> we update the database version and let the server start. But if the bin version is lower than the database version, the outdated binaries doesn't have the code for the down migrations of the more up to date DB. In your example it works because there are no migrations to run. What if there are migrations to run, how will the old binary know that?

Right! In fact, it doesn't work at all! Good catch!!!