golang-migrate / migrate

Database migrations. CLI and Golang library.
Other
15.11k stars 1.39k forks source link

Explicit transactions not rolled back in case of error (postgres/pgx drivers) #581

Open cochran-at-niche opened 3 years ago

cochran-at-niche commented 3 years ago

Describe the Bug

If an error happens within an explicit transaction (i.e. between BEGIN/COMMIT statements) in a postgres migration script, the transaction is not rolled back, and is left in an aborted state, which causes further statements executed on that connection to fail, including the statement that is supposed to release the advisory lock.

Explanation

The tutorial for the postgres driver recommends using explicit transactions in migrations scripts (i.e. using explicit BEGIN/COMMIT commands):

In Postgres, when we want our queries to be done in a transaction, we need to wrap it with BEGIN and COMMIT commands.

However, it's worth noting that this is not actually necessary to ensure the migration script runs in a single transaction, because golang-migrate uses a simple Exec call with no arguments to execute the migration script (this is true for both the postgres and pgx golang-migrate drivers), and the underlying database driver uses the simple query protocol when executing a statement with no arguments (again, this is true for both lib/pq and jackc/pgx). The documentation for the simple query protocol (specifically, the 52.2.2.1. Multiple Statements in a Simple Query section) says:

When a simple Query message contains more than one SQL statement (separated by semicolons), those statements are executed as a single transaction, unless explicit transaction control commands are included to force a different behavior.

So migration scripts are executed within a single implicit transaction by default (the tutorial is therefore somewhat misleading), and explicit BEGIN/COMMIT statements are really only necessary if you want to override that behavior (for example, if you'd rather execute a migration in multiple separate transactions to avoid holding locks for too long). However, the problem is that explicit transactions (started with a BEGIN command) are not automatically rolled back when there's an error, unlike implicit transactions:

The main difference between an implicit transaction block and a regular one is that an implicit block is closed automatically at the end of the Query message, either by an implicit commit if there was no error, or an implicit rollback if there was an error.

An explicit transaction actually requires an explicit ROLLBACK in case of an error:

Conversely, if a BEGIN appears in a multi-statement Query message, then it starts a regular transaction block that will only be terminated by an explicit COMMIT or ROLLBACK, whether that appears in this Query message or a later one.

If there is an error and a ROLLBACK is not given, the transaction is left in an aborted state, and all further commands in the session will fail with an error. However, neither the postgres nor the pgx golang-migrate drivers currently issue an explicit ROLLBACK in case of an error (see, for example, here). Instead, the transaction remains open, in an aborted state, and all further commands fail. In particular, the statement that is supposed to release the advisory lock, which runs after a migration script finishes, fails.

Thankfully, the advisory lock documentation makes it clear that when a session ends, any outstanding advisory locks are automatically released:

Once acquired at session level, an advisory lock is held until explicitly released or the session ends

So although the transaction is left in an aborted state, and the advisory lock is not released via the statement that's supposed to release it, when the connection is eventually closed and the session ends, the transaction is automatically rolled back, and the advisory lock is automatically released. Many people are probably relying on that exact behavior if/when their migration scripts fail with an error inside of an explicit transaction block.

Still, I think this is a bug, particularly because someone might have created their migrate instance with NewWithInstance or NewWithDatabaseInstance, in which case they may intend to keep the connection open, and potentially continue to use it, after the migration fails. However, they would be unable to do so without first issuing a ROLLBACK command, due to the aborted transaction state, and even if they did, the advisory lock would not be released until the end of their session unless they issued an explicit SELECT pg_advisory_unlock() command with the correct lock name (which seems like an implementation detail of golang-migrate that should probably remain encapsulated within this library). The advisory lock is not automatically released when the transaction is rolled back, because it is a session-level lock:

Unlike standard lock requests, session-level advisory lock requests do not honor transaction semantics: a lock acquired during a transaction that is later rolled back will still be held following the rollback

Steps to Reproduce

  1. Create an empty postgres test database (I'm using postgres version 12.2, but I believe this affects all recent versions).
  2. Create a migration script named 01_test.up.sql containing the following, which is intended to trigger an error inside of an explicit transaction block (meaning the COMMIT will never run, and the transaction will be left in an aborted state):
BEGIN;
SELECT 1/0; -- Divide by zero error!
COMMIT;
  1. Run the migration script with the migrate CLI tool using either the postgres or pgx driver, like so (substituting in the correct values for $PATH and $DATABASE):
migrate -path=$PATH -database=$DATABASE up
  1. Observe the following error output, which contains two distinct errors. In particular, note the second error (highlighted in bold), which explains that the transaction was left in an aborted state, and the statement that's supposed to release the advisory lock was therefore ignored:

error: 2 errors occurred:

  • migration failed: division by zero in line 0: BEGIN; SELECT 1/0; COMMIT; (details: pq: division by zero)
  • pq: current transaction is aborted, commands ignored until end of transaction block in line 0: SELECT pg_advisory_unlock($1)

Expected Behavior

I expected a single divide-by-zero error in the output, but no error about the transaction being in an aborted state or an ignored SELECT pg_advisory_unlock($1) statement. I expected the failed transaction to be rolled back, and the advisory lock to be released. I expected the session to be left in a usable state (not an open, aborted transaction).

Migrate Version

dev

Loaded Source Drivers

file

Loaded Database Drivers

postgresql, stub, postgres

Go Version

go version go1.16.3 linux/amd64

Stacktrace

N/A

Additional context

I think this could easily be solved by issuing an explicit ROLLBACK command when a migration script fails with an error, in both the postgres and pgx golang-migrate drivers (I cannot speak to whether this sort of thing impacts other databases/drivers). The explicit rollback command is necessary if the migration script failed within an explicit transaction block (as in the example above), but would be a no-op if the migration script failed within an implicit transaction (because implicit transactions are automatically rolled back in case of an error, and further ROLLBACK statements therefore do nothing).

I also think it might be worth updating the postgres/pgx driver tutorials/documentation to clarify the fact that multi-statement migration scripts are automatically run inside of an implicit transaction, and that explicit transaction control statements (i.e. BEGIN/COMMIT) are only really necessary if you want to override that behavior (e.g. to run your migration in multiple separate transactions to avoid holding locks for too long, etc.).

bclarkx2 commented 1 year ago

@nathanjcochran I've put together a fix for this in https://github.com/nicheinc/migrate/pull/1 in case you happen to be still interested :sweat_smile:

But of course, :no-pressure: :P