fly-apps / safe-ecto-migrations

Guide to Safe Ecto Migrations
Other
301 stars 12 forks source link

Use pg_advisory_lock for safe concurrent indexes #3

Closed dbern-stripe closed 2 years ago

dbern-stripe commented 2 years ago

Still needs some testing. @lowski if you end up testing this solution using pg_advisory_lock, please let me know. I'll also try to test it so no worries if you don't get to it.

It's incomplete at the moment-- I don't think it truly keeps multiple nodes from running all migrations at the same time, but the solution below should at least prevent that particular migration being run at the same time on multiple nodes.

Related #2

lcmen commented 2 years ago

@dbernheisel I've tried to test it but it doesn't seem to be working for me. Unfortunately, I don't have production app (as I'm still learning Elixir) so I tested it on my playground app here: https://github.com/lowski/notifications/pull/1.

After running migrate with --log-migrations-sql and --log-migrator-sql it seems to unlock before the actual migration is ran. Here is the output:

$ mix ecto.migrate --log-migrations-sql --log-migrator-sql

20:48:04.431 [debug] QUERY OK db=0.3ms idle=4.9ms
begin []

20:48:04.435 [debug] QUERY OK db=0.3ms
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE []

20:48:04.444 [debug] QUERY OK db=0.3ms
commit []

20:48:04.471 [info]  == Running 20211208193432 Notifications.Repo.Migrations.AddIndexOnCreatedAtToNotifications.up/0 forward

20:48:04.474 [debug] QUERY OK db=0.2ms queue=0.4ms idle=28.5ms
SELECT pg_advisory_lock(123) []

20:48:04.475 [debug] QUERY OK db=0.3ms queue=0.3ms idle=8.8ms
SELECT pg_advisory_unlock(123) []

20:48:04.475 [info]  create index notifications_inserted_at_index

20:48:04.485 [debug] QUERY OK db=9.8ms queue=0.2ms idle=1.1ms
CREATE INDEX CONCURRENTLY "notifications_inserted_at_index" ON "notifications" ("inserted_at") []

20:48:04.487 [info]  == Migrated 20211208193432 in 0.0s

20:48:04.515 [debug] QUERY OK db=2.2ms queue=0.4ms idle=35.8ms
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2) [20211208193432, ~N[2021-12-08 19:48:04]]

I'm also getting a following warning on Postgres logs:

2021-12-08 19:48:04.474 UTC [11186] WARNING:  you don't own a lock of type ExclusiveLock

I will need to dig more into this.

EDIT: I see you've changed single change to up and down functions but it seems to be the same (lock -> unlock -> migration -> schema insert).

PS. I also tried to wrap the whole try do / after within repo().checkout (to make sure the same connection is used) but I'm still getting: "you don't own a lock of type ExclusiveLock" warning.

dbernheisel commented 2 years ago

@lowski I've updated the example to use execute which is intentional about run order. This should work.

lcmen commented 2 years ago

Thank you so much! I combined this new approach with custom migration that you initially presented and looks very nice: https://github.com/lowski/notifications/pull/1/files !

I believe there is still one caveat: insert into schema_migrations table happens outside of advisory lock:

$ mix ecto.migrate --log-migrations-sql --log-migrator-sql

21:24:44.299 [debug] QUERY OK db=0.3ms idle=5.0ms
begin []

21:24:44.302 [debug] QUERY OK db=0.2ms
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE []

21:24:44.311 [debug] QUERY OK db=0.2ms
commit []

21:24:44.337 [info]  == Running 20211208193432 Notifications.Repo.Migrations.AddIndexOnCreatedAtToNotifications.up/0 forward

21:24:44.340 [info]  execute "SELECT pg_try_advisory_lock(45843237)"

21:24:44.341 [debug] QUERY OK db=0.2ms queue=0.6ms idle=28.9ms
SELECT pg_try_advisory_lock(45843237) []

21:24:44.341 [info]  create index notifications_inserted_at_index

21:24:44.352 [debug] QUERY OK db=10.9ms queue=0.2ms idle=6.2ms
CREATE INDEX CONCURRENTLY "notifications_inserted_at_index" ON "notifications" ("inserted_at") []

21:24:44.352 [info]  execute "SELECT pg_advisory_unlock(45843237)"

21:24:44.353 [debug] QUERY OK db=0.2ms queue=0.4ms idle=11.4ms
SELECT pg_advisory_unlock(45843237) []

21:24:44.354 [info]  == Migrated 20211208193432 in 0.0s

21:24:44.393 [debug] QUERY OK db=3.2ms queue=0.6ms idle=35.3ms
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2) [20211208193432, ~N[2021-12-08 20:24:44]]

This can lead to a following race condition in multi-node setup:

  1. Node 1 starts migration, and executes it
  2. Before inserting a record into schema_migrations by Node 1, Node 2 tries to run migration
  3. Node 1 inserts a record into schema_migrations

In this particular scenario we might end up with running migration on the database twice.

dbernheisel commented 2 years ago

yeah, this will need some more thought and work to be safe. Thanks for taking a look.

lcmen commented 2 years ago

@dbernheisel thank you so much for looking into this!

In the meantime I've checked the source code of ecto-sql, and I don't think this issue can be fixed without modifying Migrator code. When we disable ddl transaction and migration lock, our up / down functions are executed before schema migration is applied.

Maybe having an option to specify a lock type for a migration (e.g. `@migration_lock_type) could be a solution?

dbernheisel commented 2 years ago

closing in favor of migration_lock: :pg_advisory_lock coming in ecto_sql soon.