fly-apps / safe-ecto-migrations

Guide to Safe Ecto Migrations
Other
301 stars 12 forks source link

Docs on setting NON NULL safely on PG12+ may not be safe due to ecto-generated inclusion of `ALTER COLUMN ... TYPE ...` #10

Closed dhedlund closed 10 months ago

dhedlund commented 12 months ago

For the recipe to set NON NULL on an existing column, I think there's a bug with one of the statements that will cause locking for extended periods of time in PG12. I have no tried to reproduce on any newer versions of postgres.

When it gets to the following step in the docs:

  alter table("products") do
    modify :active, :boolean, null: false
  end

Ecto generates the following SQL under the hood:

ALTER TABLE "products" ALTER COLUMN "active" TYPE boolean, ALTER COLUMN "active" SET NOT NULL;

Even though the type is the same as the existing column type, which should in itself be a no-op or metadata-only change, I believe it's causing postgres to not think that it can optimize the ALTER; it sees that additional work might be needed to be done at the same time and decides not to take the constraint-check shortcut route, thus triggering the table scan again while locked. At least that was my experience on PG 12.14.

Until ecto is smart enough to know that the column type is the same, and strips out the ALTER COLUMN ... TYPE ..., it might be dangerous to run an ecto-generated ALTER TABLE to perform this step vs. handcrafted SQL. Even then, you'd have to be on a certain version of ecto.

Steps to Reproduce

Assuming a table called "foo" with a lot of data and a column called "a" of type text with a UNIQUE constraint:

(I'm using 50 million rows, but locally on a fast NVMe drive w/ enough RAM to fully cache the data set, so single-digit second responses will be larger on an active production environment)

I'm running PG 12.2 for these timings, but we also experienced the issue on PG 12.14.

Creating and validating the constraint. Note the 3.3 seconds to do a full table scan:

testdb =# ALTER TABLE "foo" ADD CONSTRAINT "a_not_null" CHECK (a IS NOT NULL) NOT VALID;
ALTER TABLE
Time: 6.648 ms

testdb=# ALTER TABLE foo VALIDATE CONSTRAINT a_not_null;
ALTER TABLE
Time: 3365.122 ms (00:03.365)

Converting the column to non-null using SQL produced by mix ecto.migrate --log-migrations-sql. Note the inclusion of ALTER COLUMN "a" TYPE text and the time it takes to run being similar to the full table scan:

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" TYPE text, ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 3425.920 ms (00:03.426)

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.708 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" TYPE text, ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 3370.575 ms (00:03.371)

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.634 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" TYPE text, ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 3418.631 ms (00:03.419)

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.586 ms

Setting the NOT NULL without the type. Note response times in the single-digit milliseconds:

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 1.475 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.428 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 6.562 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.685 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" SET NOT NULL;
ALTER TABLE
Time: 6.757 ms

testdb =# ALTER TABLE "foo" ALTER COLUMN "a" DROP NOT NULL;
ALTER TABLE
Time: 6.445 ms

Example Table Setup

The following table setup was used to reproduce the issue:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE foo (
    id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    a text,
    b text,
    c text,
    d text,
    e text,
    f text,
    g text,
    h text,
    i text,
    j text,
    k text,
    l text,
    m text,
    n text,
    o text,
    p text,
    q text,
    r text,
    s text,
    t text,
    u text,
    v text,
    w text,
    x text,
    y text,
    z text,
    CONSTRAINT a_uniq_idx UNIQUE(a)
);

INSERT INTO foo (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z)
  SELECT
    uuid_generate_v4(),
    concat('b-', i::text),
    concat('c-', i::text),
    concat('d-', i::text),
    concat('e-', i::text),
    concat('f-', i::text),
    concat('g-', i::text),
    concat('h-', i::text),
    concat('i-', i::text),
    concat('j-', i::text),
    concat('k-', i::text),
    concat('l-', i::text),
    concat('m-', i::text),
    concat('n-', i::text),
    concat('o-', i::text),
    concat('p-', i::text),
    concat('q-', i::text),
    concat('r-', i::text),
    concat('s-', i::text),
    concat('t-', i::text),
    concat('u-', i::text),
    concat('v-', i::text),
    concat('w-', i::text),
    concat('x-', i::text),
    concat('y-', i::text),
    concat('z-', i::text)
  FROM generate_series(1, 5000000) AS gs(i);
dhedlund commented 12 months ago

Looks like someone added a note about this to the ecto_sql docs for Ecto.Migration.modify/3:

If you want to modify a column without changing its type, such as adding or dropping a null constraints, consider using the execute/2 command with the relevant SQL command instead of modify/3, if supported by your database. This may avoid redundant type updates and be more efficient, as an unnecessary type update can lock the table, even if the type actually doesn't change.

dhedlund commented 12 months ago

The following Google Group thread from 2020 has more info about why the ecto-sql team hasn't added support for modify/2: https://groups.google.com/g/elixir-ecto/c/rBwdS0bXl4U/m/r2Ohs5eEBgAJ