dimitri / pgloader

Migrate to PostgreSQL in a single command!
http://pgloader.io
Other
5.41k stars 545 forks source link

issues with reloids in schema. Pgloader dies completely with kaboom #1503

Open jtorral opened 1 year ago

jtorral commented 1 year ago

Version Info

root@carbon1:/var/lib/postgresql# pgloader -V
pgloader version "3.6.7~devel"
compiled with SBCL 2.0.1.debian

Btw, this is confusing. Base don version you see above. but ...

root@carbon1:/tmp/pgloader#  apt-cache madison pgloader
  pgloader | 3.6.9-1.pgdg20.04+1 | http://apt.postgresql.org/pub/repos/apt focal-pgdg/main amd64 Packages
  pgloader |    3.6.1-1 | http://azure.archive.ubuntu.com/ubuntu focal/universe amd64 Packages

The following errors cause pgloader to just fail

root@carbon1:/var/lib/postgresql# pgloader  pg_load
2023-06-21T17:11:20.014000Z LOG pgloader version "3.6.7~devel"
2023-06-21T17:11:20.091000Z LOG Migrating from #<MYSQL-CONNECTION mysql://root@localhost:3306/regen {10062EFE83}>
2023-06-21T17:11:20.091000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://carbon@localhost:5432/regen {10062F11A3}>
KABOOM!
UNDEFINED-COLUMN: Database error 42703: column ""unique ID"" of relation "random_number_test" does not exist
CONTEXT: PL/pgSQL function inline_code_block line 6 at FOR over SELECT rows
QUERY:
DO $$
DECLARE
  n integer := 0;
  r record;
BEGIN
  FOR r in
       SELECT 'select '
               || trim(trailing ')'
                  from replace(pg_get_expr(d.adbin, d.adrelid),
                               'nextval', 'setval'))
               || ', (select greatest(max(' || quote_ident(a.attname) || '), (select seqmin from pg_sequence where seqrelid = ('''
               || pg_get_serial_sequence(quote_ident(nspname) || '.' || quote_ident(relname), quote_ident(a.attname)) || ''')::regclass limit 1), 1) from only '
               || quote_ident(nspname) || '.' || quote_ident(relname) || '));' as sql
         FROM pg_class c
              JOIN pg_namespace n on n.oid = c.relnamespace
              JOIN pg_attribute a on a.attrelid = c.oid
              JOIN pg_attrdef d on d.adrelid = a.attrelid
                                 and d.adnum = a.attnum
                                 and a.atthasdef
        WHERE relkind = 'r' and a.attnum > 0
              and pg_get_expr(d.adbin, d.adrelid) ~ '^nextval'
              and c.oid in (select oid from reloids)
  LOOP
    n := n + 1;
    EXECUTE r.sql;
  END LOOP;

  PERFORM pg_notify('seqs', n::text);
END;
$$; 2023-06-21T17:11:21.610000Z ERROR Database error 42703: column ""unique ID"" of relation "random_number_test" does not exist
CONTEXT: PL/pgSQL function inline_code_block line 6 at FOR over SELECT rows
QUERY:
DO $$
DECLARE
  n integer := 0;
  r record;
BEGIN
  FOR r in
       SELECT 'select '
               || trim(trailing ')'
                  from replace(pg_get_expr(d.adbin, d.adrelid),
                               'nextval', 'setval'))
               || ', (select greatest(max(' || quote_ident(a.attname) || '), (select seqmin from pg_sequence where seqrelid = ('''
               || pg_get_serial_sequence(quote_ident(nspname) || '.' || quote_ident(relname), quote_ident(a.attname)) || ''')::regclass limit 1), 1) from only '
               || quote_ident(nspname) || '.' || quote_ident(relname) || '));' as sql
         FROM pg_class c
              JOIN pg_namespace n on n.oid = c.relnamespace
              JOIN pg_attribute a on a.attrelid = c.oid
              JOIN pg_attrdef d on d.adrelid = a.attrelid
                                 and d.adnum = a.attnum
                                 and a.atthasdef
        WHERE relkind = 'r' and a.attnum > 0
              and pg_get_expr(d.adbin, d.adrelid) ~ '^nextval'
              and c.oid in (select oid from reloids)
  LOOP
    n := n + 1;
    EXECUTE r.sql;
  END LOOP;

  PERFORM pg_notify('seqs', n::text);
END;
$$;

An unhandled error condition has been signalled:
   Database error 42703: column ""unique ID"" of relation "random_number_test" does not exist
CONTEXT: PL/pgSQL function inline_code_block line 6 at FOR over SELECT rows
QUERY:
DO $$
DECLARE
  n integer := 0;
  r record;
BEGIN
  FOR r in
       SELECT 'select '
               || trim(trailing ')'
                  from replace(pg_get_expr(d.adbin, d.adrelid),
                               'nextval', 'setval'))
               || ', (select greatest(max(' || quote_ident(a.attname) || '), (select seqmin from pg_sequence where seqrelid = ('''
               || pg_get_serial_sequence(quote_ident(nspname) || '.' || quote_ident(relname), quote_ident(a.attname)) || ''')::regclass limit 1), 1) from only '
               || quote_ident(nspname) || '.' || quote_ident(relname) || '));' as sql
         FROM pg_class c
              JOIN pg_namespace n on n.oid = c.relnamespace
              JOIN pg_attribute a on a.attrelid = c.oid
              JOIN pg_attrdef d on d.adrelid = a.attrelid
                                 and d.adnum = a.attnum
                                 and a.atthasdef
        WHERE relkind = 'r' and a.attnum > 0
              and pg_get_expr(d.adbin, d.adrelid) ~ '^nextval'
              and c.oid in (select oid from reloids)
  LOOP
    n := n + 1;
    EXECUTE r.sql;
  END LOOP;

  PERFORM pg_notify('seqs', n::text);
END;
$$;

What I am doing here?

Database error 42703: column ""unique ID"" of relation "random_number_test" does not exist
CONTEXT: PL/pgSQL function inline_code_block line 6 at FOR over SELECT rows
QUERY:
DO $$
DECLARE
  n integer := 0;
  r record;
BEGIN
  FOR r in
       SELECT 'select '
               || trim(trailing ')'
                  from replace(pg_get_expr(d.adbin, d.adrelid),
                               'nextval', 'setval'))
               || ', (select greatest(max(' || quote_ident(a.attname) || '), (select seqmin from pg_sequence where seqrelid = ('''
               || pg_get_serial_sequence(quote_ident(nspname) || '.' || quote_ident(relname), quote_ident(a.attname)) || ''')::regclass limit 1), 1) from only '
               || quote_ident(nspname) || '.' || quote_ident(relname) || '));' as sql
         FROM pg_class c
              JOIN pg_namespace n on n.oid = c.relnamespace
              JOIN pg_attribute a on a.attrelid = c.oid
              JOIN pg_attrdef d on d.adrelid = a.attrelid
                                 and d.adnum = a.attnum
                                 and a.atthasdef
        WHERE relkind = 'r' and a.attnum > 0
              and pg_get_expr(d.adbin, d.adrelid) ~ '^nextval'
              and c.oid in (select oid from reloids)
  LOOP
    n := n + 1;
    EXECUTE r.sql;
  END LOOP;

  PERFORM pg_notify('seqs', n::text);
END;
$$;

From the above message, it looks like ( select oid from reloids) is the issue.

All tables from mysql get brought over to postgres. So the initial message of

column ""unique ID"" of relation "random_number_test" does not exist

Is a little misleading. At least for me.

This is my load file

LOAD DATABASE
   FROM      mysql://*******
   INTO postgresql://******
   alter schema 'regen' rename to 'public'

   WITH
      -- include drop, create tables, create indexes, reset sequences, quote identifiers,
      include drop, create tables, create indexes, reset sequences,
      multiple readers per thread, rows per range = 50000

   -- SET PostgreSQL PARAMETERS
      -- maintenance_work_mem to '512MB',
      -- work_mem to '12MB'

   SET MySQL PARAMETERS
      net_read_timeout  = '31536000',
      net_write_timeout = '31536000',
      lock_wait_timeout = '31536000'

   CAST
      type date drop not null drop default using zero-dates-to-null,
      type datetime to timestamp drop default using zero-dates-to-null,
      type bigint when unsigned to numeric drop typemod,
      type bigint when (<= precision 20) to bigint drop typemod,
      type geography to bytea,
      type geography to point,
      type geometry  to point using convert-mysql-point,
      type point     to point using convert-mysql-point

   BEFORE LOAD DO
      $$ create extension if not exists postgis; $$
;
moson-mo commented 1 year ago

Same problem here when using quote identifiers.

The issue seems to be that there are double quotes applied twice for the column name:

column ""IDcolumn"" of relation "sometable" does not exist

Removing quote_ident for a.attname in the following line should fix it I guess:

|| pg_get_serial_sequence(quote_ident(nspname) || '.' || quote_ident(relname), quote_ident(a.attname)) || ''')::regclass limit 1), 1) from only '

So should be:

|| pg_get_serial_sequence(quote_ident(nspname) || '.' || quote_ident(relname), a.attname) || ''')::regclass limit 1), 1) from only '

edit Just noticed there is already a PR to fix this issue: https://github.com/dimitri/pgloader/pull/1509