dimitri / pgloader

Migrate to PostgreSQL in a single command!
http://pgloader.io
Other
5.39k stars 546 forks source link

pgloader repeatedly disconnecting from db during migration attempt #1537

Open gdelia-pm opened 12 months ago

gdelia-pm commented 12 months ago

I am attempting to use pgloader to migrate a mysql database to postgres but having a heap of trouble doing it. Regardless of how I approach it I'm pretty consistently seeing what I assume is a bug that causes pgloader to crash out.

What I've Done

  1. created a simple azure hosted postgres database and stopped running connections to my working mysql instance.
  2. created a .load migration file that contains the parameters a vendor suggested we use for the migration. I've adjusted this a few times to account for changes I've seen other suggest but regardless I see the same error each time
    
    LOAD DATABASE
    FROM    mysql://mmdbdevuser:password@127.0.0.1:3306/mattermost
    INTO    pgsql://mmdbdevuser:password@hostname:5432/mattermost

WITH data only, workers = 8, concurrency = 1, multiple readers per thread, rows per range = 50000, create no tables, create no indexes, preserve index names

SET PostgreSQL PARAMETERS maintenance_work_mem to '512MB', work_mem to '48MB'

SET MySQL PARAMETERS net_read_timeout = '1500', net_write_timeout = '1500'

CAST column Channels.Type to "channel_type" drop typemod, column Teams.Type to "team_type" drop typemod, column UploadSessions.Type to "upload_session_type" drop typemod, column Drafts.Priority to text, type int when (= precision 11) to integer drop typemod, type bigint when (= precision 20) to bigint drop typemod, type text to varchar drop typemod, type tinyint when (<= precision 4) to boolean using tinyint-to-boolean, type json to jsonb drop typemod

EXCLUDING TABLE NAMES MATCHING ~, ~

BEFORE LOAD DO $$ ALTER SCHEMA public RENAME TO mattermost; $$, $$ DROP INDEX IF EXISTS idx_posts_message_txt; $$, $$ DROP INDEX IF EXISTS idx_fileinfo_content_txt; $$

AFTER LOAD DO $$ UPDATE mattermost.db_migrations set name='add_createat_to_teamembers' where version=92; $$, $$ CREATE INDEX IF NOT EXISTS idx_posts_message_txt ON mattermost.posts USING gin(to_tsvector('english', message)); $$, $$ CREATE INDEX IF NOT EXISTS idx_fileinfo_content_txt ON mattermost.fileinfo USING gin(to_tsvector('english', content)); $$, $$ ALTER SCHEMA mattermost RENAME TO public; $$, $$ SELECT pg_catalog.set_config('search_path', '"$user", public', false); $$, $$ ALTER USER SET SEARCH_PATH TO 'public'; $$;

  3. I create the postgres database and set the schema --I've tried skipping this step to and get the same error, I'm including it here to clarify that the psql database seems to work fine when I interact with it without using psqL
  4. I start the pgloader command and it crashes with the below error

2023-10-25T21:10:35.259000Z ERROR A thread failed with error: Database error: Connection to database server lost. 2023-10-25T21:10:35.624000Z ERROR Date/time: 2023-10-25-21:10! An unhandled error condition has been signalled: Database error: Connection to database server lost.

Backtrace for: #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> 0: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE-TO-STREAM #<SB-IMPL::CHARACTER-STRING-OSTREAM {1017674163}>) 1: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}> :OUTPUT NIL :IF-EXISTS :APPEND :VERBOSE NIL) 2: ((LAMBDA (CONDITION) :IN PGLOADER.LOAD:COPY-FROM) #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}>) 3: (LPARALLEL.KERNEL::CONDITION-HANDLER #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}>) 4: (SB-KERNEL::%SIGNAL #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}>) 5: (ERROR #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}>) 6: (SB-KERNEL:WITH-SIMPLE-CONDITION-RESTARTS ERROR NIL CL-POSTGRES:DATABASE-CONNECTION-LOST :MESSAGE "Connection to database server lost.") 7: (CL-POSTGRES::ENSURE-CONNECTION #<CL-POSTGRES:DATABASE-CONNECTION {1008D8EE73}> 0) 8: (CL-POSTGRES:EXEC-QUERY #<CL-POSTGRES:DATABASE-CONNECTION {1008D8EE73}> "ROLLBACK" CL-POSTGRES:IGNORE-ROW-READER) 9: (PGLOADER.PGCOPY::STREAM-ROWS-TO-COPY #1=#S(PGLOADER.CATALOG:TABLE :SOURCE-NAME #2="posts" :NAME "posts" :SCHEMA #3=#S(PGLOADER.CATALOG:SCHEMA :SOURCE-NAME #4="mattermost" :NAME "mattermost" :CATALOG #S(PGLOADER.CATALOG:CATALOG ........................ 10: (PGLOADER.PGCOPY:COPY-ROWS-FROM-QUEUE #<PGLOADER.SOURCE.MYSQL:COPY-MYSQL {100E181A13}> #S(LPARALLEL.VECTOR-QUEUE:VECTOR-QUEUE :IMPL #S(LPARALLEL.VECTOR-QUEUE::RAW-QUEUE :DATA #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ................................ 11: ((FLET "BODY-FN0" :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK)) 12: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK)) 13: (LPARALLEL.KERNEL::EXEC-TASK/WORKER # #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" (free)> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {1007B276D3}>) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> :RUNNING-CATEGORY :DEFAULT :INDEX 2 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD (LPARALLEL.SPIN-QUEUE::DUMMY (#<CLOSURE #2=(LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK) {100E184ECB}> . :DEFAULT) (#<CLOSURE #2# {100E18783B}> . :DEFAULT) (#<CLOSURE #2# {100F180EBB}> . :DEFAULT) (#<CLOSURE #2# {100F18366B}> . :DEFAULT) (#<CLOSURE #2# {100F185E7B}> . :DEFAULT) (#<CLOSURE #2# {100FBB067B}> . :DEFAULT) (#<CLOSURE #2# {100FBB2FFB}> . :DEFAULT) (#<CLOSURE #2# {100FBB583B}> . :DEFAULT) (#<CLOSURE #2# {100FEDFF9B}> . :DEFAULT) (#<CLOSURE #2# {101089989B}> . :DEFAULT) (#<CLOSURE #2# {101089C0CB}> . :DEFAULT) (#<CLOSURE #2# {101089E8FB}> . :DEFAULT) (#<CLOSURE #2# {10119B904B}> . :DEFAULT) (#<CLOSURE #2# {10119BB8AB}> . :DEFAULT) . #3=((#<CLOSURE #2# {10119BE02B}> . :DEFAULT))) :TAIL #3#))) 14: (LPARALLEL.KERNEL::WORKER-LOOP #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 8 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {10090FABC3}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" (free)> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {1007B276D3}>) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> :RUNNING-CATEGORY :DEFAULT :INDEX 2 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD (LPARALLEL.SPIN-QUEUE::DUMMY (#<CLOSURE #2=(LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK) {100E184ECB}> . :DEFAULT) (#<CLOSURE #2# {100E18783B}> . :DEFAULT) (#<CLOSURE #2# {100F180EBB}> . :DEFAULT) (#<CLOSURE #2# {100F18366B}> . :DEFAULT) (#<CLOSURE #2# {100F185E7B}> . :DEFAULT) (#<CLOSURE #2# {100FBB067B}> . :DEFAULT) (#<CLOSURE #2# {100FBB2FFB}> . :DEFAULT) (#<CLOSURE #2# {100FBB583B}> . :DEFAULT) (#<CLOSURE #2# {100FEDFF9B}> . :DEFAULT) (#<CLOSURE #2# {101089989B}> . :DEFAULT) (#<CLOSURE #2# {101089C0CB}> . :DEFAULT) (#<CLOSURE #2# {101089E8FB}> . :DEFAULT) (#<CLOSURE #2# {10119B904B}> . :DEFAULT) (#<CLOSURE #2# {10119BB8AB}> . :DEFAULT) . #3=((#<CLOSURE #2# {10119BE02B}> . :DEFAULT))) :TAIL #3#))) 15: (LPARALLEL.KERNEL::%CALL-WITH-TASK-HANDLER #) 16: ((LAMBDA NIL :IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT)) 17: (LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT #<CLOSURE (LAMBDA NIL :IN LPARALLEL.KERNEL::ENTER-WORKER-LOOP) {1007B276BB}> # #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 8 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {10090FABC3}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" (free)> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {1007B276D3}>) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> :RUNNING-CATEGORY :DEFAULT :INDEX 2 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD (LPARALLEL.SPIN-QUEUE::DUMMY (#<CLOSURE #2=(LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK) {100E184ECB}> . :DEFAULT) (#<CLOSURE #2# {100E18783B}> . :DEFAULT) (#<CLOSURE #2# {100F180EBB}> . :DEFAULT) (#<CLOSURE #2# {100F18366B}> . :DEFAULT) (#<CLOSURE #2# {100F185E7B}> . :DEFAULT) (#<CLOSURE #2# {100FBB067B}> . :DEFAULT) (#<CLOSURE #2# {100FBB2FFB}> . :DEFAULT) (#<CLOSURE #2# {100FBB583B}> . :DEFAULT) (#<CLOSURE #2# {100FEDFF9B}> . :DEFAULT) (#<CLOSURE #2# {101089989B}> . :DEFAULT) (#<CLOSURE #2# {101089C0CB}> . :DEFAULT) (#<CLOSURE #2# {101089E8FB}> . :DEFAULT) (#<CLOSURE #2# {10119B904B}> . :DEFAULT) (#<CLOSURE #2# {10119BB8AB}> . :DEFAULT) . #3=((#<CLOSURE #2# {10119BE02B}> . :DEFAULT))) :TAIL #3#))) 18: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-WORKER-THREAD)) 19: ((LABELS BORDEAUX-THREADS::%BINDING-DEFAULT-SPECIALS-WRAPPER :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS)) 20: ((FLET SB-UNIX::BODY :IN SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE)) 21: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE)) 22: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE)) 23: ((FLET "WITHOUT-INTERRUPTS-BODY-1" :IN SB-THREAD::CALL-WITH-MUTEX)) 24: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE) {7F833B95ED7B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}>> NIL T NIL) 25: (SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> NIL #<CLOSURE (LABELS BORDEAUX-THREADS::%BINDING-DEFAULT-SPECIALS-WRAPPER :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS) {10090FBAFB}> NIL) 26: ("foreign function: call_into_lisp") 27: ("foreign function: new_thread_trampoline")

  6. I'm unsure how to proceed from here once it happens.  I don't see the postgres db actually load any data, when this happens, and I don't see enough info in either mysql or psql that tells me which side's connection has crashed or why.  Both database servers remain up the whole time and are accessible and queryable when this occurs.  

  - [x] pgloader --version
   pgloader version "3.6.999791d" 
   compiled with SBCL 2.0.1.debian```
gdelia-pm commented 12 months ago

Something worth noting is that if I use the INCLUDING ONLY TABLES parameters I can move a singe table at a time, but if I remove that and drop workers to 1 the migration seems to hang, and at 2 the migration behaves as described above.

Akshaychdev commented 12 months ago

I'm Getting the same error for Mattermost Database migration, My setup is as follows,

You can find the details here: https://forum.mattermost.com/t/migration-from-mysql-8-to-postgres-database/17166

I've 2 docker containers one for MySQL 8 and for Psql15, running on my VM (2 core, 4 GB), I've tried tweaking the WITH configurations like

WITH data only,
     workers = 4, concurrency = 1,
     multiple readers per thread, rows per range = 5000,
     create no tables, create no indexes,
     batch size = 50MB, prefetch rows = 500,
     preserve index names

And the pgloader command to.

pgloader --dynamic-space-size 1564 migration.load > migration.log

I have the original MySQL database sizes about 6 GB (MySQL data-directory size)…

But after running some time the pgloader CPU usage reaches very high and it stops with error…

Database error: Connection to database server lost.

It is happening for me after 2GB of data transferred to postgress data directory, I’ve tried tuning the values in the WITH, reducing --dynamic-space-size also even tried increasing the server capacity,

But getting the connection lost error when reaching the 1.8 GB - 2 GB mark.

gdelia-pm commented 11 months ago

@Akshaychdev Let me know if you've had any luck. I found a way around this (no thanks to my weeks long case with MM support) for my instance and I'm happy to share to see if it will work for more folks.

I found that only the posts table was causing crashes, so I did a 2 part move and some data transforms after the fact.

  1. Create a .load file for the posts table migration and run the migration against your instance. This should work without crashing

` LOAD DATABASE FROM mysql://user:PASSWORD@localhost:3306/mattermost INTO pgsql://user:PASSWORd@localhost:5432/mattermost

                WITH
                  workers = 2,
                  concurrency = 1,
                  preserve index names

                SET PostgreSQL PARAMETERS
                  maintenance_work_mem to '512MB',
                  work_mem to '48MB'

                CAST type int when (= precision 11) to integer drop typemod,
                     type bigint when (= precision 20) to bigint drop typemod,
                     type text to varchar drop typemod,
                     type json to jsonb drop typemod

                SET MySQL PARAMETERS
                  net_read_timeout  = '1500',
                  net_write_timeout = '1500'

                INCLUDING ONLY TABLE NAMES MATCHING 'posts'

                BEFORE LOAD DO
                  $$ ALTER SCHEMA public RENAME TO mattermost; $$,
                  $$ ALTER TABLE public.posts ALTER COLUMN message   TYPE character varying(65535) COLLATE pg_catalog."default"; $$,
                  $$ ALTER TABLE public.posts ALTER COLUMN hashtags  TYPE character varying(1000)  COLLATE pg_catalog."default"; $$,
                  $$ ALTER TABLE public.posts ALTER COLUMN filenames TYPE character varying(4000)  COLLATE pg_catalog."default"; $$,
                  $$ ALTER TABLE public.posts ALTER COLUMN fileids   TYPE character varying(3000)  COLLATE pg_catalog."default"; $$

                AFTER LOAD DO
                        $$ ALTER SCHEMA mattermost RENAME TO public; $$

; `

  1. Then update the postgres tables constraint to match the what the morph command sets. I also took some time to compare the schema set by the above to the schema the morph command sets to make sure they match. You can do this and make adjustments, but it will probably work either way ALTER TABLE public.posts RENAME CONSTRAINT random name TO posts_pkey;
  2. Use your previous pgloader command to move all other tables besides posts (either with an include only tables or exclude tables command'
  3. Update your matttermost config driver/dsn to the new database
  4. profit?
isacikgoz commented 6 months ago

We have another user reported the same issue, where the connection got lost during the migration process. I wonder if this is a false-positive output as the connection should've been restored with https://github.com/dimitri/pgloader/commit/cea82a6aa8afdd32bac7ee19774549a4f8e95b7e cc @dimitri