dimitri / pgloader

Migrate to PostgreSQL in a single command!
http://pgloader.io
Other
5.45k stars 548 forks source link

Cannot understand migration error reasons #1530

Closed Anth0nyME closed 1 year ago

Anth0nyME commented 1 year ago

Hello!

After ~24h of migration (source DB size ~ 200Gb, not so quick RAID solution) i got error. I can't understand reasons of erorr, please assist (if possible) to understand it.

Thank you!

OS: Rocky Linux 8.8 (Green Obsidian) postgres (PostgreSQL) 15.4 (dnf, https://download.postgresql.org/pub/repos) MariaDB: 10.11.5-MariaDB

pgloader version "3.6.7~devel" (dnf) compiled with SBCL 2.2.10-1.rhel8

Goal of migration: transfer the Zabbix 6.0.21 LTS partitioned DB from MariaDB to PostgreSQL (without TimescaleDB ATM) via Unix sockets (single host enviroment). This is TEST server, no zabbix-server daemon is running.

DS size (on disk) on failure time. Source (mariadb): ~172Gb Destination (postgresql): ~210Gb

No

Yes but it too complicated without reasons understanding.

What i did:

sudo -u postgres pgloader zabbix_migrate.load

Content of "zabbix_migrate.load":

LOAD DATABASE
FROM mysql://user:password@unix:/var/lib/mysql/mysql.sock:/zabbixdb
INTO postgresql://unix:/tmp:5432/zabbixdb
WITH include no drop,
truncate,
create no tables,
create no indexes,
no foreign keys,
reset sequences,
data only
SET maintenance_work_mem TO '1024MB', work_mem to '256MB'
ALTER SCHEMA 'zabbixdb' RENAME TO 'public'
BEFORE LOAD EXECUTE create.sql
AFTER LOAD EXECUTE alter.sql;

Result (console):

. . .
:START 13223 :COUNT 100000) :LOCK #<SB-THREAD:MUTEX "Anonymous lock" free owner=0> :NOTIFY-PUSH #<SB-THREAD:WAITQUEUE #1=Anonymous condition variable {101D796B13}> :NOTIFY-POP #<SB-THREAD:WAITQUEUE #1# {100E4BB783}>) :DISABLE-TRIGGERS NIL :ON-ERROR-STOP T :COLUMNS NIL)
8: ((FLET "BODY-FN0" :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
9: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
10: (LPARALLEL.KERNEL::EXEC-TASK/WORKER #<unavailable argument> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
11: (LPARALLEL.KERNEL::WORKER-LOOP #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1007984643}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
12: (LPARALLEL.KERNEL::%CALL-WITH-TASK-HANDLER #<unavailable argument>)
13: ((LAMBDA NIL :IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT))
14: (LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT #<FUNCTION (LAMBDA NIL :IN LPARALLEL.KERNEL::ENTER-WORKER-LOOP) {1005EF6A5B}> #<FUNCTION FUNCALL> #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1007984643}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
15: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-WORKER-THREAD))
16: ((LAMBDA NIL :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS))
17: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
18: ((FLET "WITHOUT-INTERRUPTS-BODY-11" :IN SB-THREAD::RUN))
19: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
20: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::RUN))
21: (SB-THREAD::RUN)
22: ("foreign function: call_into_lisp_")
23: ("foreign function: funcall1")

What I am doing here?

Database error: Connection to database server lost.

Result (pgloader.log):

 :START 13223 :COUNT 100000) :LOCK #<SB-THREAD:MUTEX "Anonymous lock" free owner=0> :NOTIFY-PUSH #<SB-THREAD:WAITQUEUE #1=Anonymous condition variable {101D796B13}> :NOTIFY-POP #<SB-THREAD:WAITQUEUE #1# {100E4BB783}>) :DISABLE-TRIGGERS NIL :ON-ERROR-STOP T :COLUMNS NIL)
8: ((FLET "BODY-FN0" :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
9: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
10: (LPARALLEL.KERNEL::EXEC-TASK/WORKER #<unavailable argument> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
11: (LPARALLEL.KERNEL::WORKER-LOOP #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1007984643}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
12: (LPARALLEL.KERNEL::%CALL-WITH-TASK-HANDLER #<unavailable argument>)
13: ((LAMBDA NIL :IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT))
14: (LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT #<FUNCTION (LAMBDA NIL :IN LPARALLEL.KERNEL::ENTER-WORKER-LOOP) {1005EF6A5B}> #<FUNCTION FUNCALL> #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1007984643}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
15: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-WORKER-THREAD))
16: ((LAMBDA NIL :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS))
17: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
18: ((FLET "WITHOUT-INTERRUPTS-BODY-11" :IN SB-THREAD::RUN))
19: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
20: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::RUN))
21: (SB-THREAD::RUN)
22: ("foreign function: call_into_lisp_")
23: ("foreign function: funcall1")

The "systemctl status" shows both servers in alive state ATM.

svantevonerichsen6906 commented 1 year ago

The start of the stacktrace seems to be missing, and the error message.

Anth0nyME commented 1 year ago

Oops....

I think i found reason, it's my carelessness.

df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 7.7G 0 7.7G 0% /dev tmpfs 7.7G 1.1M 7.7G 1% /dev/shm tmpfs 7.7G 362M 7.4G 5% /run tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup /dev/sda1 120G 11G 109G 10% / /dev/sda3 3.9G 881M 2.8G 24% /boot /dev/sdb1 549G 521G 65M 100% /data tmpfs 1.6G 0 1.6G 0% /run/user/1000

I'm sorry.

Anth0nyME commented 1 year ago

The start of the stacktrace seems to be missing, and the error message.

Please explain what it might look like? I will posting more full report in future.

svantevonerichsen6906 commented 1 year ago

Ah, you may notice that the lines you showed start with 8:, 9:, etc. Each of these presents a stack frame. There are 8 frames on top of those shown. The output should also usually start with a description of the error, such as »error STORAGE-CONDITION signalled. [Condition of type STORAGE-CONDITION]«.

Anth0nyME commented 1 year ago

Many thanks! )

Anth0nyME commented 1 year ago

Damn, another error...

Couldn't write to #<SB-SYS:FD-STREAM for "socket, peer: /var/lib/mysql/mysql.sock" {1028B0ADA3}>: Broken pipe

Will try to use TCP instead of unix sockets

Anth0nyME commented 1 year ago

Damn, another error (

From console:

` Heap exhausted during garbage collection: 640 bytes available, 2112 requested. Immobile Object Counts Gen layout fdefn symbol code Boxed Cons Raw Code SmMix Mixed LgRaw LgCode LgMix Waste% Alloc Trig Dirty GCs Mem-age 1 0 0 0 0 530 4 6671 0 0 15 0 0 0 1.3 233410960 143313498 7220 1 1.3589 2 0 0 0 0 860 1 7697 0 0 6 0 0 0 1.3 276994448 10737418 8410 0 0.6591 3 0 0 0 0 764 2 6117 0 1 13 0 0 0 1.1 223513840 10737418 6460 0 0.2350 4 0 0 2 0 149 4 629 0 1 14 0 0 0 1.0 25842032 10737418 630 0 0.0000 5 48 0 0 70 394 27 1730 1 7 41 0 0 4300 2.0 208818064 219555482 1780 29 0.0000 fatal error encountered in SBCL pid 856986 tid 856998: GC invariant lost, file "gencgc.c", line 522

Welcome to LDB, a low-level debugger for the Lisp runtime environment. (GC in progress, oldspace=1, newspace=2) ldb> `

No errors in pgloader.log file.

Last records:

2023-09-16T13:40:11.026301+01:00 NOTICE COPY public.history_uint with 2318469921 rows estimated [3/4] 2023-09-16T13:40:11.035301+01:00 NOTICE COPY public.history with 747700580 rows estimated [1/4] 2023-09-16T20:21:37.475199+01:00 NOTICE COPY public.trends with 59719160 rows estimated [1/4] 2023-09-16T21:05:40.778050+01:00 NOTICE COPY public.auditlog with 19082647 rows estimated [1/4]

I don't understand what i need to do to fix it... Console error and "ldb>" prompt give no direction how to continue migration process.

Can anybody assist to make this problem more... clear?

Thank you!

Anth0nyME commented 1 year ago

Found: https://github.com/dimitri/pgloader/issues/962

Will try to use "with prefetch rows = 10000"