Can you please run in debug mode so that you have the full details in the log? Can you report what top and other tools are saying about the running process (is it still running, how much memory is it consuming, etc.)?
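For reference, a minimal sketch of that kind of run (the load file name and the log path are placeholders; --debug is the same flag used in the runs later in this thread):
# run pgloader with debug logging and keep all output for later inspection
pgloader --debug my.load > pgloader-debug.log 2>&1 &
# in a second terminal, watch CPU and memory of the running process
top -p "$(pgrep -n pgloader)"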
I tried again with more memory:
LOAD database
FROM mysql://dev:dev@localhost/main
INTO postgresql://dev@localhost/main
WITH include drop, create tables, no truncate, create indexes, reset sequences, foreign keys
SET maintenance_work_mem to '512MB', work_mem to '128MB'
CAST type datetime to timestamptz drop default drop not null using zero-dates-to-null,
type date drop not null drop default using zero-dates-to-null
BEFORE LOAD DO $$ CREATE SCHEMA IF NOT EXISTS main; $$
;
It has been stuck at the same table for 8 hours, with exactly 1200000 rows in Postgres.
top is showing this for pgloader:
pgloader 1.15 GB 917.5 MB 69 421 6343 pbhowmick 96.0 17:03:40.76
I need more information to understand the case... can you gather more?
Hi, since the debug output includes table names - I am going to delete the previous comment from public view. You should still have my previous comment in an email tied to your GitHub account.
Can't see the email, did you send it explicitly or would it be a github notification? (I don't get those).
I guess I have a similar behaviour.
I have a MySQL database of about 35 GB, and in the first step I want to migrate only 12 (backup) tables of about 4 GB to Postgres using pgloader.
Migration stops at certain tables; I identified 3 out of 12 tables as problematic. All the affected tables include columns with very long text lines.
What I have noticed so far is that when the problem occurs, pgloader no longer uses all my available cores; instead it is stuck on 1 core and utilizes it at 100%. What's a bit strange is that the load is 50/50 user and system(?). When everything works fine I see only user load most of the time. Stracing the process ID shows no activity.
Trying to run the process in debug mode ends up in an out-of-memory situation, so I wasn't able to generate a debug.log so far.
As a workaround I can do the migration table by table, which works without any issues.
I will see if I'm able to gather further information...
It might be tied to the new improved support for very long text in the MySQL driver. Can you export a small(ish) part of the problematic table for me to reproduce the bug locally?
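For reference, a sketch of one way to export such a sample with mysqldump (database and table names here are hypothetical; --where keeps the dump small, and the CREATE TABLE statement is included by default):
# dump the table definition plus a small slice of rows
mysqldump --single-transaction --where="id <= 10000" main problematic_table > sample.sql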
Sadly not, this is data for tax authorities and therefore confidential. I will check if I'm able to anonymize the important parts to give you a sample.
@dimitri: Have you received my e-mail? I sent you a sample dataset to dim@tapoueh.org. Was it helpful, or do you need more information?
The email made it to my mailbox but I didn't have time to look at it. Now that I do, it's missing the foo table definition...
ERROR 1146 (42S02) at line 21 in file: 'foo.sql': Table 'issue337.foo' doesn't exist
Tried to reproduce the problem locally, and all went fine with your data set:
2016-02-15T18:35:48.001000+01:00 NOTICE Starting pgloader, log system is ready.
2016-02-15T18:35:48.615000+01:00 NOTICE DROP then CREATE TABLES
2016-02-15T18:35:48.615000+01:00 NOTICE DROP TABLE IF EXISTS foo CASCADE;
2016-02-15T18:35:48.615000+01:00 WARNING Postgres warning: table "foo" does not exist, skipping
2016-02-15T18:35:48.615000+01:00 NOTICE CREATE TABLE foo
(
id bigint not null,
download_kennwort varchar(30) not null,
sende_info text not null,
program_version text not null,
sendeart bigint not null,
data text not null,
creation_date varchar(14) not null,
creation_date_year varchar(4) not null,
creation_date_month varchar(2) not null,
creation_date_day varchar(2) not null,
creation_date_hour varchar(2) not null,
created_by bigint not null,
ticket_id varchar(50) not null,
origin text not null,
origin_information text not null,
validation bigint not null
);
2016-02-15T18:35:48.621000+01:00 NOTICE CREATE INDEX idx_1414771_cd_dk ON foo (creation_date_year, creation_date_month, creation_date_day, creation_date_hour, download_kennwort);
2016-02-15T18:35:48.621000+01:00 NOTICE CREATE UNIQUE INDEX idx_1414771_primary ON foo (id);
2016-02-15T18:35:48.621000+01:00 NOTICE CREATE INDEX idx_1414771_cd_v ON foo (creation_date, validation);
2016-02-15T18:35:48.621000+01:00 NOTICE CREATE INDEX idx_1414771_validation ON foo (validation);
2016-02-15T18:35:48.621000+01:00 NOTICE CREATE INDEX idx_1414771_ti_dk ON foo (ticket_id, download_kennwort);
2016-02-15T18:35:48.621000+01:00 NOTICE CREATE UNIQUE INDEX idx_1414771_ticket_id_unique ON foo (ticket_id);
2016-02-15T18:35:48.621000+01:00 NOTICE Reset sequences
2016-02-15T18:35:48.826000+01:00 NOTICE ALTER TABLE foo ADD PRIMARY KEY USING INDEX idx_1414771_primary;
2016-02-15T18:35:48.826000+01:00 NOTICE ALTER TABLE foo ADD PRIMARY KEY USING INDEX idx_1414771_primary;
2016-02-15T18:35:48.826000+01:00 NOTICE Comments
table name read imported errors total time read write
----------------------- --------- --------- --------- -------------- --------- ---------
fetch meta data 7 7 0 0.311s
create, drop 0 0 0 0.027s
----------------------- --------- --------- --------- -------------- --------- ---------
foo 1 1 0 0.003s 0.008s 0.003s
----------------------- --------- --------- --------- -------------- --------- ---------
COPY Threads Completion 3 3 0 0.042s
Index Build Completion 6 6 0 0.071s
Create Indexes 6 6 0 0.027s
Reset Sequences 0 0 0 0.037s
Primary Keys 1 1 0 0.007s
Foreign Keys 0 0 0 0.000s
Install comments 0 0 0 0.000s
----------------------- --------- --------- --------- -------------- --------- ---------
Total import time 1 1 0 0.827s 0.008s 0.003s
Yes, the problem only occurs when I try to migrate the whole database in one step. Doing the migration table by table works fine. So it might be related to the amount of data within a table or the number of tables. When I find the time I will try to build a full (anonymized) demo dataset.
Thanks for your support.
Can you play with the concurrency settings (worker count and concurrency) in the meantime, or maybe just the batch size in terms of batch rows?
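For reference, a sketch reusing the WITH options that appear in load commands later in this thread (workers, concurrency, batch rows, batch size); the values here are purely illustrative:
WITH workers = 2, concurrency = 1,
     batch rows = 10000, batch size = 16MB,
     include drop, create tables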
Already did that, because otherwise pgloader uses way too much memory, starts swapping and needs to be killed. But I have to mention that the virtual instance I have available for the migration is pretty small: 4 cores, 4 GB RAM.
These are my settings at the moment:
WITH batch rows = 1000,
batch size = 2MB,
include drop, create tables, create indexes, no foreign keys
Please try with the latest patch. It deals better with some kinds of errors that you might be hitting here, even if it doesn't sound like it: those errors weren't properly handled at all before.
Sorry for the late reply, I have had the same issue with the latest patch.
+1 I'm seeing this stuck behavior too. The load succeeds when I switch to multiple table-by-table .load scripts using a bash loop and an "include only" directive:
INCLUDING ONLY TABLE NAMES MATCHING 'xxxyyyzzz'
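A rough sketch of that workaround, assuming a template.load file that carries a __TABLE__ placeholder in its INCLUDING ONLY clause (the table names here are hypothetical):
# generate and run one load script per table
for t in table_1 table_2 table_3; do
    sed "s/__TABLE__/$t/g" template.load > "/tmp/$t.load"
    pgloader "/tmp/$t.load"
done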
+1 Exactly the same issue; it works fine when including a single table in the load script, even when that table was the one where the whole process got stuck previously while importing all the tables.
Then there might still be something odd in the way pgloader waits for all table loading to be done, which is found at https://github.com/dimitri/pgloader/blob/master/src/sources/common/db-methods.lisp#L245 and following.
As debug mode seems way too verbose here, can you raise the log level on lines 250 and 263 to :log instead and see if you can make sense of the output, or share it here?
Another idea would be to see if there's still a CREATE INDEX command going on while pgloader seems stuck, as it will have to wait until all indexes are ready before reaching the next step of completing the catalogs (constraints and comments). You can easily see any such active query in PostgreSQL with the following query:
select pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
I just compiled the current master branch and the issue persists. I had the idea of going back in time until I found a commit that worked and then running git bisect, but I couldn't complete the build when I checked out v3.1.1 or v3.2.2 due to issue #203. I couldn't spend more time to figure this out yet. I'm going to check whether changing the log level as you suggest gives us useful info in the current branch.
With respect to my config I have:
LOAD database
from mysql://[url]
into postgresql://[url]
with include no drop,
disable triggers,
create no indexes,
no foreign keys,
downcase identifiers,
quote identifiers
CAST type date to date drop not null drop default using zero-dates-to-null,
type timestamp to timestamp drop not null drop default using zero-dates-to-null,
type enum to varchar drop not null drop default using empty-string-to-null
BEFORE LOAD do
$$ drop schema public cascade; $$,
$$ create schema public; $$;
I disabled indexes and other things (does that discard the CREATE INDEX hypothesis?)
Here is the output to check if there is anything strange on the Postgres side when pgloader is stuck.
db=# select pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
pid | waiting | state | query
-------+---------+-------+----------
19768 | f | idle | SELECT 1
23422 | f | idle | COMMIT
(2 rows)
Some more info: when I stopped the stuck pgloader process with ctrl-c I got this dump:
Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {1007ADE6B3}>
0: ("bogus stack frame")
1: ((FLET #:WITHOUT-INTERRUPTS-BODY-558 :IN SB-THREAD:CONDITION-WAIT))
2: (SB-THREAD:CONDITION-WAIT #<SB-THREAD:WAITQUEUE Anonymous condition variable {100A5DDD23}> #<SB-THREAD:MUTEX "Anonymous lock" (free)> :TIMEOUT NIL)
3: (LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE/NO-LOCK #<unavailable lambda list>)
4: ((FLET #:WITHOUT-INTERRUPTS-BODY-639 :IN SB-THREAD::CALL-WITH-MUTEX))
5: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE) {7FFFE097F11B}> #<SB-THREAD:MUTEX "Anonymous lock" (free)> NIL
T NIL)
6: (LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE #<unavailable lambda list>)
7: (LPARALLEL.KERNEL:RECEIVE-RESULT #S(LPARALLEL.KERNEL:CHANNEL :QUEUE #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-T
HREAD:MUTEX "Anonymous lock" (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {100A5DDD23}>) :KERNEL #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :
USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {100901AF53}>))
8: ((:METHOD PGLOADER.SOURCES:COPY-DATABASE (PGLOADER.SOURCES:DB-COPY)) #<PGLOADER.MYSQL:COPY-MYSQL {1008E80193}> :ON-ERROR-STOP NIL :WORKER-COUNT 4 :CONCURRENCY 1 :TRUNCATE NIL
:DISABLE-TRIGGERS T :DATA-ONLY NIL :SCHEMA-ONLY NIL :CREATE-SCHEMAS NIL :CREATE-TABLES T :INCLUDE-DROP NIL :CREATE-INDEXES NIL :INDEX-NAMES :UNIQUIFY :RESET-SEQUENCES T :FOREIGN-
KEYS NIL :ONLY-TABLES NIL :INCLUDING NIL :EXCLUDING NIL :SET-TABLE-OIDS T :ALTER-TABLE NIL :ALTER-SCHEMA NIL :MATERIALIZE-VIEWS NIL) [fast-method]
9: ((LAMBDA ()))
10: (PGLOADER:RUN-COMMANDS #P"/home/manuca/etl.load" :START-LOGGER NIL :SUMMARY NIL :LOG-FILENAME NIL :LOG-MIN-MESSAGES NIL :CLIENT-MIN-MESSAGES NIL)
11: ((FLET #:CLEANUP-FUN-283 :IN PGLOADER::MAIN)) [cleanup]
12: (PGLOADER::MAIN ("/home/manuca/bin/pgloader_master" "-d" "etl.load"))
13: ((LAMBDA NIL :IN "/home/manuca/src/pgloader/dumper-2SKVI5f7.lisp"))
14: ((FLET #:WITHOUT-INTERRUPTS-BODY-85 :IN SAVE-LISP-AND-DIE))
15: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))
An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73E8A07.
Date/time: 2016-03-29-16:03An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73E8A07.
Looks like it could be an off-by-one receive-result call, which is used to wait for all tables to be done loading, then for all indexes to be created. The option create no indexes completely disables the index code path, including the receive-result calls.
On a quick code read I can't see the off-by-one, so the next step is going to be reading through the logs as edited (with changes from the :debug level to the :log level for some key messages, all found in pgloader:src/sources/common/methods.lisp and pgloader:src/pgsql/pgsql.lisp).
Thanks for your help and time spent in chasing that one bug!
Thanks @dimitri, I recompiled pgloader after raising the log level on those lines you mentioned; here's the output (I renamed the tables). It hangs when it reaches table_9.
Let me know if you need more data. Thanks
~/bin/pgloader_debug etl.load
WARNING: Setting locale failed.
Check the following variables for correct values:
LC_CTYPE=UTF-8
LANG=en_US.UTF-8
2016-03-31T15:40:21.029000-04:00 LOG Main logs in '/tmp/pgloader/pgloader.log'
2016-03-31T15:40:21.033000-04:00 LOG Data errors in '/tmp/pgloader/'
2016-03-31T15:40:21.033000-04:00 LOG Parsing commands from file #P"/home/manuca/etl.load"
2016-03-31T15:40:24.532000-04:00 LOG Finished processing READER for "table_1" 0.043000s
2016-03-31T15:40:24.532000-04:00 LOG Finished processing WORKER for "table_1" 0.015000s
2016-03-31T15:40:24.532000-04:00 LOG Finished processing WRITER for "table_1" 0.000000s
2016-03-31T15:40:24.532000-04:00 LOG writers-counts[table_1] = 0
2016-03-31T15:40:25.005000-04:00 LOG Finished processing READER for "table_2" 0.400000s
2016-03-31T15:40:46.746000-04:00 LOG Finished processing READER for "table_3" 22.332000s
2016-03-31T15:40:46.747000-04:00 LOG Finished processing WORKER for "table_3" 22.407000s
2016-03-31T15:40:46.947000-04:00 LOG Finished processing WORKER for "table_2" 0.123000s
2016-03-31T15:40:47.564000-04:00 LOG Finished processing WRITER for "table_2" 0.664000s
2016-03-31T15:40:47.564000-04:00 LOG writers-counts[table_2] = 0
2016-03-31T15:40:47.813000-04:00 LOG Finished processing WRITER for "table_3" 8.259000s
2016-03-31T15:40:47.813000-04:00 LOG writers-counts[table_3] = 0
2016-03-31T15:40:49.462000-04:00 LOG Finished processing READER for "table_4" 1.981000s
2016-03-31T15:41:27.624000-04:00 LOG Finished processing READER for "table_5" 39.842000s
2016-03-31T15:41:28.946000-04:00 LOG Finished processing WORKER for "table_5" 64.022000s
2016-03-31T15:41:29.152000-04:00 LOG Finished processing WORKER for "table_4" 1.410000s
2016-03-31T15:41:29.355000-04:00 LOG Finished processing WRITER for "table_5" 15.998000s
2016-03-31T15:41:29.355000-04:00 LOG writers-counts[table_5] = 0
2016-03-31T15:41:29.355000-04:00 LOG Finished processing READER for "table_5" 0.050000s
2016-03-31T15:41:29.355000-04:00 LOG Finished processing WORKER for "table_5" 39.841000s
2016-03-31T15:41:30.366000-04:00 LOG Finished processing WRITER for "table_5" 1.001000s
2016-03-31T15:41:30.366000-04:00 LOG writers-counts[table_5] = 0
2016-03-31T15:41:30.366000-04:00 LOG Finished processing WRITER for "table_4" 1.315000s
2016-03-31T15:41:30.366000-04:00 LOG writers-counts[table_4] = 0
2016-03-31T15:42:59.954000-04:00 LOG Finished processing READER for "table_6" 89.560000s
2016-03-31T15:43:03.330000-04:00 LOG Finished processing WORKER for "table_6" 94.002000s
2016-03-31T15:43:08.297000-04:00 LOG Finished processing WRITER for "table_6" 31.424000s
2016-03-31T15:43:08.297000-04:00 LOG writers-counts[table_6] = 0
2016-03-31T15:43:44.226000-04:00 LOG Finished processing READER for "table_7" 133.814000s
2016-03-31T15:43:45.147000-04:00 LOG Finished processing WORKER for "table_7" 45.207000s
2016-03-31T15:43:45.703000-04:00 LOG Finished processing WRITER for "table_7" 18.762000s
2016-03-31T15:43:45.703000-04:00 LOG writers-counts[table_7] = 0
2016-03-31T15:46:22.503000-04:00 LOG Finished processing READER for "table_8" 194.201000s
2016-03-31T15:46:24.682000-04:00 LOG Finished processing WORKER for "table_8" 160.540000s
2016-03-31T15:46:27.575000-04:00 LOG Finished processing WRITER for "table_8" 59.089000s
2016-03-31T15:46:27.575000-04:00 LOG writers-counts[table_8] = 0
2016-03-31T15:46:27.785000-04:00 LOG Finished processing READER for "table_9" 0.226000s
My stack trace ...
2016-04-01T10:17:02.121000+02:00 DEBUG finish-batch[0] 16405 rows in 0.835000s
2016-04-01T10:17:03.318000+02:00 DEBUG finish-batch[0] 24120 rows in 1.002000s
2016-04-01T10:17:04.209000+02:00 DEBUG finish-batch[0] 18274 rows in 1.016000s
2016-04-01T10:17:05.152000+02:00 DEBUG finish-batch[0] 22855 rows in 0.970000s
2016-04-01T10:17:06.092000+02:00 DEBUG finish-batch[0] 22631 rows in 0.937000s
2016-04-01T10:17:07.442000+02:00 DEBUG finish-batch[0] 25928 rows in 1.319000s
2016-04-01T10:17:08.180000+02:00 DEBUG finish-batch[0] 15360 rows in 0.619000s
2016-04-01T10:17:08.911000+02:00 DEBUG finish-batch[0] 14808 rows in 0.565000s
2016-04-01T10:17:09.569000+02:00 DEBUG finish-batch[0] 14525 rows in 0.956000s
2016-04-01T10:17:10.508000+02:00 DEBUG finish-batch[0] 23469 rows in 0.954000s
^CAn unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73EB9B7.
Date/time: 2016-04-01-10:38An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73EB9B7.
Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {1007AE6793}>
0: ("bogus stack frame")
1: ((FLET #:WITHOUT-INTERRUPTS-BODY-558 :IN SB-THREAD:CONDITION-WAIT))
2: (SB-THREAD:CONDITION-WAIT #<SB-THREAD:WAITQUEUE Anonymous condition variable {100C8A0653}> #<SB-THREAD:MUTEX "Anonymous lock" (free)> :TIMEOUT NIL)
3: (LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE/NO-LOCK #<unavailable lambda list>)
4: ((FLET #:WITHOUT-INTERRUPTS-BODY-639 :IN SB-THREAD::CALL-WITH-MUTEX))
5: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE) {7FFFE09870DB}> #<SB-THREAD:MUTEX "Anonymous lock" (free)> NIL T NIL)
6: (LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE #<unavailable lambda list>)
7: (LPARALLEL.KERNEL:RECEIVE-RESULT #S(LPARALLEL.KERNEL:CHANNEL :QUEUE #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX "Anonymous lock" (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {100C8A0653}>) :KERNEL #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1008F8A553}>))
8: ((:METHOD PGLOADER.SOURCES:COPY-DATABASE (PGLOADER.SOURCES:DB-COPY)) #<PGLOADER.MYSQL:COPY-MYSQL {1008DF0193}> :ON-ERROR-STOP NIL :WORKER-COUNT 4 :CONCURRENCY 1 :TRUNCATE NIL :DISABLE-TRIGGERS NIL :DATA-ONLY NIL :SCHEMA-ONLY NIL :CREATE-SCHEMAS NIL :CREATE-TABLES T :INCLUDE-DROP T :CREATE-INDEXES T :INDEX-NAMES :UNIQUIFY :RESET-SEQUENCES T :FOREIGN-KEYS T :ONLY-TABLES NIL :INCLUDING NIL :EXCLUDING NIL :SET-TABLE-OIDS T :ALTER-TABLE NIL :ALTER-SCHEMA NIL :MATERIALIZE-VIEWS NIL) [fast-method]
9: ((LAMBDA ()))
10: (PGLOADER:RUN-COMMANDS #P"/home/xxxxxx/Projects/pgloader_scripts/DEBUG.load" :START-LOGGER NIL :SUMMARY NIL :LOG-FILENAME NIL :LOG-MIN-MESSAGES NIL :CLIENT-MIN-MESSAGES NIL)
11: ((FLET #:CLEANUP-FUN-283 :IN PGLOADER::MAIN)) [cleanup]
12: (PGLOADER::MAIN ("pgloader" "--debug" "DEBUG.load"))
13: ((LAMBDA NIL :IN "/home/xxxxxx/src/pgloader/dumper-2SKVI5f7.lisp"))
14: ((FLET #:WITHOUT-INTERRUPTS-BODY-85 :IN SAVE-LISP-AND-DIE))
15: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))
An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73EB9B7.
Date/time: 2016-04-01-10:38An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73EB9B7.
Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {1007AE6793}>
0: ((LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX))
1: (SB-IMPL::CALL-WITH-SANE-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {101C5490BB}>)
2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {101C54908B}>)
3: (PRINT-BACKTRACE :STREAM #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {1000287203}> :START 0 :FROM :DEBUGGER-FRAME :COUNT 4611686018427387903 :PRINT-THREAD T :PRINT-FRAME-SOURCE NIL :METHOD-FRAME-STYLE NIL)
4: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE-TO-STREAM #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {1000287203}>)
5: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE #<SB-SYS:INTERACTIVE-INTERRUPT {101C4F8113}> :OUTPUT #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {1000287203}> :IF-EXISTS :APPEND :VERBOSE T)
6: ((FLET #:CLEANUP-FUN-283 :IN PGLOADER::MAIN)) [cleanup]
7: (PGLOADER::MAIN ("pgloader" "--debug" "DEBUG.load"))
8: ((LAMBDA NIL :IN "/home/xxxxxx/src/pgloader/dumper-2SKVI5f7.lisp"))
9: ((FLET #:WITHOUT-INTERRUPTS-BODY-85 :IN SAVE-LISP-AND-DIE))
10: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))
This is another run, pgloader recompiled with LOG output.
$ pgloader DEBUG.load
2016-04-01T10:47:15.025000+02:00 LOG Main logs in '/tmp/pgloader/pgloader.log'
2016-04-01T10:47:15.030000+02:00 LOG Data errors in '/tmp/pgloader/'
2016-04-01T10:47:15.030000+02:00 LOG Parsing commands from file #P"/home/xxxxxx/Projects/pgloader_scripts/DEBUG.load"
2016-04-01T10:50:27.545000+02:00 LOG Finished processing READER for "table_1" 190.655000s
2016-04-01T10:50:27.545000+02:00 LOG Finished processing WORKER for "table_1" 190.867000s
2016-04-01T10:50:28.259000+02:00 LOG Finished processing WRITER for "table_1" 93.702000s
2016-04-01T10:50:28.260000+02:00 LOG writers-counts[table_1] = 0
2016-04-01T10:50:32.749000+02:00 LOG Finished processing READER for "table_3" 4.243000s
2016-04-01T10:54:58.057000+02:00 LOG Finished processing READER for "table_2" 270.708000s
2016-04-01T10:54:58.502000+02:00 LOG Finished processing WORKER for "table_2" 461.645000s
2016-04-01T10:54:58.703000+02:00 LOG Finished processing WRITER for "table_2" 123.545000s
2016-04-01T10:54:58.703000+02:00 LOG writers-counts[table_2] = 0
2016-04-01T10:54:59.686000+02:00 LOG Finished processing WORKER for "table_3" 1.515000s
2016-04-01T10:54:59.887000+02:00 LOG Finished processing WRITER for "table_3" 0.635000s
2016-04-01T10:54:59.887000+02:00 LOG writers-counts[table_3] = 0
2016-04-01T10:57:28.785000+02:00 LOG Finished processing READER for "table_4" 150.171000s
2016-04-01T10:57:29.247000+02:00 LOG Finished processing WORKER for "table_4" 416.597000s
2016-04-01T10:57:29.448000+02:00 LOG Finished processing WRITER for "table_4" 76.191000s
2016-04-01T10:57:29.448000+02:00 LOG writers-counts[table_4] = 0
And the monitoring...
database01=# select CURRENT_TIMESTAMP , pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
now | pid | waiting | state | query
-------------------------------+-------+---------+-------+--------
2016-04-01 08:49:02.085738+00 | 28395 | f | idle |
2016-04-01 08:49:02.085738+00 | 17080 | f | idle | COMMIT
(2 rows)
database01=# select CURRENT_TIMESTAMP , pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
now | pid | waiting | state | query
-------------------------------+-------+---------+--------+------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
2016-04-01 08:50:32.219118+00 | 28395 | f | idle |
2016-04-01 08:50:32.219118+00 | 17510 | f | active | CREATE UNIQUE INDEX idx_36116_primary ON table_1 (table_1_id);
2016-04-01 08:50:32.219118+00 | 17509 | f | active |
+
| | | | copy table_2 (table_2_id,...<SNIP>...) FROM STDIN
(3 rows)
database01=# select CURRENT_TIMESTAMP , pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
now | pid | waiting | state | query
-------------------------------+-------+---------+--------+------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
2016-04-01 08:54:12.311957+00 | 28395 | f | idle |
2016-04-01 08:54:12.311957+00 | 17597 | f | active | autovacuum: ANALYZE qqqqqq.table_2
2016-04-01 08:54:12.311957+00 | 17509 | f | active |
+
| | | | copy table_2 (table_2_id,...<SNIP>...) FROM STDIN
(3 rows)
database01=# select CURRENT_TIMESTAMP , pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
now | pid | waiting | state |
query
-------------------------------+-------+---------+---------------------+-----------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
2016-04-01 08:55:20.010652+00 | 28395 | f | idle |
2016-04-01 08:55:20.010652+00 | 17611 | f | idle in transaction |
+
| | | | copy table_4 (table_4_id,...<SNIP>...) FROM STDIN
(2 rows)
database01=# select CURRENT_TIMESTAMP , pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
now | pid | waiting | state | query
-------------------------------+-------+---------+-------+--------
2016-04-01 08:57:02.173661+00 | 28395 | f | idle |
2016-04-01 08:57:02.173661+00 | 17611 | f | idle | COMMIT
(2 rows)
database01=# select CURRENT_TIMESTAMP , pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
now | pid | waiting | state | query
------------------------------+-------+---------+-------+--------
2016-04-01 09:00:39.99215+00 | 28395 | f | idle |
2016-04-01 09:00:39.99215+00 | 17648 | f | idle | COMMIT
(2 rows)
database01=# select CURRENT_TIMESTAMP , pid, waiting, state, query from pg_stat_activity where pid <> pg_backend_pid();
now | pid | waiting | state | query
-------------------------------+-------+---------+-------+--------
2016-04-01 09:13:07.749294+00 | 28395 | f | idle |
2016-04-01 09:13:07.749294+00 | 17648 | f | idle | COMMIT
(2 rows)
@dimitri any clue about what could be causing this issue, and a possible solution? Thanks
It still looks like a one-too-many call to receive-result, which blocks forever when there's no result left to receive. There are several ways to fix that issue; the best one would be to understand why we expect a result that never shows up... maybe another thread has some non-reported error or is waiting for us in the interactive debugger or something, although that's quite unlikely.
Is there a way that you can send me a database dump from which I could reproduce the problem locally?
Unfortunately I can't send you a dump of this DB, I wish I could reproduce the bug on a DB without sensitive information.
Can you try to reproduce with the schema only, no data? If the bug still happens, maybe you can share it? Other than that, if signing some acceptable papers allows us to work more closely here (sharing a cloud test instance or something), let's do that?
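For reference, a sketch of how such a schema-only run could look, assuming the schema only option mirrors the data only option used elsewhere in this thread (connection strings are placeholders):
LOAD DATABASE
     FROM mysql://user@host/sourcedb
     INTO postgresql://user@host/targetdb
     WITH schema only, include drop, create tables, create indexes;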
I just tried a migration with the docker image dimitri/pgloader:ccl.latest and a pg.load file pointing to a source schema with 6 tables. It hung about halfway through but did process the largest table successfully.
Can you share the source schema in order for me to reproduce the bug locally and get it fixed, please? Either here if possible, or by private email (see my profile). Thanks!
I got enough data to be able to reproduce a case that looks like this one, please retry!
Looks good, I did a test with a table that previously frequently ran into that problem. The migration has worked three times in a row.
When I have the time I will try a test migration of the whole database again.
Sorry, still hanging for me. Here's the stack trace after ^C-ing it.
^CAn unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73EB9B7.
Date/time: 2016-04-21-15:47An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73EB9B7.
Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {1007B06793}>
0: ("bogus stack frame")
1: ((FLET #:WITHOUT-INTERRUPTS-BODY-558 :IN SB-THREAD:CONDITION-WAIT))
2: (SB-THREAD:CONDITION-WAIT #<SB-THREAD:WAITQUEUE Anonymous condition variable {100C107723}> #<SB-THREAD:MUTEX "Anonymous lock" (free)> :TIMEOUT NIL)
3: (LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE/NO-LOCK #<unavailable lambda list>)
4: ((FLET #:WITHOUT-INTERRUPTS-BODY-639 :IN SB-THREAD::CALL-WITH-MUTEX))
5: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE) {7FFFE09870CB}> #<SB-THREAD:MUTEX "Anonymous lock" (free)> NIL T NIL)
6: (LPARALLEL.CONS-QUEUE:POP-CONS-QUEUE #<unavailable lambda list>)
7: (LPARALLEL.KERNEL:RECEIVE-RESULT #S(LPARALLEL.KERNEL:CHANNEL :QUEUE #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX "Anonymous lock" (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {100C107723}>) :KERNEL #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1008FE2953}>))
8: ((:METHOD PGLOADER.SOURCES:COPY-DATABASE (PGLOADER.SOURCES:DB-COPY)) #<PGLOADER.MYSQL:COPY-MYSQL {1008E465C3}> :ON-ERROR-STOP NIL :WORKER-COUNT 4 :CONCURRENCY 1 :MAX-PARALLEL-CREATE-INDEX NIL :TRUNCATE NIL :DISABLE-TRIGGERS NIL :DATA-ONLY NIL :SCHEMA-ONLY NIL :CREATE-SCHEMAS NIL :CREATE-TABLES T :INCLUDE-DROP T :CREATE-INDEXES T :INDEX-NAMES :UNIQUIFY :RESET-SEQUENCES T :FOREIGN-KEYS T :ONLY-TABLES NIL :INCLUDING NIL :EXCLUDING NIL :SET-TABLE-OIDS T :ALTER-TABLE NIL :ALTER-SCHEMA NIL :MATERIALIZE-VIEWS NIL) [fast-method]
9: ((LAMBDA ()))
10: (PGLOADER:RUN-COMMANDS #P"/home/xxxxxx/Projects/pgloader_scripts/DEBUG.load" :START-LOGGER NIL :SUMMARY NIL :LOG-FILENAME NIL :LOG-MIN-MESSAGES NIL :CLIENT-MIN-MESSAGES NIL)
11: ((FLET #:CLEANUP-FUN-283 :IN PGLOADER::MAIN)) [cleanup]
12: (PGLOADER::MAIN ("pgloader" "--debug" "DEBUG.load"))
13: ((LAMBDA NIL :IN "/home/xxxxxx/bak.src/pgloader/dumper-2SKVI5f7.lisp"))
14: ((FLET #:WITHOUT-INTERRUPTS-BODY-85 :IN SAVE-LISP-AND-DIE))
15: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))
An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73EB9B7.
Date/time: 2016-04-21-15:47An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF73EB9B7.
Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {1007B06793}>
0: ((LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX))
1: (SB-IMPL::CALL-WITH-SANE-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {10241EA17B}>)
2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {10241EA14B}>)
3: (PRINT-BACKTRACE :STREAM #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {100028F203}> :START 0 :FROM :DEBUGGER-FRAME :COUNT 4611686018427387903 :PRINT-THREAD T :PRINT-FRAME-SOURCE NIL :METHOD-FRAME-STYLE NIL)
4: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE-TO-STREAM #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {100028F203}>)
5: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE #<SB-SYS:INTERACTIVE-INTERRUPT {1024198113}> :OUTPUT #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {100028F203}> :IF-EXISTS :APPEND :VERBOSE T)
6: ((FLET #:CLEANUP-FUN-283 :IN PGLOADER::MAIN)) [cleanup]
7: (PGLOADER::MAIN ("pgloader" "--debug" "DEBUG.load"))
8: ((LAMBDA NIL :IN "/home/xxxxxx/bak.src/pgloader/dumper-2SKVI5f7.lisp"))
9: ((FLET #:WITHOUT-INTERRUPTS-BODY-85 :IN SAVE-LISP-AND-DIE))
10: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))
[xxxxxx@ZZZZZZ pgloader_scripts]$ pgloader --version
pgloader version "3.3.7af6c7a"
compiled with SBCL 1.2.8-1.el7
[xxxxxx@ZZZZZZ pgloader_scripts]$
Can you please try to run it in a developer environment, rather than the command line, so that you can easily have access to the interactive debugger and maybe have a better stack trace (this one looks like the generic off-by-one error that seems to be happening way after the real problems we need to solve).
To do that, install a recent enough SBCL (anything 1.2 or 1.3 era, see http://sbcl.org), then install Quicklisp on it (see https://www.quicklisp.org/beta/#installation), then at the prompt do:
$ sbcl
* (ql:quickload "pgloader")
* (pgloader::run-commands "path/to/file.load" :client-min-messages :notice)
And let's see what happens then. You should reach the interactive debugger; type in back to get a backtrace, then ^D your way out to the shell.
OK, tried it again with the whole database and I still have this issue. It's a bit strange that the single table worked that well.
Do you have any new logs or traces or things that would help figure out the situation? Can you do the “lisp developer” test as above?
I'm getting this issue now. As others have reported, pgloader ports my tables fine individually, but not all together. More generally, it seems that porting a smaller subset of my tables works, but porting all or a larger subset of my tables results in an indefinite hang.
The hang happens when pgloader is built with CCL according to the Docker file. Previously I was trying with an SBCL build of pgloader, and was running into issue #327 instead ("Heap exhausted, game over").
I think I just did a typical installation of SBCL, but it brought me to a debugger upon the heap-exhaustion message. From there I typed back. Here's the output if it helps any:
Heap exhausted during garbage collection: 480 bytes available, 832 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 42949672 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 42949672 0 0 0.0000
2: 113964 124237 0 0 3221 34488 5 0 11 1226768288 9044064 42949672 0 0 1.0415
3: 81167 122649 0 0 4492 42335 304 0 47 1538635040 5753568 42949672 4676 0 0.4966
4: 14436 45001 0 0 3636 19584 336 0 99 767806992 4076016 2000000 3802 0 0.1759
5: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
6: 0 0 0 0 2427 1135 0 0 0 116719616 0 2000000 2258 0 0.0000
Total bytes allocated = 4271938096
Dynamic-space-size bytes = 4294967296
GC control variables:
*GC-INHIBIT* = true
*GC-PENDING* = true
*STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 16470(tid 140737026979584):
Heap exhausted, game over.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> back
Backtrace:
0: SB-KERNEL::%MAKE-ARRAY, (LAMBDA (SB-IMPL::A SB-IMPL::B) :IN SB-KERNEL::%MAKE-ARRAY), (LAMBDA (SB-IMPL::VALUE) :IN SB-IMPL::%SAVE-DISPLACED-ARRAY-BACKPOINTER), pc = 0x10001cda9a, fp = 0x7fffe47fe038
1: PGLOADER.PGSQL::FORMAT-VECTOR-ROW, pc = 0x10029b193f, fp = 0x7fffe47fe0f0
2: (SB-PCL::FAST-METHOD PGLOADER.SOURCES::FORMAT-DATA-TO-COPY (PGLOADER.SOURCES::COPY T T)), (FLET H0 :IN PGLOADER.SOURCES::FORMAT-DATA-TO-COPY), pc = 0x1003c8fd49, fp = 0x7fffe47fe248
3: LPARALLEL.KERNEL::MAKE-CHANNELED-TASK, (LAMBDA () :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK), (FLET BODY-FN635 :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK), pc = 0x1002c343fe, fp = 0x7fffe47fe308
4: LPARALLEL.KERNEL::MAKE-CHANNELED-TASK, (LAMBDA () :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK), (FLET BODY-FN635 :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK), pc = 0x1002c3428b, fp = 0x7fffe47fe3c8
5: LPARALLEL.KERNEL::EXEC-TASK/WORKER, pc = 0x1002c36430, fp = 0x7fffe47fe440
6: LPARALLEL.KERNEL::WORKER-LOOP, pc = 0x1002c3ddff, fp = 0x7fffe47fe4c0
7: LPARALLEL.KERNEL::%CALL-WITH-TASK-HANDLER, pc = 0x1002c388ff, fp = 0x7fffe47fe510
8: LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT, (LAMBDA () :IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT), pc = 0x1002c4321c, fp = 0x7fffe47fe5d8
9: LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT, (LAMBDA () :IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT), pc = 0x1002c42e27, fp = 0x7fffe47fe678
10: LPARALLEL.KERNEL::MAKE-WORKER-THREAD, (LAMBDA () :IN LPARALLEL.KERNEL::MAKE-WORKER-THREAD), pc = 0x1002c40299, fp = 0x7fffe47fe700
11: BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS, (LAMBDA () :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS), pc = 0x100352d172, fp = 0x7fffe47fe798
12: SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE, (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (LAMBDA (COND) :IN "SYS:SRC;CODE;TARGET-THREAD.LISP"), (LAMBDA (STREAM) :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (LAMBDA () :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (FLET SB-THREAD::WITH-SYSTEM-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (FLET SB-THREAD::WITH-SYSTEM-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), pc = 0x100019184f, fp = 0x7fffe47fe8f0
13: SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE, (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (LAMBDA (COND) :IN "SYS:SRC;CODE;TARGET-THREAD.LISP"), (LAMBDA (STREAM) :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (LAMBDA () :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (FLET SB-THREAD::WITH-SYSTEM-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (FLET SB-THREAD::WITH-SYSTEM-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), pc = 0x1000190d7c, fp = 0x7fffe47feab0
14: SB-THREAD::CALL-WITH-MUTEX, pc = 0x100018194f, fp = 0x7fffe47feb78
15: SB-THREAD::CALL-WITH-MUTEX, pc = 0x1000181574, fp = 0x7fffe47fec40
16: SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE, (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (LAMBDA (COND) :IN "SYS:SRC;CODE;TARGET-THREAD.LISP"), (LAMBDA (STREAM) :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (LAMBDA () :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (FLET SB-THREAD::WITH-SYSTEM-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), (FLET SB-THREAD::WITH-SYSTEM-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE), pc = 0x1000190546, fp = 0x7fffe47fedf0
17: Foreign function call_into_lisp, pc = 0x4280ef, fp = 0x7fffe47fee30
18: Foreign function new_thread_trampoline, pc = 0x41e04a, fp = 0x7fffe47feee0
Also, for what it's worth: when the hang happened on CCL, I checked the process through a second SSH session, and ps aux showed a virtual memory size (VSZ) of 537858452. I'm not 100% sure how to read that number, but it seemed a little bit high.
SBCL got its heap exhausted with a large single table from my database, which CCL was able to handle. My impression so far is that CCL merely increased the memory capacity or made it not leak quite as fast compared to SBCL, but they both ran into a similar memory usage problem.
I'm wondering, is there any evidence so far that "port more than x GB of data" isn't the only criterion for this to happen (with x varying between different CL versions and different systems)?
Oh hey, it seems that one of my tables JUST got large enough to trigger the CCL hang by itself. Bad timing for my work, but perhaps good timing for debugging this.
Here's the MySQL table schema as output by show columns, omitting the column names for privacy:
+------------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+----------+------+-----+---------+----------------+
| <name> | int(11) | NO | PRI | NULL | auto_increment |
| <name> | datetime | NO | | NULL | |
| <name> | int(11) | NO | MUL | NULL | |
| <name> | int(11) | NO | MUL | NULL | |
| <name> | int(11) | NO | MUL | NULL | |
| <name> | int(11) | NO | MUL | NULL | |
| <name> | int(11) | NO | MUL | NULL | |
| <name> | int(11) | YES | MUL | NULL | |
+------------------+----------+------+-----+---------+----------------+
And when I told pgloader to port only this table, it got stuck after 8496570 rows.
This is with pgloader git-cloned 2-3 days ago.
I still don't believe there's a direct correlation between the number of rows and some memory pressure limit here, because I have had reports of success on much larger data sets and because we batch the rows. The memory exhaustion seems to be a factor of how large an individual batch of rows turns out to be and how many of those batches are concurrently filled and available in memory.
Also, you're saying you can reproduce the hang rather than the exhaustion, and that's a different thing. Is it possible for you to share a database dump of only that table so that I can reproduce the problem locally (with some luck)? It's only integers and a datetime, so I guess there's no filtering needed for anonymity of the data involved... You'll find my email on my github profile if you can share the data!
All right, I just emailed you a MySQL dump that gets the CCL hang. I was able to rename my columns and overwrite my data with integer 1s and a fixed date, and the CCL hang still happened. In addition to giving me data privacy, this should allow you to add and remove rows freely for testing, without worrying about adhering to some obscure data pattern.
After more experimentation, it seems that the PostgreSQL maintenance_work_mem and work_mem values are quite relevant here:
SET maintenance_work_mem to '64MB', work_mem to '4MB' finishes after about 5-6 minutes. (These are the PostgreSQL 9.5 defaults.)
SET maintenance_work_mem to '128MB', work_mem to '12MB' hangs. (These values are in the MySQL example in the pgloader docs.)
SET maintenance_work_mem to '256MB', work_mem to '32MB' hangs.
Here's an example of it finishing with the PostgreSQL 9.5 defaults:
table name read imported errors total time read write
----------------------- --------- --------- --------- -------------- --------- ---------
fetch meta data 8 8 0 0.043s
create, drop 0 0 0 1.089s
----------------------- --------- --------- --------- -------------- --------- ---------
test_table 8494733 8494733 0 2m18.710s 3m25.415s 2m18.707s
----------------------- --------- --------- --------- -------------- --------- ---------
COPY Threads Completion 3 3 0 3m28.630s
Index Build Completion 7 7 0 2m31.947s
Create Indexes 7 7 0 17m11.154s
Reset Sequences 0 1 0 1.918s
Primary Keys 1 1 0 0.149s
Foreign Keys 0 0 0 0.000s
Install comments 0 0 0 0.000s
----------------------- --------- --------- --------- -------------- --------- ---------
Total import time 8494733 8494733 0 6m4.934s 3m25.415s 2m18.707s
The table has 8494733 rows.
If I go into MySQL and run delete from test_table where id > 10700000; to cut it down to 8008090 rows, then SET maintenance_work_mem to '128MB', work_mem to '12MB' finishes after about 10 minutes. Here's an example:
table name read imported errors total time read write
----------------------- --------- --------- --------- -------------- --------- ---------
fetch meta data 8 8 0 0.018s
create, drop 0 0 0 1.352s
----------------------- --------- --------- --------- -------------- --------- ---------
test_table 8008090 8008090 0 2m9.801s 3m12.179s 2m9.799s
----------------------- --------- --------- --------- -------------- --------- ---------
COPY Threads Completion 3 3 0 3m14.934s
Index Build Completion 7 7 0 6m53.889s
Create Indexes 7 7 0 44m1.343s
Reset Sequences 0 1 0 1.458s
Primary Keys 1 1 0 0.081s
Foreign Keys 0 0 0 0.000s
Install comments 0 0 0 0.000s
----------------------- --------- --------- --------- -------------- --------- ---------
Total import time 8008090 8008090 0 10m12.954s 3m12.179s 2m9.799s
When it hangs, if I do ps aux | grep pgloader in a second terminal, I can see that the running time of the pgloader process doesn't change anymore; it's typically stuck at around 4 minutes. I typically waited anywhere from 10-20 minutes before deciding that it must have hung. I wouldn't rule out the idea that the hang scenarios described above could have eventually finished if I had waited longer. However, even if one ignores the hang cases, just comparing these timing stats with the first set of timing stats, something seems a little odd.
I'm running pgloader with a local MySQL 5.5 database and a remote PostgreSQL 9.5 database (an Amazon RDS instance).
Besides setting maintenance_work_mem and work_mem, no other options are required in the load command file to reproduce the hang.
Also, in my previous post I mentioned the SBCL heap exhaustion, since earlier you suggested running with SBCL to get a traceback. The heap exhaustion is what happened when I tried running with SBCL (in the case where CCL would hang). I guess I should leave that aside for now though.
Thanks a lot for sharing the data! Unfortunately I couldn't reproduce the error yet. Here's what I tried:
load database
from mysql://root@localhost/annotationtest
into pgsql:///annotationtest
set maintenance_work_mem to '128MB', work_mem to '12MB';
And here's what I got:
CL-USER> (pgloader:run-commands "/Users/dim/dev/temp/pgloader-issues/337/annotationtest.load" :client-min-messages :notice)
2016-05-22T23:36:55.000446+02:00 NOTICE Starting pgloader, log system is ready.
2016-05-22T23:36:55.000936+02:00 LOG Parsing commands from file #P"/Users/dim/dev/temp/pgloader-issues/337/annotationtest.load"
2016-05-22T23:36:56.009969+02:00 NOTICE DROP then CREATE TABLES
2016-05-22T23:36:56.010220+02:00 NOTICE DROP TABLE IF EXISTS test_table CASCADE;
2016-05-22T23:36:56.010704+02:00 WARNING Postgres warning: table "test_table" does not exist, skipping
2016-05-22T23:36:56.010973+02:00 NOTICE CREATE TABLE test_table
(
id bigserial not null,
date timestamptz not null,
key1 bigint not null,
key2 bigint not null,
key3 bigint not null,
key4 bigint not null,
key5 bigint not null,
key6 bigint
);
2016-05-22T23:40:19.627068+02:00 NOTICE CREATE INDEX idx_1505481_annotations_annotation_63868627 ON test_table (key4);
2016-05-22T23:40:19.627548+02:00 NOTICE CREATE INDEX idx_1505481_annotations_annotation_6682136 ON test_table (key2);
2016-05-22T23:40:19.628220+02:00 NOTICE CREATE INDEX idx_1505481_annotations_annotation_fbfc09f1 ON test_table (key3);
2016-05-22T23:40:19.629955+02:00 NOTICE CREATE UNIQUE INDEX idx_1505481_primary ON test_table (id);
2016-05-22T23:40:19.630198+02:00 NOTICE CREATE INDEX idx_1505481_annotations_annotation_9e5a71d1 ON test_table (key1);
2016-05-22T23:40:19.630415+02:00 NOTICE CREATE INDEX idx_1505481_annotations_annotation_89f89e85 ON test_table (key5);
2016-05-22T23:40:19.630633+02:00 NOTICE CREATE INDEX idx_1505481_annotations_annotation_58f6a228 ON test_table (key6);
2016-05-22T23:41:13.768254+02:00 NOTICE Reset sequences
2016-05-22T23:41:14.172405+02:00 NOTICE ALTER TABLE test_table ADD PRIMARY KEY USING INDEX idx_1505481_primary;
2016-05-22T23:41:14.172674+02:00 NOTICE Comments
table name read imported errors total time read write
----------------------- --------- --------- --------- -------------- --------- ---------
fetch meta data 8 8 0 0.768s
create, drop 0 0 0 0.077s
----------------------- --------- --------- --------- -------------- --------- ---------
test_table 8494733 8494733 0 1m16.956s 3m22.867s 1m16.955s
----------------------- --------- --------- --------- -------------- --------- ---------
COPY Threads Completion 3 3 0 3m23.473s
Index Build Completion 7 7 0 54.193s
Create Indexes 7 7 0 6m10.545s
Reset Sequences 0 1 0 0.368s
Primary Keys 1 1 0 0.040s
Foreign Keys 0 0 0 0.000s
Install comments 0 0 0 0.000s
----------------------- --------- --------- --------- -------------- --------- ---------
Total import time 8494733 8494733 0 4m19.173s 3m22.867s 1m16.955s
That's from the Common Lisp developer environment, a SLIME prompt (see https://common-lisp.net/project/slime/ for details). No luck this time, will try with other parameters in the next days...
OK. No pressure on time for now, since I could successfully port my table with 64MB/4MB work_mem. I'll be around to provide additional testing/info if needed though.
2016-06-16T11:22:50.831000+02:00 DEBUG finish-batch[0] 40389 rows in 2.327000s
Heap exhausted during garbage collection: 48 bytes available, 64 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 42949672 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 42949672 0 0 0.0000
2: 0 0 0 0 0 0 0 0 0 0 0 42949672 0 0 0.0000
3: 123251 130720 0 0 15547 66168 140 0 13 2676272480 5952160 42949672 0 0 0.9950
4: 0 0 0 0 0 0 0 0 0 0 0 42949672 0 0 0.0000
5: 27235 35933 0 0 2747 11728 82 0 122 475119440 1884336 2000000 2690 0 0.4103
6: 0 0 0 0 2759 1177 0 0 0 128974848 0 2000000 2581 0 0.0000
Total bytes allocated = 4283970000
Dynamic-space-size bytes = 4294967296
GC control variables:
*GC-INHIBIT* = true
*GC-PENDING* = true
*STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 32581(tid 140736913143552):
Heap exhausted, game over.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> backtrace
Backtrace:
0: Foreign function (null), fp = 0x7fffddb6d130, ra = 0x41098a
1: Foreign function (null), fp = 0x7fffddb6d220, ra = 0x410b7f
2: Foreign function gc_heap_exhausted_error_or_lose, fp = 0x7fffddb6d250, ra = 0x41d0ba
3: Foreign function gc_find_freeish_pages, fp = 0x7fffddb6d2a0, ra = 0x41d42f
4: Foreign function (null), fp = 0x7fffddb6d2f0, ra = 0x41d96c
5: Foreign function gc_alloc_with_region, fp = 0x7fffddb6d330, ra = 0x41dd31
6: Foreign function (null), fp = 0x7fffddb6d370, ra = 0x41e1f0
7: Foreign function (null), fp = 0x7fffddb6d3a0, ra = 0x40db45
8: Foreign function scavenge, fp = 0x7fffddb6d3e0, ra = 0x40d8a4
9: Foreign function (null), fp = 0x7fffddb6d400, ra = 0x41c0f2
10: Foreign function collect_garbage, fp = 0x7fffddb6d490, ra = 0x420a29
11: SB-KERNEL::COLLECT-GARBAGE
12: (COMMON-LISP::FLET WITHOUT-GCING-BODY-60 KEYWORD::IN SB-KERNEL::SUB-GC)
13: (COMMON-LISP::FLET SB-THREAD::EXEC KEYWORD::IN SB-KERNEL::SUB-GC)
14: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-53 KEYWORD::IN SB-KERNEL::SUB-GC)
15: SB-KERNEL::SUB-GC
16: Foreign function call_into_lisp, fp = 0x7fffddb6d7d0, ra = 0x42166f
17: Foreign function maybe_gc, fp = 0x7fffddb6d800, ra = 0x41061b
18: Foreign function interrupt_handle_pending, fp = 0x7fffddb6d830, ra = 0x413079
19: Foreign function handle_trap, fp = 0x7fffddb6d870, ra = 0x413bb5
20: Foreign function (null), fp = 0x7fffddb6d8b0, ra = 0x4113cf
21: Foreign function (null), fp = 0x7fffddb6dd48, ra = 0x7ffff79ca100
22: QMYND-IMPL::PARSE-RESULTSET-ROW
23: QMYND-IMPL::MAP-RESULTSET-ROWS
24: QMYND-IMPL::SEND-COMMAND-QUERY
25: QMYND::MYSQL-QUERY
26: (SB-PCL::FAST-METHOD PGLOADER.SOURCES::MAP-ROWS (PGLOADER.MYSQL::COPY-MYSQL))
27: (SB-PCL::FAST-METHOD PGLOADER.SOURCES::QUEUE-RAW-DATA (PGLOADER.SOURCES::COPY COMMON-LISP::T))
28: (COMMON-LISP::FLET BODY-FN621 KEYWORD::IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK)
29: (COMMON-LISP::LAMBDA () KEYWORD::IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK)
30: (SB-C::XEP (COMMON-LISP::LAMBDA () KEYWORD::IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
31: LPARALLEL.KERNEL::EXEC-TASK/WORKER
32: LPARALLEL.KERNEL::WORKER-LOOP
33: LPARALLEL.KERNEL::%CALL-WITH-TASK-HANDLER
34: (COMMON-LISP::LAMBDA () KEYWORD::IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT)
35: LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT
36: (COMMON-LISP::LAMBDA () KEYWORD::IN LPARALLEL.KERNEL::MAKE-WORKER-THREAD)
37: (COMMON-LISP::LAMBDA () KEYWORD::IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS)
38: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-1171 KEYWORD::IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE)
39: (COMMON-LISP::FLET SB-THREAD::WITH-MUTEX-THUNK KEYWORD::IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE)
40: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-639 KEYWORD::IN SB-THREAD::CALL-WITH-MUTEX)
41: SB-THREAD::CALL-WITH-MUTEX
42: SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE
43: Foreign function call_into_lisp, fp = 0x7fffddb6eef0, ra = 0x42166f
44: Foreign function new_thread_trampoline, fp = 0x7fffddb6ef10, ra = 0x418ae7
ldb>
Have a similar problem with
pgloader version "3.3.1c927be"
compiled with SBCL 1.3.11
It just hangs at one point
I have gone back to
pgloader version "3.3.2"
compiled with SBCL 1.3.11
And it passed nicely
This is the line that never appeared in the 3.3.1c927be version, hangs just before it:
2016-12-13T14:12:26.945000+01:00 NOTICE Reset sequences
EDIT: spoke too soon. The second time passed because the target database did not have any indexes from the last import, which I interrupted.
Might be connected with https://github.com/dimitri/pgloader/issues/420
FTR, my migration got stuck too on Postgres 9.6 with
pgloader version "3.3.1"
compiled with SBCL 1.2.4.debian
I checked in Postgres that it was stuck in the COMMIT phase:
=> SELECT pid, age(query_start, clock_timestamp()), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
pid | age | usename | query
------+------------------+---------+--------------------------
4257 | | monitor | <insufficient privilege>
4256 | | monitor | <insufficient privilege>
4258 | | monitor | <insufficient privilege>
5292 | -17:03:27.903784 | project | COMMIT
(4 rows)
Hitting ^C didn't give me much information, only:
$ time pgloader pgloader-project.load
2016-12-27T15:57:33.042000Z LOG Main logs in '/tmp/pgloader/pgloader.log'
2016-12-27T15:57:33.044000Z LOG Data errors in '/tmp/pgloader/'
2016-12-27T15:57:33.045000Z LOG Parsing commands from file #P"/appl/project/pgloader-project.load"
2016-12-27T16:01:40.807000Z WARNING PostgreSQL warning: identifier "idx_107511181_idx_unique_client_member_channels_channel_id_client_member_id" will be truncated to "idx_107511181_idx_unique_client_member_channels_channel_id_clie"
2016-12-27T16:59:25.794000Z ERROR Database error 22021: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY comments, line 381
2016-12-27T16:59:25.821000Z ERROR Database error 22021: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY comments, line 120
2016-12-27T16:59:25.822000Z ERROR Database error 22021: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY comments, line 183
2016-12-27T16:59:26.022000Z ERROR Database error 22021: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY comments, line 1
2016-12-27T17:11:11.085000Z WARNING PostgreSQL warning: identifier "idx_107511224_idx_comment_post_id_parent_id_number_is_or_has_unseen" will be truncated to "idx_107511224_idx_comment_post_id_parent_id_number_is_or_has_un"
^CAn unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF71C8707.
An unhandled error condition has been signalled:
Interactive interrupt at #x7FFFF71C8707.
real 1125m47.541s
user 750m39.538s
sys 396m38.940s
The file /tmp/pgloader/pgloader.log didn't reveal anything related to this, only some warnings because of too-long identifiers.
I'm in the process of trying a newer version/compile the current one. I may also run different strategies with the current version and will report back.
I tried the offending table (a very big table, in MySQL 75GB and ~60 million rows) standalone and it did not hang.
I looked into the respective Postgres process this time using strace -p ... (using the PID from the SQL query above) and only got:
=> SELECT pid, age(query_start, clock_timestamp()), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
pid | age | usename | query
-------+------------------+---------+--------------------------
13165 | | monitor | <insufficient privilege>
13164 | | monitor | <insufficient privilege>
13166 | | monitor | <insufficient privilege>
22792 | -00:29:16.337757 | project | COMMIT
(4 rows)
strace -p 22792
Process 22792 attached
epoll_wait(3,
fd 3 refers just to
lrwx------ 1 postgres postgres 64 Dec 28 17:06 /proc/22792/fd/3 -> anon_inode:[eventpoll]
One of the pgloader processes/threads, however, was going full circle polling here (I identified two entries via htop consuming much CPU):
strace -p 20021
Process 20021 attached
futex(0x1012d8bbb8, FUTEX_WAIT_PRIVATE, 157279619, NULL
Nothing happening here. And the other one:
strace -p 20025 2>&1 | head
Process 20025 attached
read(8, "", 4096) = 0
poll([{fd=8, events=POLLIN|POLLPRI}], 1, 0) = 1 ([{fd=8, revents=POLLIN}])
read(8, "", 4096) = 0
poll([{fd=8, events=POLLIN|POLLPRI}], 1, 0) = 1 ([{fd=8, revents=POLLIN}])
read(8, "", 4096) = 0
poll([{fd=8, events=POLLIN|POLLPRI}], 1, 0) = 1 ([{fd=8, revents=POLLIN}])
read(8, "", 4096) = 0
poll([{fd=8, events=POLLIN|POLLPRI}], 1, 0) = 1 ([{fd=8, revents=POLLIN}])
read(8, "", 4096) = 0
Basically an endless loop from what I saw:
lrwx------ 1 mfischer mfischer 64 Dec 28 17:08 /proc/20025/fd/8 -> socket:[179374]
And
lsof|grep 179374
pgloader 20021 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20022 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20024 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20025 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20026 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20027 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20028 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20029 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20030 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20031 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20032 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20033 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20034 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20035 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20036 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20037 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20038 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20039 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20040 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20041 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20042 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20043 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20044 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20045 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
pgloader 20021 20046 user 8u IPv4 179374 0t0 TCP project-pgsql02:51304->a.b.c.d:mysql (CLOSE_WAIT)
Btw, compared to my other comment https://github.com/dimitri/pgloader/issues/337#issuecomment-269461637, this one was tested with 3.3.2.
Here's the config I used this time:
load database
from mysql://...
into postgresql://...
WITH
batch rows = 1000
CAST
type varchar to varchar,
type bigint with extra auto_increment to bigserial drop typemod,
type bigint to bigint drop typemod
;
I had to use batch rows = 1000 because otherwise I would get https://github.com/dimitri/pgloader/issues/487
As others said, creating indexes is not involved in this "hang" problem. In my tests I still disable them because it gets me into the "hung" state faster.
When performing the migration on multiple tables, I always see it hang on the first "large" table. When I iteratively remove those large tables, it hangs on the next, etc.
The "large" tables have row counts like 11 million, 8 million, 60 million, 27 million.
Tables which don't exhibit this problem are "smaller", like 4 million, 1 million, etc.
Also worth mentioning explicitly: when pgloader is hanging and I stop it with ^C, it seems the table itself is properly imported.
I see the same problem when loading a public dataset: RxNorm. The zip file contains MySQL load scripts and a wrapper shell script. To support faster writes through COPY, I use the following settings:
work_mem = 256MB
maintenance_work_mem = 1GB
The largest table is copied successfully, but a smaller one, rxnsty, hangs:
SELECT pid, age(query_start, clock_timestamp()), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
pid | age | usename | query
------+------------------+-------------+--------
5326 | -20:16:32.850028 | terminology | COMMIT
(1 row)
Edit: Loading the problematic table individually using:
LOAD DATABASE
FROM mysql://root:password@localhost:3306/rxnorm
INTO postgres://terminology:terminology@localhost:5432/rxnorm_loinc
WITH data only,
workers = 8, concurrency = 1,
batch rows = 1000,
include no drop,
create no tables
SET maintenance_work_mem to '128MB',
work_mem to '12MB'
INCLUDING ONLY TABLE NAMES MATCHING 'RXNSTY';
worked out quite well, but I still don't know where the problem was.
My config:
Last few lines in the log file:
It's strange that it is stuck at such a round number for 2 hours. Do I need to tweak some params? The total size of my MySQL DB is 20 GB...