dimitri / pgloader

Migrate to PostgreSQL in a single command!
http://pgloader.io
Other
5.19k stars 530 forks source link

guidance on how to setup the batch rows and memory parameters is needed in the docs #1151

Open dataf3l opened 4 years ago

dataf3l commented 4 years ago

The tool has a memory leak, or something causing heap to be exhausted. I have no idea how to fix this, but it works "sometimes". and sometimes it doesn't.

  1. show what you did,

tried mysql2psql

  1. show the result you got,

it fails, sometimes, Heap problems.

  1. explain how the result is not what you expected.

I expected it to just work.

yes, because the ubuntu 18.04 is outdated, I wish it were updated.

yeah

I just want better documentation on the way the batch rows memory work memory WITH and SET statements work together, I have no idea if I want 50MB or 5MB and would appreciate some docs on the matter.

I'm running this:

load database
    from mysql://aaa:bbb@127.0.0.1/ccc
    into pgsql://dddd@localhost/ziprecruiter

including only table names matching 'TABLE'

set work_mem to '1024 MB', maintenance_work_mem to '4096 MB';

-- before load do $$ drop schema if exists jobs cascade; $$;

I also tried:

set work_mem to '512 MB', maintenance_work_mem to '2048 MB'

we need like waaay more examples than we have now.

dimitri commented 4 years ago

Hi, thanks for reporting your issue here. pgloader is written in Common Lisp, where you don't manage memory in the application and instead rely on the system providing you with a Garbage Collector. In SBCL, the GC is fast but sometimes balloons and fails. In Clozure-CL the GC is much slower but never fails.

The different options that have been implemented in pgloader when trying to make a better use of the memory and cause less pressure on the GC are not a definitive stop-gap so I'm not sure how to best document them.

See https://pgloader.readthedocs.io/en/latest/pgloader.html#batch-behaviour-options for the most efficient solutions, and in particular it's best to tweak prefect rows down in case you hit SBCL infamous heap exhausted error.

Please note that both work_mem and maintenance_work_mem apply to the PostgreSQL behaviour and have no impact whatsoever on pgloader memory usage.

dataf3l commented 4 years ago

In that case I'll reduce prefetch.

"GC is fast but sometimes balloons and fails."

Man I wish this wasn't an issue.