dimitri / pgloader

Migrate to PostgreSQL in a single command!
http://pgloader.io

DIVISION-BY-ZERO problem on Ubuntu #1524

Open vpokornic opened 1 year ago

vpokornic commented 1 year ago

Hi,

First of all, thank you for the great tool.

I am using the tool to load multiple CSV files into PostgreSQL.

- Operating system: Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-76-generic x86_64)
- Installed with: `sudo apt-get install pgloader`
- Installed version: `pgloader/jammy-pgdg,now 3.6.9-1.pgdg22.04+1 amd64`

1. Load configuration file, without limits:

```
load csv
     from all filenames matching ~<csv$>
          in directory '/[dir_path]/'
          having fields (fields_list)

     into postgresql://connection_string
          target table schema_name.table_name
          target columns (columns_list)

     with truncate,
          skip header=1,
          fields optionally enclosed by '"',
          fields escaped by double-quote,
          fields terminated by ','
;
```

Result: `Heap exhausted...`

2. Load configuration file, adding `batch rows` and `prefetch rows` limits. Result:

```
FATAL: Failed to start the monitor thread.

arithmetic error DIVISION-BY-ZERO signalled Operation was (/ 1.2550987d7 0.0d0).
```

3. Load configuration file, varying the number of workers and the concurrency level. The only setting that does not produce the DIVISION-BY-ZERO error is `workers = 1` with `concurrency = 1`.
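For reference, here is a sketch of the first configuration with the one combination reported above as avoiding the DIVISION-BY-ZERO error. The `workers` and `concurrency` settings go in the `with` clause alongside the other CSV options; the placeholders (`[dir_path]`, `fields_list`, `connection_string`, and so on) are the same as in the original configuration, and the report does not say whether this setting also avoids the heap exhaustion.

```
load csv
     from all filenames matching ~<csv$>
          in directory '/[dir_path]/'
          having fields (fields_list)

     into postgresql://connection_string
          target table schema_name.table_name
          target columns (columns_list)

     with truncate,
          skip header=1,
          fields optionally enclosed by '"',
          fields escaped by double-quote,
          fields terminated by ',',
          workers = 1,
          concurrency = 1
;
```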

I tried the Docker images pgloader:latest and pgloader:ccl.latest: same result.
I tried building pgloader from source myself, changing DYNSIZE: same result.
I tried downgrading to an older version through apt-get: same result.

Could you please tell me how I can control the heap size, and how to fix the DIVISION-BY-ZERO problem?

Please provide the following information:

pgloader --version:

```
pgloader version "3.6.999791d" compiled with SBCL 2.1.11.debian
```

did you test a fresh compile from the source tree?
`yes - same result`

did you search for other similar issues?
`yes, but there are no helpful answers.`

how can I reproduce the bug?
`run the load configuration on multiple CSV files`

load configuration:

```
load csv
     from all filenames matching ~<csv$>
          in directory '/[dir_path]/'
          having fields (fields_list)

     into postgresql://connection_string
          target table schema_name.table_name
          target columns (columns_list)

     with truncate,
          skip header=1,
          fields optionally enclosed by '"',
          fields escaped by double-quote,
          fields terminated by ','
;
```


pgloader output you obtain:

```
FATAL: Failed to start the monitor thread.

arithmetic error DIVISION-BY-ZERO signalled Operation was (/ 1.2550987d7 0.0d0).
```


data that is being loaded, if relevant:
`no`

Thank you for the help!