dimitri / pgloader

Migrate to PostgreSQL in a single command!
http://pgloader.io

pgloader 3.6.7 doesn't continue on fatal errors (ex. non whitespace after quoted data) contrary to pgloader 3.4.1 #1604

Open Kamal-learner-24 opened 2 months ago

Kamal-learner-24 commented 2 months ago

Hello everyone,

Let me explain our problem.

We recently migrated our solution from Red Hat 7.5 with PostgreSQL 9.6.9 and pgloader 3.4.1 to Rocky Linux 8.9 with PostgreSQL 13.14 and pgloader 3.6.7.

On the old system (Red Hat 7.5 with PostgreSQL 9.6.9 and pgloader 3.4.1), when I load a CSV file of 478 894 lines (14 of which contain errors) with a .LOAD command, pgloader 3.4.1 loads 478 880 lines. pgloader 3.4.1 runs as expected and continues loading when it encounters these errors.

On the new system (Rocky Linux 8.9 with PostgreSQL 13.14 and pgloader 3.6.7), when I load the same CSV file with the same .LOAD command, pgloader 3.6.7 loads only 183 569 lines. pgloader 3.6.7 doesn't run as expected and seems to stop loading when it encounters these errors.

Here is the .LOAD command:

    LOAD CSV
         FROM /inputs/data/F024 WITH ENCODING UTF8
              (
                user_id [null if blanks],
                user_name_first [null if blanks],
                user_name_last [null if blanks]
              )
         INTO postgresql:///db_rec_dv?cpy.cpy_cso_user_base(user_id, user_name_first, user_name_last)
         WITH truncate,
              fields optionally enclosed by '"',
              fields terminated by ',',
              prefetch rows = 50000
          SET client_encoding to 'utf8',
              work_mem to '512MB',
              standard_conforming_strings to 'on';

Here is the error I get:

    2024-08-08T13:47:09.233005+01:00 ERROR non whitespace after quoted data #<CSV-READER LINE-IDX:2 CHARACTER-LINE-IDX:22 CHARACTER-IDX:793 "byER6Vvdtb," {1005C0E263}> b
    2024-08-08T13:47:09.233005+01:00 FATAL non whitespace after quoted data #<CSV-READER LINE-IDX:2 CHARACTER-LINE-IDX:22 CHARACTER-IDX:793 "byER6Vvdtb," {1005C0E263}> b

Here is an extract of the failing line (a double quote is missing): "11","Colyneߌڢ,"Test"

Thank you for your help

Best regards,

Kamal

svantevonerichsen6906 commented 2 months ago

Yes, sorry, but I think you are relying on buggy behaviour; the bug in question was fixed six years ago. I'd propose fixing the data errors in the CSV files. If I read that right, the csv-reader tells you the faulty lines (LINE-IDX).
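
To locate the faulty lines before loading, something like the following might help. It is only a rough sketch using Python's standard `csv` module in strict mode, not pgloader's own CSV reader, so its quoting rules may not match pgloader's exactly. The function name `find_bad_lines` and the sample rows are made up for illustration, and it assumes every record sits on a single physical line (no newlines inside quoted fields).

```python
import csv


def find_bad_lines(lines, delimiter=",", quotechar='"'):
    """Return (line_number, error_message) for each line a strict CSV parser rejects.

    Hypothetical helper: assumes one record per physical line.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        try:
            # strict=True makes the parser raise csv.Error on malformed quoting
            # (for example, text directly after a closing quote) instead of
            # silently accepting it.
            list(csv.reader([line], delimiter=delimiter,
                            quotechar=quotechar, strict=True))
        except csv.Error as exc:
            bad.append((lineno, str(exc)))
    return bad


# Made-up sample rows; the second one is missing a closing double quote.
sample = [
    '"10","Alice","Test"\n',
    '"11","Colyne,"Test"\n',
    '"12","Bob","Test"\n',
]

for lineno, message in find_bad_lines(sample):
    print(f"line {lineno}: {message}")
```

For a real file, the same function could be fed an open file object, e.g. `open(path, encoding="utf-8", newline="")`, since iterating over it yields one line at a time. The reported lines can then be corrected or moved aside before running pgloader.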