I'm creating a pgloader command file to import a set of CSV files supplied by another team. Loading well-formed files works fine, but our input files might be badly formatted. Ideally I'd like to skip and log all bad rows in my input files; the documentation says:
To enable retrying the data and loading the good parts, use the option on error resume next, which is the default to file based data loads (such as CSV, IXF or DBF).
but I'm not quite able to get it to work.
This is my command file:
LOAD CSV
FROM
ALL FILENAMES MATCHING ~/{{{FILE_REGEXP}}}/
IN DIRECTORY '{{STAGING_PATH}}'
WITH ENCODING UTF8
INTO postgresql://{{PGUSER}}:{{PGPASSWORD}}@{{PGHOST}}/{{PGDATABASE}}
TARGET TABLE {{TABLE_NAME}}
WITH
csv header,
on error resume next,
fields terminated by '|',
truncate
;
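For reference, this is roughly how I invoke it (the command-file name is made up, and --root-dir only makes pgloader's default explicit):

pgloader --root-dir /tmp/pgloader staging_load.load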
I have a couple of files where I deliberately inserted errors for testing. Here's a part of the first:
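(A reconstruction rather than the real data: the column names are invented, but the pipe delimiter and the kind of injected error match my files. The second data row has too many fields; Carl Cook is a well-formed row after it, so with on error resume next I would expect him to be loaded. Gareth Gatsby sits after a similar bad row in the second file.)

first_name|last_name|city
Alice|Adams|Amsterdam
Beth|Brown|Berlin|SPURIOUS|EXTRA
Carl|Cook|Copenhagen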
At the first error, regardless of whether I use on error resume next or on error stop, pgloader stops processing the current file and moves on to the next one. For example, I never see Carl Cook or Gareth Gatsby in my database.
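For instance, after a run (my_table stands in for my real {{TABLE_NAME}}, and the column names follow the reconstruction above, so treat them as placeholders):

SELECT first_name, last_name FROM my_table WHERE last_name IN ('Cook', 'Gatsby');
-- returns zero rows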
Pgloader creates a directory named after my target table in the output directory, but there is nothing in it, although the documentation says:
At the end of a load containing rejected rows, you will find two files in the root-dir location, under a directory named the same as the target database of your setup. The filenames are the target table, and their extensions are .dat for the rejected data and .log for the file containing the full PostgreSQL client side logs about the rejected data.
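If I read that correctly, with the default root-dir of /tmp/pgloader I should end up with something like this after a load with rejects (mydb and my_table standing in for my actual database and table names):

/tmp/pgloader/mydb/my_table.dat
/tmp/pgloader/mydb/my_table.log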
I also tried adding batch rows = 1, in the hope that pgloader would simply move on to the next one-row batch after an error, but the output didn't change.
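That is, the WITH clause became the following, with the rest of the command unchanged (option order shouldn't matter, as far as I can tell):

WITH
csv header,
on error resume next,
batch rows = 1,
fields terminated by '|',
truncate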
I'm probably doing something wrong. What can I check?