citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

PostgreSQL extension that can skip faulty lines in COPY #35

Open ozgune opened 8 years ago

ozgune commented 8 years ago

When users load large data sets (from S3 or files), the data may contain a few bad records. Most data warehousing solutions can be configured to skip up to a predefined number of bad lines instead of aborting the load.

This has also been discussed for PostgreSQL: https://wiki.postgresql.org/wiki/Error_logging_in_COPY

This task proposes extending COPY so that it can skip up to a configurable number of bad records.

ozgune commented 8 years ago

I'm adding a note here from our old issue tracker.

"Add a configuration option to skip over a certain number of malformed lines when uploading data. I quite like how the nzload command has this option."

marcocitus commented 8 years ago

Note: this can probably be achieved by putting a PG_TRY/PG_CATCH block around just the NextCopyFrom call in the COPY implementation.
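To make the idea concrete, here is a minimal Python sketch of the control flow being discussed: only the per-row parse is wrapped in an error handler, bad rows are recorded and skipped, and the load aborts once a configurable limit is exceeded. This is an analogue, not PostgreSQL or Citus code. `parse_row`, `copy_skipping_bad_rows`, `max_bad_rows`, and `TooManyBadRows` are hypothetical names invented for illustration, and the two-column (integer, text) row format is an assumption.

```python
import csv


class TooManyBadRows(Exception):
    """Raised when more rows are rejected than the configured limit allows."""


def parse_row(fields):
    # Hypothetical stand-in for the per-row read step (the role NextCopyFrom
    # plays in the note above). Assumes each row is (integer id, text name).
    if len(fields) != 2:
        raise ValueError(f"expected 2 columns, got {len(fields)}")
    return int(fields[0]), fields[1]


def copy_skipping_bad_rows(lines, max_bad_rows):
    """Parse CSV lines, skipping at most max_bad_rows malformed rows.

    Returns (good_rows, rejected), where rejected is a list of
    (line_number, error_message) pairs.
    """
    good_rows = []
    rejected = []
    for line_no, fields in enumerate(csv.reader(lines), start=1):
        try:
            # Only the row parse sits inside the handler, so errors from
            # any later step would still propagate normally.
            row = parse_row(fields)
        except ValueError as exc:
            rejected.append((line_no, str(exc)))
            if len(rejected) > max_bad_rows:
                raise TooManyBadRows(
                    f"{len(rejected)} bad rows exceeds limit of {max_bad_rows}"
                ) from exc
            continue
        good_rows.append(row)
    return good_rows, rejected


if __name__ == "__main__":
    sample = ["1,alice", "oops,bob", "2,carol", "3"]
    rows, errors = copy_skipping_bad_rows(sample, max_bad_rows=2)
    print(rows)
    print(errors)
```

Keeping the handler scoped to the row parse mirrors the suggestion to wrap "just NextCopyFrom". Whether catching an error and continuing is safe inside the real COPY code path is a separate question that this sketch does not address.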