MagicStack / asyncpg

A fast PostgreSQL Database Client Library for Python/asyncio.
Apache License 2.0
6.99k stars 404 forks source link

Support for `WHERE` clause when usng `copy_to_table` method #939

Open redgoldlace opened 2 years ago

redgoldlace commented 2 years ago

Hi there!

Since version 12, PostgreSQL has supported a WHERE clause within COPY ... FROM statements:

COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

It functions similarly to the WHERE clause used within SELECT statements, subject to some restrictions regarding subqueries.

Recently I've found myself in a situation where I am performing a bulk insert of CSV-formatted data from a remote machine, but would like to skip certain rows - such as those containing null values. From what I can tell, the existing Connection.copy_to_table helper doesn't support this, and I'd need to drop down to private API (the underlying Connection._protocol._copy_in machinery)

I'd appreciate some publicly exposed way of doing this - the simplest approach that comes to mind would be adding a where kwarg to Connection.copy_to_table as well as the Connection._format_copy_opts method. Usage could then look something along the lines of

await conn.copy_to_table("cool_table", source="important_file.csv", where="important_value = 42")

I'm not sure if this is satisfactory, though. There would also need to be some consideration regarding earlier Postgres versions that don't support COPY ... FROM using a WHERE clause.

Please let me know your thoughts - I'm happy to open a PR for implementation if necessary.

elprans commented 2 years ago

Sure, please post a PR. We can gate the functionality behind a ServerCapabilities flag for servers that don't support this.