apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0
381 stars 94 forks source link

c/driver/postgresql: Cursor.adbc_ingest() throws an error when an empty table is passed #2071

Closed avm19 closed 2 months ago

avm19 commented 2 months ago

What happened?

Cursor.adbc_ingest() throws an error when an empty table is passed:

ProgrammingError: INVALID_ARGUMENT: [libpq] Failed to execute COPY statement: PGRES_FATAL_ERROR ERROR: COPY file signature not recognized. SQLSTATE: 22P04

Expected behaviour: creates/replaces/appends a table with 0 rows.


I am using Python:

import adbc_driver_postgresql.dbapi
import pyarrow as pa
...

This works:

n = 10
a = pa.array(range(n), type=pa.int32())
t = pa.Table.from_arrays([a], names=["id"])

with adbc_driver_postgresql.dbapi.connect(uri=uri) as conn:
    with conn.cursor() as cursor:
        cursor.adbc_ingest(table_name="test_table", data=t, mode="replace", temporary=True)  # success

After changing n=10 to n=0 to create a table with 0 rows, the same code fails:

n = 10
a = pa.array(range(n), type=pa.int32())
t = pa.Table.from_arrays([a], names=["id"])

with adbc_driver_postgresql.dbapi.connect(uri=uri) as conn:
    with conn.cursor() as cursor:
        cursor.adbc_ingest(table_name="test_table", data=t, mode="replace", temporary=True)  # FAILS: ProgrammingError

A quick check showed that we get the same error irrespectively of mode and temporary arguments.

Stack Trace

No response

How can we reproduce the bug?

No response

Environment/Setup

Python 3.11, installed via pip:

pip list | grep adbc
adbc-driver-manager       1.1.0
adbc-driver-postgresql    1.1.0
adbc-driver-sqlite        1.1.0
paleolimbot commented 2 months ago

Thanks for reporting!

This seems to work with arrays that have zero items (but not stream with zero arrays):

library(adbcdrivermanager)
#> Warning: package 'adbcdrivermanager' was built under R version 4.3.3

con <- adbc_database_init(
  adbcpostgresql::adbcpostgresql(), 
  uri = "postgresql://localhost:5432/postgres?user=postgres&password=password"
) |> 
  adbc_connection_init()

con |> 
  execute_adbc("DROP TABLE IF EXISTS no_integers")

# Works (stream with one zero-size array)
data.frame(x = integer()) |> 
  write_adbc(con, "no_integers")

# Fails (stream with zero arrays)
nanoarrow::basic_array_stream(
  list(), 
  nanoarrow::na_struct(list(x = nanoarrow::na_int32()))
) |> 
  write_adbc(con, "no_integers")
#> Error in adbc_statement_execute_query(stmt): INVALID_ARGUMENT: [libpq] Failed to create table: ERROR:  relation "no_integers" already exists
#> 
#> Query was: CREATE TABLE "public" . "no_integers" ("x" INTEGER)

Created on 2024-08-11 with reprex v2.1.0