apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0

Expected PGCOPY signature of 11 bytes at beginning of stream but found -1 bytes of input #2276

Open mcrumiller opened 6 hours ago

mcrumiller commented 6 hours ago

What happened?

This is a semi-duplicate of #2133. That issue was closed by a PR that apparently improves the error message, but the PR does not seem to resolve the underlying problem in that case.

In my case, I'd like to diagnose what's causing the error (if possible), as I'd like to actually get this working. If there's a workaround possible on my end, I would like to pursue it before waiting for a new version of ADBC. I'm not sure where the ADBC log files are stored, and I can't find where to look in the documentation.

When I run my query (via polars), I get:

IO: [libpq] ReadHeader failed: Expected PGCOPY signature of 11 bytes at beginning of stream but found -1 bytes of input

If I add a LIMIT 10000 the query succeeds, so the issue is either in a later record in the data or something else that I can't think of. I expect 1,219,228 total records. Can someone help me diagnose the issue?

Stack Trace

  File "C:\Projects\project-cqn\.venv_cqn\Lib\site-packages\adbc_driver_manager\_reader.pyx", line 89, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all
    return self._reader.read_all()
  File "C:\Projects\project-cqn\.venv_cqn\Lib\site-packages\pyarrow\ipc.pxi", line 762, in pyarrow.lib.RecordBatchReader.read_all
    check_status(self.reader.get().ToTable().Value(&table))
  File "C:\Projects\project-cqn\.venv_cqn\Lib\site-packages\pyarrow\error.pxi", line 92, in pyarrow.lib.check_status
    raise convert_status(status)
OSError: [libpq] ReadHeader failed: Expected PGCOPY signature of 11 bytes at beginning of stream but found -1 bytes of input

During handling of the above exception, another exception occurred:

  File "C:\Projects\project-cqn\.venv_cqn\Lib\site-packages\adbc_driver_manager\_reader.pyx", line 41, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
    raise exc from None
  File "C:\Projects\project-cqn\.venv_cqn\Lib\site-packages\adbc_driver_manager\_reader.pyx", line 91, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all
    self._helper.check_error(e)
  File "C:\Projects\project-cqn\.venv_cqn\Lib\site-packages\adbc_driver_manager\_lib.pyx", line 1590, in adbc_driver_manager._lib._blocking_call
    return func(*args, **kwargs)
  File "C:\Projects\project-cqn\.venv_cqn\Lib\site-packages\adbc_driver_manager\dbapi.py", line 1197, in fetch_arrow_table
    return _blocking_call(self._reader.read_all, (), {}, self._stmt.cancel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\project-cqn\.venv_cqn\Lib\site-packages\adbc_driver_manager\dbapi.py", line 1088, in fetch_arrow_table
    return self._results.fetch_arrow_table()

How can we reproduce the bug?

It's a fairly complex query. I could perhaps work to reproduce but since it works on 10k records, it may be difficult to make a repro.

Environment/Setup

greenplum/postgres PostgreSQL 9.4.26
(Greenplum Database 6.24.3 build commit:25d3498a400ca5230e81abb94861f23389315213)
on x86_64-unknown-linux-gnu,
compiled by gcc (GCC) 6.4.0,
64-bit compiled on May  3 2023 20:34:57

paleolimbot commented 2 hours ago

Does this still happen with the nightly Python builds? Since the last release we did a heavy refactor of some of those internals to make those types of errors report better/go away:

pip install \
      --pre \
      --index-url https://repo.fury.io/arrow-adbc-nightlies \
      adbc-driver-postgresql

mcrumiller commented 2 hours ago

Sorry, I'm unable to build the nightlies at work. Unsure if you want me to re-open after the next release if I still see the issue, or if you want to just leave this one in limbo until then.

mcrumiller commented 2 hours ago

Or actually, is there a pip release for the nightlies?

paleolimbot commented 1 hour ago

I think there are wheels! For me it resolves to https://repo.fury.io/arrow-adbc-nightlies/-/ver_1QpKJK/adbc_driver_postgresql-1.3.0-py3-none-macosx_11_0_arm64.whl but no idea if that will work for you. Installing dev versions is not great right now, sorry!

If you're up for another approach, narrowing down which row/column triggers the error by manipulating the LIMIT clause might help pinpoint it. In the last report of this, the error that got obscured was a computation that failed on a specific value.
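
A minimal sketch of that LIMIT bisection, in case it helps. It assumes the failure is tied to a particular row, so that every LIMIT below some threshold succeeds and every LIMIT at or above it fails. That only holds if the query has a deterministic ORDER BY (without one, Greenplum may return rows in a different order on each run). The names `find_first_failing_limit`, `make_adbc_probe`, `uri`, and `base_query` are hypothetical and not part of ADBC. The probe uses `adbc_driver_postgresql.dbapi.connect` and `fetch_arrow_table`, the same call that appears in the stack trace.

```python
from typing import Callable, Optional


def find_first_failing_limit(
    succeeds: Callable[[int], bool], total_rows: int
) -> Optional[int]:
    """Binary-search the smallest LIMIT for which the query fails.

    `succeeds(n)` must return True if the query with LIMIT n completes.
    Returns None if the query succeeds even with LIMIT total_rows.
    Assumes LIMIT 0 succeeds and that failures are monotonic in the limit.
    """
    if succeeds(total_rows):
        return None
    lo, hi = 0, total_rows  # invariant: LIMIT lo succeeds, LIMIT hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if succeeds(mid):
            lo = mid
        else:
            hi = mid
    return hi  # the hi-th row (1-indexed) is the first one that triggers the error


def make_adbc_probe(uri: str, base_query: str) -> Callable[[int], bool]:
    """Build a `succeeds` callable that runs base_query with a LIMIT appended.

    Assumption: base_query ends with an ORDER BY clause, has no LIMIT of its
    own, and has no trailing semicolon.
    """
    # Imported lazily so the bisection helper can be used without the driver.
    import adbc_driver_postgresql.dbapi as pg_dbapi

    def succeeds(limit: int) -> bool:
        with pg_dbapi.connect(uri) as conn, conn.cursor() as cur:
            try:
                cur.execute(f"{base_query} LIMIT {limit}")
                cur.fetch_arrow_table()
                return True
            except Exception:
                # Any driver error counts as a failure for this probe.
                return False

    return succeeds
```

With roughly 1.2 million rows this needs about 21 query runs. Once the threshold `n` is known, running the query with `LIMIT 1 OFFSET n - 1` through another client (psql, for instance) should show the row to inspect. If the threshold moves between runs, the cause is probably not a single bad value.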