apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/

Attempting to drop a Bigquery table through cur.execute throws Error about "storage read API" #2173

Open · WillAyd opened this issue 2 months ago

WillAyd commented 2 months ago

What happened?

When attempting to drop a table via cursor.execute, BigQuery throws:

OperationalError: UNKNOWN: [BigQuery] bigquery: require storage read API to be enabled

However, "bigquerystorage.googleapis.com" is enabled in my project. I even tried "cloudstorage.googleapis.com" but the error persisted

Stack Trace

OperationalError                          Traceback (most recent call last)
Cell In[14], line 15
      7 tbl = pa.Table.from_pydict({"x": [0, 1, 2], "y": [3, 4, 5]})
      9 with adbc_driver_bigquery.dbapi.connect(db_kwargs) as conn, conn.cursor() as cur:
     10     # BigQuery adbc_ingest does not work, so we use traditional
     11     # prepare / bind / execute approach
     12 
     13     # DROP TABLE might also be broken - complains about
     14     # OperationalError: UNKNOWN: [BigQuery] bigquery: require storage read API to be enabled
---> 15     cur.execute("DROP TABLE IF EXISTS demo_dataset.foo")

File ~/clones/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/dbapi.py:698, in Cursor.execute(self, operation, parameters)
    682 """
    683 Execute a query.
    684 
   (...)
    694     parameters, which will each be bound in turn).
    695 """
    696 self._prepare_execute(operation, parameters)
--> 698 handle, self._rowcount = _blocking_call(
    699     self._stmt.execute_query, (), {}, self._stmt.cancel
    700 )
    701 self._results = _RowIterator(
    702     self._stmt, _reader.AdbcRecordBatchReader._import_from_c(handle.address)
    703 )

File ~/clones/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/_lib.pyx:1569, in adbc_driver_manager._lib._blocking_call_impl()

File ~/clones/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/_lib.pyx:1562, in adbc_driver_manager._lib._blocking_call_impl()

File ~/clones/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/_lib.pyx:1213, in adbc_driver_manager._lib.AdbcStatement.execute_query()

File ~/clones/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/_lib.pyx:260, in adbc_driver_manager._lib.check_error()

OperationalError: UNKNOWN: [BigQuery] bigquery: require storage read API to be enabled

How can we reproduce the bug?

import adbc_driver_bigquery.dbapi
from adbc_driver_bigquery import DatabaseOptions
import pyarrow as pa

db_kwargs = {
    DatabaseOptions.PROJECT_ID.value: "some-demo-project-1234",
    DatabaseOptions.DATASET_ID.value: "demo_dataset",
    DatabaseOptions.TABLE_ID.value: "foo",
}

# table we would ingest; never used, because the DROP TABLE below raises first
tbl = pa.Table.from_pydict({"x": [0, 1, 2], "y": [3, 4, 5]})

with adbc_driver_bigquery.dbapi.connect(db_kwargs) as conn, conn.cursor() as cur:
    # OperationalError: UNKNOWN: [BigQuery] bigquery: require storage read API to be enabled
    cur.execute("DROP TABLE IF EXISTS demo_dataset.foo")

Environment/Setup

No response

joellubi commented 2 months ago

I'm guessing we're creating an ArrowIterator() for every RowIterator returned by executing a query. There's no physical table for the DROP TABLE query to open a Storage Read API connection against, so IsAccelerated() is returning false.

We should fall back to a non-Arrow iterator in cases like this.
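
For reference, a minimal sketch of that fallback against the Go client (cloud.google.com/go/bigquery) the driver wraps; runStatement and the handoff at the end are hypothetical illustrations, not the driver's actual internals:

package main

import (
    "context"
    "log"

    "cloud.google.com/go/bigquery"
    "google.golang.org/api/iterator"
)

// runStatement (hypothetical) sketches the proposed fallback: only request
// the Arrow stream when the Storage Read API actually backs the result set.
func runStatement(ctx context.Context, client *bigquery.Client, sql string) error {
    it, err := client.Query(sql).Read(ctx)
    if err != nil {
        return err
    }
    if !it.IsAccelerated() {
        // DDL like DROP TABLE has no physical table behind its result, so a
        // Storage Read API (Arrow) stream cannot be opened; drain the plain
        // row iterator instead.
        for {
            var row []bigquery.Value
            err := it.Next(&row)
            if err == iterator.Done {
                return nil
            }
            if err != nil {
                return err
            }
        }
    }
    // Accelerated path: the Arrow iterator is safe to request here.
    arrowIt, err := it.ArrowIterator()
    if err != nil {
        return err
    }
    _ = arrowIt // hypothetical: wrap in the driver's Arrow record reader
    return nil
}

func main() {
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, "some-demo-project-1234")
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()
    if err := runStatement(ctx, client, "DROP TABLE IF EXISTS demo_dataset.foo"); err != nil {
        log.Fatal(err)
    }
}

Guarding on IsAccelerated() before requesting the Arrow stream is what would avoid the "require storage read API to be enabled" error for statements like DROP TABLE, which return no readable table.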