apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0

snowflake: `adbc_ingest` will fail with "double free" segmentation fault if record batch schema is incorrect #2108

Open pkit opened 2 months ago

pkit commented 2 months ago

What happened?

If the schema of the RecordBatchReader doesn't match the actual batch columns, the ADBC driver crashes. It's also pretty hard to debug, because the only lead is "double free or corruption (out)"; I needed to run under valgrind to understand what was going on. For some reason it fails in Go with a proper "index out of bounds" error, but that error is not propagated to the Python code.

Stack Trace

No response

How can we reproduce the bug?

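    import pyarrow as pa

    # `c` is an already-open ADBC Snowflake connection.
    # The declared schema has two fields...
    schema = pa.schema(fields=[
        pa.field("name1", pa.string()),
        pa.field("name2", pa.string()),
    ])
    # ...but the rows (and so the inferred batch) only carry "name1".
    data = [
        {"name1": "aaa"},
        {"name1": "bbb"},
    ]
    reader = pa.RecordBatchReader.from_batches(schema, [pa.RecordBatch.from_pylist(data)])
    with c.cursor() as cur:
        cur.adbc_ingest("test2", reader, mode="create_append")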

Environment/Setup

Latest

joellubi commented 2 months ago

@pkit Can you please share the package version(s) for which this issue occurred, and any other configuration you may have passed to the driver/connection?

I could not reproduce the double free using adbc-driver-snowflake == 1.1.0. For me it failed with the stack trace I would have expected:

panic: arrow/array: number of columns/fields mismatch

goroutine 38 [running]:
github.com/apache/arrow/go/v17/arrow/array.NewRecord(0x1400018e480, {0x14000e82010, 0x1, 0x160009760?}, 0x2)
        /Users/runner/go/pkg/mod/github.com/apache/arrow/go/v17@v17.0.0-20240626234237-6680dcfbef42/arrow/array/record.go:151 +0x198
github.com/apache/arrow/go/v17/arrow/cdata.ImportCRecordBatchWithSchema(0x14000581f80?, 0x1400018e480)
        /Users/runner/go/pkg/mod/github.com/apache/arrow/go/v17@v17.0.0-20240626234237-6680dcfbef42/arrow/cdata/interface.go:131 +0x248
github.com/apache/arrow/go/v17/arrow/cdata.(*nativeCRecordBatchReader).next(0x14000a9a340)
        /Users/runner/go/pkg/mod/github.com/apache/arrow/go/v17@v17.0.0-20240626234237-6680dcfbef42/arrow/cdata/cdata.go:997 +0x1bc
github.com/apache/arrow/go/v17/arrow/cdata.(*nativeCRecordBatchReader).Next(0x14000a9a340)
        /Users/runner/go/pkg/mod/github.com/apache/arrow/go/v17@v17.0.0-20240626234237-6680dcfbef42/arrow/cdata/cdata.go:956 +0x20
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.readRecords({0x161bd3558, 0x140005d0aa0}, {0x108fb3898, 0x14000a9a340}, 0x14000118840)
        /Users/runner/work/arrow-adbc/arrow-adbc/adbc/go/adbc/driver/snowflake/bulk_ingestion.go:315 +0x78
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.(*statement).ingestStream.func3()
        /Users/runner/work/arrow-adbc/arrow-adbc/adbc/go/adbc/driver/snowflake/bulk_ingestion.go:249 +0x34
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /Users/runner/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x58
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 17
        /Users/runner/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x98
Abort trap: 6

pkit commented 2 months ago

$ pip freeze | grep adbc
adbc-driver-manager==1.1.0
adbc-driver-snowflake==1.1.0

I will add a full repro soon. Yes, it involves custom configuration for the `adbc.snowflake.statement.ingest_*` options.

pkit commented 2 months ago

I lied, it fails even with no custom config. `snowflake_connector_profile.url` is just a Snowflake URL as a string. The pytest test:

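    import pyarrow as pa
    # Assumption: `connect` is the DBAPI entry point of the Snowflake driver package.
    from adbc_driver_snowflake.dbapi import connect

    def test_adbc_bug(snowflake_connector_profile):
        c = connect(snowflake_connector_profile.url, db_kwargs={
            "adbc.snowflake.sql.schema": "PUBLIC",
            "adbc.snowflake.sql.db": "TEST1",
        })
        # The declared schema has two fields...
        schema = pa.schema(
            fields=[
                pa.field("name1", pa.string()),
                pa.field("name2", pa.string()),
            ]
        )
        # ...but the rows (and so the inferred batch) only carry "name1".
        data = [
            {"name1": "aaa"},
            {"name1": "bbb"},
        ]
        reader = pa.RecordBatchReader.from_batches(schema, [pa.RecordBatch.from_pylist(data)])
        with c.cursor() as cur:
            cur.adbc_ingest("test2", reader, mode="create_append")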

Exception:

=================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0 -- /home/user/adbc_bug/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/user/adbc_bug
configfile: pyproject.toml
plugins: asyncio-0.24.0, anyio-3.7.1, Faker-28.1.0
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 1 item                                                                                                                                                                                                          

tests/functional/python/test_sf_transform.py::test_adbc_bug Fatal Python error: Aborted

Thread 0x00007f3febe29740 (most recent call first):
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/adbc_driver_manager/dbapi.py", line 937 in adbc_ingest
  File "/home/user/adbc_bug/tests/functional/python/test_sf_transform.py", line 148 in test_adbc_bug
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/main.py", line 337 in _main
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pytest/__main__.py", line 9 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, pyarrow.lib, adbc_driver_manager._lib, pyarrow._compute, pyarrow._acero, pyarrow._fs, pyarrow._csv, pyarrow._json, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, adbc_driver_manager._reader, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, clickhouse_connect.driverc.buffer, clickhouse_connect.driverc.dataconv, clickhouse_connect.driverc.npconv, zstandard.backend_c, lz4._version, lz4.frame._frame, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, psycopg2._psycopg, regex._regex, _cffi_backend, charset_normalizer.md, snowflake.connector.nanoarrow_arrow_iterator (total: 54)
Aborted (core dumped)
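
Until the driver reports this as a regular error, a caller-side guard may help. The sketch below is only an illustration and is not part of ADBC or pyarrow: `fill_missing_fields` is a hypothetical helper that makes every row carry every declared field (missing ones become `None`) and rejects rows with undeclared keys, so the batch built from the rows has the same columns as the declared schema.

```python
from typing import Any, Iterable


def fill_missing_fields(
    rows: Iterable[dict[str, Any]], field_names: list[str]
) -> list[dict[str, Any]]:
    """Return rows where every declared field is present (hypothetical helper).

    Missing fields are filled with None. A row that carries a key the
    schema does not declare raises ValueError instead of being passed on.
    """
    declared = set(field_names)
    normalized = []
    for i, row in enumerate(rows):
        extra = set(row) - declared
        if extra:
            raise ValueError(f"row {i} has undeclared fields: {sorted(extra)}")
        # Build the row in schema order so column order matches too.
        normalized.append({name: row.get(name) for name in field_names})
    return normalized


# Same rows and field names as in the repro above.
rows = fill_missing_fields(
    [{"name1": "aaa"}, {"name1": "bbb"}],
    ["name1", "name2"],
)
print(rows)
# [{'name1': 'aaa', 'name2': None}, {'name1': 'bbb', 'name2': None}]
```

With pyarrow, the field names would come from `schema.names`. Passing `schema=schema` to `pa.RecordBatch.from_pylist` should have a similar effect, since the batch is then built against the declared schema rather than inferred from the rows.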