ClickHouse / clickhouse-connect

Python driver/sqlalchemy/superset connectors
Apache License 2.0

Segfault when inserting data with missing columns #358

Closed Ragenose closed 3 months ago

Ragenose commented 3 months ago

Describe the bug

When inserting data, if a row is missing a column, the insert triggers a segfault in dataconv.cpython-311-x86_64-linux-gnu.so

Steps to reproduce

  1. Bring up a ClickHouse server with default settings and create a simple table.
  2. Run the script from the "Code example" section.
  3. Check dmesg or the faulthandler output.

Expected behaviour

The insert should raise clickhouse_connect.driver.exceptions.DataError instead of segfaulting.

Code example

import clickhouse_connect
# import faulthandler

# faulthandler.enable()

"""
Table Schema
CREATE TABLE test (
    col_1 UInt32,
    col_2 UInt32
) ENGINE MergeTree()
PRIMARY KEY (col_1, col_2);
"""

# Triggering Segfault
data = [
    *[[1, 0] for _ in range(2)],
    [1, ],
]
print(data)

"""
These example datasets do not trigger the segfault
and instead raise clickhouse_connect.driver.exceptions.DataError

data = [
    *[[1, 0] for _ in range(1)],
    [1, ],
]

data = [
    [1, 0],
    [1, 0],
    [1, ]
]
"""

client = clickhouse_connect.get_client()

client.insert(
    "test",
    data,
    column_names=[
        "col_1",
        "col_2",
    ],
    column_type_names=[
        "UInt32",
        "UInt32",
    ],
)

dmesg log

[34915977.440898] python[3393460]: segfault at 0 ip 00007fb2c4f10bf6 sp 00007ffc49fbc350 error 6 in dataconv.cpython-311-x86_64-linux-gnu.so[7fb2c4eec000+2d000]
[34915977.440905] Code: 0f 84 1e 05 00 00 48 8b 74 24 10 4c 89 e7 ff d0 49 89 c6 e9 99 fd ff ff 0f 1f 40 00 49 8b 44 24 18 48 8b 54 24 48 4c 8b 34 10 <49> 83 06 01 e9 87 fd ff ff 90 4c 89 ff e8 08 b6 fd ff e9 8b fa ff

faulthandler log

Fatal Python error: Segmentation fault

Current thread 0x00007f482d696280 (most recent call first):
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/clickhouse_connect/driver/insert.py", line 144 in _row_block_data
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/clickhouse_connect/driver/insert.py", line 134 in next_block
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/clickhouse_connect/driver/transform.py", line 90 in chunk_gen
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/urllib3/connection.py", line 404 in request
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 496 in _make_request
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793 in urlopen
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/urllib3/poolmanager.py", line 444 in urlopen
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/urllib3/_request_methods.py", line 279 in request_encode_body
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/urllib3/_request_methods.py", line 144 in request
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/clickhouse_connect/driver/httpclient.py", line 418 in _raw_request
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/clickhouse_connect/driver/httpclient.py", line 256 in data_insert
  File "/home/zxing/clickhouse-test/env/lib/python3.11/site-packages/clickhouse_connect/driver/client.py", line 613 in insert
  File "/home/zxing/clickhouse-test/main.py", line 40 in <module>

Extension modules: clickhouse_connect.driverc.buffer, clickhouse_connect.driverc.dataconv, zstandard.backend_c, lz4._version, lz4.frame._frame (total: 5)
Segmentation fault (core dumped)

genzgd commented 3 months ago

clickhouse_connect uses C-optimized code for data transformation. Checking every data size (especially for large arrays and strings) would carry a significant performance cost, so it is the caller's responsibility to supply correctly sized data.
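
Since the library leaves size checking to the caller, one option is a cheap pre-insert check on the caller's side. The following is only a minimal sketch: `check_row_lengths` is a hypothetical helper, not part of clickhouse_connect, and it assumes row-oriented data where each row is a sequence with one value per column.

```python
from typing import Sequence


def check_row_lengths(data: Sequence[Sequence], column_names: Sequence[str]) -> None:
    """Raise ValueError if any row's length differs from the column count.

    Hypothetical helper for illustration; not part of clickhouse_connect.
    """
    expected = len(column_names)
    for index, row in enumerate(data):
        if len(row) != expected:
            raise ValueError(
                f"row {index} has {len(row)} values, expected {expected}"
            )


column_names = ["col_1", "col_2"]
# Same shape as the data that triggers the segfault in the report.
data = [[1, 0], [1, 0], [1]]

try:
    check_row_lengths(data, column_names)
except ValueError as exc:
    # Stop here instead of handing malformed rows to the C extension.
    print(f"refusing to insert: {exc}")
```

Calling this right before `client.insert(...)` turns the malformed row into an ordinary Python exception. The check only calls `len()` once per row, so it does not inspect the individual values, arrays, or strings inside each row.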