chdb-io / chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse
https://clickhouse.com/docs/en/chdb
Apache License 2.0

Zero-copy select and return for pyarrow tables #95

Open danthegoodman1 opened 1 year ago

danthegoodman1 commented 1 year ago

Being able to select from pyarrow tables without copying, and to get results back as pyarrow tables without copying, would be massively beneficial for building low-latency ETL and other data processing pipelines.

Streaming support specifically would be massive too (see https://duckdb.org/2021/12/03/duck-arrow.html#streaming-data-fromto-arrow), as it would greatly reduce the memory required for queries and for post-processing the data.

auxten commented 2 months ago

A faster path for querying an Arrow table landed in v2.0.0b1. Example: https://github.com/chdb-io/chdb/blob/main/tests/test_query_py.py#L94