chdb-io / chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse
https://clickhouse.com/docs/en/chdb
Apache License 2.0

Example on how to query pyarrow table #93

Closed: danthegoodman1 closed this 1 month ago

danthegoodman1 commented 1 year ago

The README section "Query On Table (Pandas DataFrame, Parquet file/bytes, Arrow bytes)" has a DataFrame example, but none for pyarrow.

Also, a question: is the select zero-copy?

danthegoodman1 commented 1 year ago

In [19]: import chdb.dataframe as cdf

In [20]: tbl = cdf.Table(arrow_table=arw)

In [21]: ret_tbl = tbl.query('select * from __table__')

In [22]: print(ret_tbl)
   count()
0  3231245

Seems it's that easy, though the chdb.dataframe naming is a bit confusing.

danthegoodman1 commented 1 year ago

Bonus points for showing how to make it a virtual table with another name:

ret_tbl = cdf.query(sql='select * from __tb1__', tb1=cdf.Table(arrow_table=arw))

auxten commented 5 months ago

I will try to move chdb.dataframe into the chdb package itself. That way we could also query a DataFrame in a chdb Session.

auxten commented 1 month ago

The new chDB 2.0 API for querying an Arrow Table: https://github.com/chdb-io/chdb/blob/d990ff80a54f5bf60ecea94d8cff8ec1f12c1d94/tests/test_query_py.py#L94-L137