Closed djouallah closed 9 months ago
Hello @djouallah
This is a moving target as we expand chdb feature, but here are a few ways off the top of my head.
chdb statements are multiquery, so if you logic allows you can execute your statements sequentially:
chdb.query("create or replace view xxx as select 42; select from xxx;")
You can use temporary dataframe tables to exchange data between calls using the virtual __table__
selector:
import chdb.dataframe as cdf
import pandas as pd
tbl = cdf.Table(dataframe=pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}))
ret_tbl = tbl.query('select * from __table__')
print(ret_tbl)
print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
Last but not least, you can pipe data in/out of chdb executions using stdin/out (from the script or anything else)
import sys
import pandas as pd
import pyarrow as pa
import chdb
import subprocess
df = pd.DataFrame({'a': [2, 2], 'b': [4, 4]})
# convert Dataframe to Arrow stream
data = pa.BufferOutputStream()
pa.RecordBatchStreamWriter(data, pa.Table.from_pandas(df).schema).write_table(pa.Table.from_pandas(df))
data = data.getvalue()
query = '''
CREATE TABLE table ENGINE = File(ArrowStream, stdin); SELECT a + b as sum FROM table;
'''
# Spawn the subprocess and pass custom input from the parent process
def spawn_chdb(query, stdin, format):
proc = subprocess.Popen(['python3', '-m', 'chdb', query, format], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
custom_input = data
stdout = proc.communicate(input=stdin)[0].decode('utf-8')
return stdout
# Call the function to spawn the subprocess with custom input
res = spawn_chdb(query, data, "Dataframe")
# Run the query
print(res)
Hope this helps. If I missed any other way, the team will fill in :)
thanks, but that does not replace the functionality of a view, and please don't tag is as a question, it is a feature request.
can we have something like this
that's a question right there 😄 if its a feature request, please add it to the title anytime.
interesting, it seems to be working, at least i don't get errors but no result :) https://colab.research.google.com/drive/1o07UEUUAxcdR9WCdYt53mVZBvqqbqzxn#scrollTo=MDTt42NXEL5_
@djouallah There is an issue, the chdb default db is not saved in the session(default behavior in clickhouse-local). If you want to preserve it, you can create a new db and create table, view in the new db.
It might be wired for now, I can fix that in the v0.12 release. Any suggestion?
ideally something like this will be great, I prefer not to hard code the db to the table name, and show method will be really nice for python compatibility
from chdb import session as chs
sess = chs.Session()
sess.query("CREATE DATABASE IF NOT EXISTS db_xxx ENGINE = Atomic")
sess.query(f'create view db_xxx.lineitem as select * from file("./{sf}/lineitem/*.parquet", Parquet)')
sess.query("use db_xxx")
sess.query("SELECT count(*) FROM lineitem").show()
Great suggestion about use db_xxx
.
But why ret.show()
for the result, similar to print(ret)
?
Any reference who supports this?
Spark, duckdb, datafusion, it is not a very big deal , just a personal preference, don't worry about :)
use db_xxx. will be really nice for SQL compatibility
using 0.14.1, still does not works :(
@djouallah this release was focused on UDF support, but don't worry use
support is still on the way!
can we have something like this
chdb.query("create or replace view xxx as select 42")
chdb.query(" select from xxx")