Open DanielMao1 opened 3 months ago
create temporary table seems works, but cannot select it.
>>> from chdb.session import Session
>>>
>>> db = Session()
>>> db.query("create database db")
>>> db.query("use db")
>>> x="create temporary table data (id UInt32, x UInt32) engine MergeTree order by id sample by id as select number+1 as id, randUniform(1, 100) as x from numbers(10000);"
>>> db.query(x)
>>> y='select avg(x) as "avg", round(quantile(0.95)(x), 2) as p95 from data sample 0.1;'>>> db.query(y)
Code: 60. DB::Exception: Table db.data does not exist. (UNKNOWN_TABLE)
>>> x="create table data (id UInt32, x UInt32) engine MergeTree order by id sample by id as select number+1 as id, randUniform(1, 100) as x from numbers(10000);"
>>> db.query(x)
>>> db.query(y) 50.2891,95
try Query on Pandas DataFrame of https://clickhouse.com/docs/en/chdb/install/python
import chdb
import pandas as pd
df = pd.DataFrame(
{
"a": [1, 2, 3, 4, 5, 6],
"b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
}
)
chdb.query("SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()
try Query on Pandas DataFrame of https://clickhouse.com/docs/en/chdb/install/python
import chdb import pandas as pd df = pd.DataFrame( { "a": [1, 2, 3, 4, 5, 6], "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"], } ) chdb.query("SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()
chdb.dataframe is going to be deprecated soon Please use the example above to handle dataframe with chDB
still can't select from table
>>>
>>> chdb.query("create table a engine=Memory as SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()
>>> chdb.query("select * from a").show() Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/chdb/__init__.py", line 78, in query
raise ChdbError(res.error_message())
chdb.ChdbError: Code: 60. DB::Exception: Unknown table expression identifier 'a' in scope SELECT * FROM a. (UNKNOWN_TABLE)
when use session, the result is empty.
>>> from chdb.session import Session
>>> db = Session()
>>> db.query("create database db")
>>> db.query("use db")
>>> import chdb
>>> import pandas as pd
>>> df = pd.DataFrame(
... {
... "a": [1, 2, 3, 4, 5, 6],
... "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
... }
... )
>>> db.query("create table a engine=Memory as SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()
>>> db.query("select * from a").show()
engine=Log works. engine=Memory did not insert rows
>>> db.query("select count() from a")
0
>>> db.query("create table a1 engine=Log as SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()
>>> db.query("select count() from a1") 3
>>> db.query("select * from a1") "auxten",9
"jerry",7
"tom",5
Hello! : ) I am working on some feature engineering tasks and I find your great job. But I encountered some API limitations. Such APIs are supported by clickhouse and duckdb, but not supported by chDB. So I would like to discuss this issue.
Use case
When I using clickhouse, my use case is like:
Describe the solution you'd like
*In chDB, I tried the API for a whole day but I can not find any APIs like clickhouse's
insert_df
nor any APIs like duckdb's `duckdb.sql("INSERT INTO my_table BY NAME SELECT FROM my_df")`. I woule like such APIs so that I can import my data into servers, and reuse my analysis query.** I do know if it is easy to wrap up on clickhouse c++ library or it is to be implemented in c++. (By the way, the clickhouse insert df is not efficient, that's the reason I try chdb)Describe alternatives you've considered
I looked the official example do the sql look like this, but the problem is my query may consists of create temp table statement, which cannot be done by
chdb.dataframe
(It can only query but not create). Instead, I would like to insert the dataframe into db (logically in a session), and in this session I can do many SQLs.Additional context
Look forward to you reply : )