chdb-io / chdb

chDB is an in-process OLAP SQL Engine πŸš€ powered by ClickHouse
https://clickhouse.com/chdb
Apache License 2.0
2.13k stars 75 forks source link

Support use db_xxx in chdb session. #97

Closed auxten closed 1 year ago

auxten commented 1 year ago
from chdb import session as chs

sess = chs.Session()
sess.query("CREATE DATABASE IF NOT EXISTS db_xxx ENGINE = Atomic;")
sess.query("CREATE TABLE IF NOT EXISTS db_xxx.log_table_xxx (x String, y Int) ENGINE = MergeTree ORDER BY x;")

sess.query("USE db_xxx;") # update current db to db_xxx

sess.query("INSERT INTO log_table_xxx (x, y) SELECT toString(rand()), rand() FROM numbers(1000000);")
sess.query("SELECT count(*) FROM log_table_xxx;")

See also: #82

ruslandoga commented 1 year ago

πŸ‘‹

I'm interested in making this work! But I'll need some pointers!

This is also a problem for me in my clickhouse-local Elixir wrapper https://github.com/ruslandoga/ch_local as I need to prefix the database name for all ops and --database flag doesn't seem to work.

lmangani commented 1 year ago

Welcome to the chdbunch @ruslandoga πŸ‘‹ If this also happens in a clickhouse-local wrapper and --database doesn't work there's a chance this is a general ClickHouse issue. Was it ever reported to CH as part of your testing @ruslandoga ?

ruslandoga commented 1 year ago

No, I haven't reported it. I'll ask in https://t.me/clickhouse_ru if my understanding of how it is supposed to work is correct and open an issue on ClickHouse :)

auxten commented 1 year ago

Basically, here's the idea:

  1. store db name in a file when exec use db
  2. change default db in later query
lmangani commented 1 year ago

This should work in clickhouse-local. Here's a starting point to crawl back from

vdimir commented 1 year ago

Indeed --database argument don't work as expected in clickhouse local (I'll check how to fix it. Probably it's not implemented because no so many database exists when clickhouse-local is just started).

However USE database does. Likely issue with --database related to handling of command line args.

Details ``` $ ~/clickhouse local --database system ClickHouse local version 23.8.1.1. :) select * from one; SELECT * FROM one Query id: 92e1ee40-7d33-4dea-bb7d-22e21f6781e8 0 rows in set. Elapsed: 0.082 sec. Received exception: Code: 60. DB::Exception: Table _local.one does not exist. (UNKNOWN_TABLE) :) Bye. $ ~/clickhouse local ClickHouse local version 23.8.1.1. :) use system; USE system Query id: 09461038-0afb-4824-ad54-31efeadc955e Ok. 0 rows in set. Elapsed: 0.000 sec. :) select * from one; SELECT * FROM one Query id: 44f452d7-31f5-4f30-b64f-7c7a800d27ce β”Œβ”€dummy─┐ β”‚ 0 β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”˜ 1 row in set. Elapsed: 0.001 sec. :) select currentDatabase(); SELECT currentDatabase() Query id: a62937f0-49bd-460b-a728-d9435e4da003 β”Œβ”€currentDatabase()─┐ β”‚ system β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 1 row in set. Elapsed: 0.001 sec. :) ```

I believe issue somewhere in context usage in chs.Session(). We need to store context in there and use this context for queries in that session.

ruslandoga commented 1 year ago

I thought a bit more about this and realised that having support for --database is better for my use case, it'd be more in line with the :database option we have in the HTTP client in Elixir. I also looked into how chs.Session() works, and it seems to me it just adds a --path option to the query_stable? If so, I'm not sure how to cleanly intercept use <database> commands and where to store that context in Python, and even if it is stored, how to provide it to query_stable if --database doesn't work. So I guess the changes need to be done in C++ but that seems more complicated than making --database option work.

vdimir commented 1 year ago

clickhouse-local issue with --database argument is inside command line options parsing https://github.com/ClickHouse/ClickHouse/pull/54503

auxten commented 1 year ago

I thought a bit more about this and realised that having support for --database is better for my use case, it'd be more in line with the :database option we have in the HTTP client in Elixir. I also looked into how chs.Session() works, and it seems to me it just adds a --path option to the query_stable? If so, I'm not sure how to cleanly intercept use <database> commands and where to store that context in Python, and even if it is stored, how to provide it to query_stable if --database doesn't work. So I guess the changes need to be done in C++ but that seems more complicated than making --database option work.

There are also some code to write to store the "Using" db name in a temp file and pass it via --database. @ruslandoga are you still working on this?

ruslandoga commented 1 year ago

πŸ‘‹ @auxten

No, I think the long lived client approach works better for me :) Plus now that the --database option is fixed in the local client I can just use that similar to how it's done in the HTTP client.

auxten commented 1 year ago

Nice, I am also inspired by the long-lived client approach #108. It will be a better solution for interactive queries. I will try to:

  1. Make use db_xxx work in the old way first.
  2. And then, make a new session(stateful) query implementation based on #108.

Thanks for you guys.😺

djouallah commented 1 year ago

any update on this ?

auxten commented 1 year ago

any update on this ?

Will be ready when v0.16.0 stable version released

djouallah commented 1 year ago

would love to test the rc but no python release is available yet :(

blackrez commented 1 year ago

@djouallah you can install it by specifying the version : pip install chdb==0.16.0rc2

auxten commented 1 year ago

@djouallah you can install it by specifying the version : pip install chdb==0.16.0rc2

Not this version, after we fix the build pipeline issue. I will draft a new version.

lmangani commented 1 year ago

would love to test the rc but no python release is available yet :(

here you go! pip install chdb==1.0.0rc1