chdb-io / chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse
https://clickhouse.com/chdb
Apache License 2.0

Reimplement the session mode #197

Open auxten opened 9 months ago

auxten commented 9 months ago

The current chDB (up to v2.0.2) relies on temporary disk storage to keep session data. Every time session.query runs, almost everything in memory is recreated and reinitialized, which causes a lot of state problems like:

Also, some features implemented and bugs worked around earlier need a better fix:

Some chDB contributors have also tried to make session mode better:

Originally posted by **l1t1** February 7, 2024

```sql
:) create table a engine=Memory as select 1 a;
0.11995077133178711
:) select * from a;
Code: 60. DB::Exception: Table _local.a does not exist. (UNKNOWN_TABLE)
0.11473512649536133
```

Here is how clickhouse-local interactive mode works:

```
root@0a8b55995b6e:/auxten/chdb/tests# ./ch24.5/usr/bin/clickhouse
ClickHouse local version 24.5.1.1763 (official build).

0a8b55995b6e :) create table a engine=Memory as select 1 a;

CREATE TABLE a
ENGINE = Memory AS
SELECT 1 AS a

Query id: 967a5d72-bb39-4a42-8a11-a108eda2a5d9

Ok.

0 rows in set. Elapsed: 0.008 sec.

0a8b55995b6e :) select * from a;

SELECT *
FROM a

Query id: e5be6b9b-b752-4418-8753-adb8cc69a127

   ┌─a─┐
1. │ 1 │
   └───┘

1 row in set. Elapsed: 0.008 sec.
```

The benefits are:

  1. Better support for states like 'Memory Table Engine', 'UDF', 'SET', 'USE'
  2. Less tricky code to handle the default database and 'SET', 'USE' statements
  3. Without loading tables and running initialization on every query function call, performance should be much better than in the current implementation
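
To make the state problem concrete, here is a minimal sketch of the difference between rebuilding the engine on every call and keeping one long-lived context. It deliberately uses Python's stdlib `sqlite3` as a stand-in, not chDB or ClickHouse; `query_stateless`, `PersistentSession`, and `demo` are hypothetical names used only for illustration.

```python
import sqlite3


def query_stateless(sql):
    """Run one query on a brand-new in-memory engine, then throw it away.

    This mimics (by analogy only) a session.query call that recreates
    and reinitializes its in-memory state each time.
    """
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


class PersistentSession:
    """Keep one connection alive so in-memory state survives across queries."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")

    def query(self, sql):
        return self._conn.execute(sql).fetchall()

    def close(self):
        self._conn.close()


def demo():
    # Stateless mode: the table created by the first call is gone
    # by the time the second call runs.
    query_stateless("CREATE TABLE a AS SELECT 1 AS a")
    try:
        query_stateless("SELECT * FROM a")
        stateless_ok = True
    except sqlite3.OperationalError:  # "no such table: a"
        stateless_ok = False

    # Persistent mode: the same two statements succeed.
    sess = PersistentSession()
    sess.query("CREATE TABLE a AS SELECT 1 AS a")
    rows = sess.query("SELECT * FROM a")
    sess.close()
    return stateless_ok, rows


if __name__ == "__main__":
    print(demo())
```

The stateless path fails in the same way as the `UNKNOWN_TABLE` error quoted above, while the persistent path behaves like the clickhouse-local interactive session.
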
auxten commented 2 months ago

All issues with the label https://github.com/chdb-io/chdb/labels/Session are related to this feature

#225

#258

#261

https://github.com/chdb-io/chdb-node/issues/18

auxten commented 2 months ago

Here is the rough plan:

  1. Upgrade the engine to 24.8 or newer, since the ClickHouse engine received many bug fixes and optimizations for clickhouse-local in the 3 most recent releases
  2. Better handling of BackgroundSchedulePool and all kinds of Context
  3. Reimplement the session mode
ruslandoga commented 3 weeks ago

👋 @auxten

Would it also support stale parts cleanup https://github.com/chdb-io/chdb/issues/107?

And, in general, full use of MergeTrees (writing, parallel reading, etc.) https://github.com/orgs/chdb-io/discussions/210#discussioncomment-8863924

auxten commented 3 weeks ago

> 👋 @auxten
>
> Would it also support stale parts cleanup https://github.com/chdb-io/chdb/issues/107?

Yes, I think so

raystyle commented 1 week ago

> > 👋 @auxten Would it also support stale parts cleanup #107?
>
> Yes, I think so

If it is not possible to merge partitions, each append operation will cause the partition to keep expanding.

auxten commented 1 week ago

> > > 👋 @auxten Would it also support stale parts cleanup #107?
> >
> > Yes, I think so
>
> If it is not possible to merge partitions, each append operation will cause the partition to keep expanding.

Yes, it's a serious problem in the current implementation. In the new implementation, there will be some background threads doing the merge work periodically.
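
As a rough illustration of the idea only, the sketch below models appends that each create a new part and a background thread that periodically merges them. It is a toy in plain Python and does not reflect chDB's or ClickHouse's actual merge logic; `ToyPartStore` and `BackgroundMerger` are hypothetical names.

```python
import threading


class ToyPartStore:
    """Toy model: every append creates a new part; a merge combines all parts."""

    def __init__(self):
        self._parts = []
        self._lock = threading.Lock()

    def append(self, rows):
        with self._lock:
            self._parts.append(sorted(rows))

    def part_count(self):
        with self._lock:
            return len(self._parts)

    def rows(self):
        with self._lock:
            return sorted(row for part in self._parts for row in part)

    def merge_once(self):
        """Merge all parts into one. Returns True if a merge happened."""
        with self._lock:
            if len(self._parts) < 2:
                return False
            merged = sorted(row for part in self._parts for row in part)
            self._parts = [merged]
            return True


class BackgroundMerger:
    """Calls store.merge_once() every interval_sec until stopped."""

    def __init__(self, store, interval_sec=0.05):
        self._store = store
        self._interval = interval_sec
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._store.merge_once()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Without the merger, `part_count()` grows by one on every append, which is the expansion described above; with it running, the count falls back to one between appends.
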