chdb-io / chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse
https://clickhouse.com/docs/en/chdb
Apache License 2.0
2.03k stars 72 forks source link

Insert with parameters #166

Closed nevinpuri closed 9 months ago

nevinpuri commented 9 months ago

Use case I want to insert data into a chdb instance by using the query function, or similar.

var = "hi"
client = Session()
client.query("CREATE TABLE test_table (id UUID, val String) Engine = MergeTree;")
client.query("INSERT INTO test_table VALUES (?)", var);

Is this possible? I'm aware I can just format my values to strings and concatenate them with the query. However, that's prone to sql injection attacks.

lmangani commented 9 months ago

You should be able to perform INSERT INTO from any defined dataframe passed to the query function.

tbl = cdf.Table(dataframe=pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}))
ret_tbl = tbl.query('INSERT INTO somewhere SELECT * FROM __table__')
nevinpuri commented 9 months ago

Thank you, this is a really smart solution.

nevinpuri commented 9 months ago

It seems I can't access other databases from the table query function. For example, the following code fails with Code: 81. DB::Exception: Database e does not exist. (UNKNOWN_DATABASE)

from chdb.session import Session
import chdb.dataframe as cdf
import pandas as pd

client = Session()
client.query("CREATE DATABASE e ENGINE = Atomic;")
res = client.query(
    "CREATE TABLE e.hi (a String primary key, b Int32) Engine = MergeTree ORDER BY a;"
)
df = pd.DataFrame({"a": [str("hi")], "b": [32]})
tbl = cdf.Table(dataframe=df)
h = tbl.query("INSERT INTO e.hi SELECT a, b FROM __table__")
sys.exit(0)

Do you have any ideas how I could access the database from the table query function?

nevinpuri commented 9 months ago

I've managed to make this work using dbapi.

conn = dbapi.connect()
cur = conn.cursor()

cur.execute("CREATE DATABASE e ENGINE = Atomic;")
cur.execute("CREATE TABLE e.hi (a String primary key, b Int32) Engine = MergeTree ORDER BY a;")

cur.execute("INSERT INTO e.hi (a, b) VALUES (%s, %s);", ["test", 32]) # only use %s, not %i or %d etc

cur.execute("SELECT * FROM e.hi;")
res = cur.fetchall()
# res should be [("test", 32)]

With my pull request, you will be able to pass a path into the dbapi.connect() function like so: dbapi.connect(path="./folder"). This will enable the same functionality as session, but with using the dbapi.