chdb-io / chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse
https://clickhouse.com/chdb
Apache License 2.0
2.13k stars 75 forks source link

Does chdb supports orm? If supports, how to use? #213

Open flyly0755 opened 6 months ago

flyly0755 commented 6 months ago

Use orm, we doesn't need to write raw sql to do crud operation, instead of using sqlalchemy package to deal with database. Usually, with one orm class to map one database table. for example as below:

from sqlalchemy import Column
from clickhouse_sqlalchemy import engines
from clickhouse_sqlalchemy.ext.declarative import declarative_base
from clickhouse_sqlalchemy.types import String, UInt32

ChBase = declarative_base()

class FileInfo(ChBase):
    __tablename__ = 'FileInfo'
    FILE_ID = Column(UInt32, primary_key=True)
    filename = Column(String)
    filepath = Column(String)
    filemd5 = Column(String(32))
    __table_args__ = (
        engines.MergeTree(order_by=('FILE_ID',),
                          primary_key=('FILE_ID',)),
        {'comment': 'FileInfo Table in clickhouse'}
    )

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

ckuser = 'ckuser'
ckpwd = 'ckpwd'
ckip = 'ckip'
ckport = '8123'
ckdbname = 'testdb'
uri = f"clickhouse://{ckuser}:{ckpwd}@{ckip}:{ckport}/{ckdbname}"
engine = create_engine(uri, echo=False)
session = sessionmaker(bind=engine)()
session.execute('SELECT 1')
FileInfo.__table__.create(engine)  # create table FileInfo

session.query(FileInfo).filter_by(FILE_ID=1).all()  # equal to sql: select * from FileInfo where FILE_ID=1

chdb is a superb package, very useful for testing and debug👍👍👍 But all code related with clickhouse uses orm in my softwore project, so I want to whether chdb supports orm? uri is a very important parameter. If supports, how to set uri? something like "clickhouse+chdb://xxx"?

auxten commented 6 months ago

chDB do have Python DB-API support. See https://github.com/chdb-io/chdb/blob/main/examples/dbapi.py I didn't try it with SQLAlchemy. You can give it a try. If any problem, please tell me here or on discord.

flyly0755 commented 6 months ago

seems not support, coz uri is a fundamental parameter. with uri(user:password@dbip:port/dbname), client side can connect with database server. But with chdb, all things is in memory, not related with network. So can't communicate with client side with uri except extra dialect supported by sqlalchemy. for example, sqlite support memory mode, similar to chdb. We can integrate sqlite with sqlalchemy with uri

from sqlalchemy import create_engine

# format 1, disk mode, uri = sqlite:///home/stephen/db1.db
engine = create_engine('sqlite:///home/stephen/db1.db')
# format 2,  memory mode, uri = sqlite:///:memory:
engine = create_engine('sqlite:///:memory:')

session = sessionmaker(bind=engine)()

if chdb wants to support sqlalchemy, first need to negotiate one recognized uri, for example chdb:///:memory:

flyly0755 commented 6 months ago

@auxten I also ask sqlalchemy for this question ^_^ Need extra database dialect code development to support this function. https://github.com/sqlalchemy/sqlalchemy/discussions/11319 chdb and sqlalchemy both sides need to negotiate this work.

auxten commented 6 months ago

Thank you! I will look into this tomorrow. BTW, are you interested to contribute this for chDB?😉

flyly0755 commented 6 months ago

Yes, I am interested😊, but I am not familar with lower level logics of sqlalchemy dialect. Maybe in the future after I understand these base knowledge, I can make a help. By the way, I have found the link of supported sqlalchemy dialect https://docs.sqlalchemy.org/en/14/dialects/

rominf commented 2 months ago

I spent a few hours to prepare POC: https://github.com/rominf/clickhouse-sqlalchemy/tree/rominf-chdb. It is still on early stages (not production ready), but kinda works for poking. I had to make some workarounds for known issues: absence of persistence between calls for Memory engine and default database: https://github.com/chdb-io/chdb/issues/262#issuecomment-2325570573 and https://github.com/chdb-io/chdb/issues/225#issuecomment-2154217779. Also, the conversion for complex types does not work (this contributes to most test failures). Currently, 25/177 tests of testing suite testing chDB driver FAIL (in other words, 152/177 PASS).

To play with it, install it as usual from the branch above and then write something like:

from sqlalchemy import create_engine, Column, MetaData

from clickhouse_sqlalchemy import (
    Table, make_session, get_declarative_base, types, engines
)

uri = 'clickhouse+chdb:///test?path=/tmp/chdb-test'

engine = create_engine(uri)
session = make_session(engine)
metadata = MetaData()

Base = get_declarative_base(metadata=metadata)

class Rate(Base):
    day = Column(types.Date, primary_key=True)
    value = Column(types.Int32)
    __table_args__ = (
        engines.Log(),
    )

metadata.create_all(bind=engine)
from datetime import date, timedelta

from sqlalchemy import func

today = date.today()
rates = [
    {'day': today - timedelta(i), 'value': 200 - i}
    for i in range(100)
]
session.execute(Rate.__table__.insert(), rates)
session.query(Rate).count()  # Outputs 100

Feel free to borrow my code/ideas/contact me for working together/etc. to get this driver fully working!