[Draft] Proposal for SQLAlchemy Data Layer with ORM

qtangs commented 1 month ago

This is a proposal for a different way to implement SQLAlchemy Data Layer using ORM instead of raw SQL statements, taking inspiration from https://github.com/Chainlit/chainlit/pull/832 Hopefully it will simplify support for other SQL dialects besides Postgres and SQLite.

This is a rather major change:

[x] Define tables and indices using SQLAlchemy objects
[x] Use SQLAlchemy functions to create statements instead of raw SQL queries

Other changes:

[x] in get_thread and list_threads. only get specific thread(s) for better performance and reduced bandwidth compared to get_all_user_threads (note: reduced bandwidth is important for providers such as Supabase)
[x] in list_threads, filter would use thread name as well as steps input/output texts, this needs careful review.
[x] in create_element, storage_provider is made optional so that elements that don't need content to be stored will still be added to database, example of these elements are external links that can be read and displayed at runtime.

The diff between current SQLAlchemy and this one can be found here: https://gist.github.com/qtangs/71370e6ce5feec78d9586c808804e81c/revisions?diff=split&w=

Some tests have been added based on https://github.com/Chainlit/chainlit/blob/main/backend/tests/data/test_sql_alchemy.py but more might be needed.

dokterbob commented 1 month ago

@qtangs Woop woop! :D :D

One important aspect is that we don't break existing applications.

Do you think we can reasonably assure that? Or should we deprecate older installations?

I expect to review this in the 2nd half of this week. Want to take my time to do it well, as it's such a big one.

dokterbob commented 1 month ago

Might want to take this into account #1368

qtangs commented 1 month ago

Do you think we can reasonably assure that? Or should we deprecate older installations?

@dokterbob I think more testing is needed if we want to replace the old version in-place. I've added a new class to avoid this kind of major impact, anyone needing the new fix can switch to new class in their environment and test accordingly, if something breaks it's easy to switch back to old class.

dokterbob commented 1 month ago

Do you think we can reasonably assure that? Or should we deprecate older installations?

@dokterbob I think more testing is needed if we want to replace the old version in-place. I've added a new class to avoid this kind of major impact, anyone needing the new fix can switch to new class in their environment and test accordingly, if something breaks it's easy to switch back to old class.

We're not gonna support both though, and it seems there's a reasonable user base for the current implementation. I imagine we could deprecate the old version. Or we could use unit tests to somewhat assure consistency.

Curious how other community members/users of SQLite/SQLAlchemy reflect on this.

barrel-roll-42 commented 1 month ago

@dokterbob This would totally be great for my Databricks implementation of this.

daviddwlee84 commented 1 week ago

@dokterbob Came from #832 I'm looking for this feature to preserve data locally. SQLite would be one of the easiest ways to do this.

dokterbob commented 1 week ago

https://github.com/Chainlit/chainlit/pull/1463 is the first step towards the cleanup of data layers. Once that's merged, @qtangs, could you please resolve conflicts on this one?

I want to either have this code merged to main or (preferably) move it directly to our community repo!

Chainlit / chainlit

[Draft] Proposal for SQLAlchemy Data Layer with ORM #1365