Chainlit / chainlit

Build Conversational AI in minutes ⚡️
https://docs.chainlit.io
Apache License 2.0

Create an open source data layer #793

Open tpatel opened 7 months ago

tpatel commented 7 months ago

Chainlit has a feature that enables you to store, analyze and persist your data with Literal. You can also create your own custom data layer to store the data in your database.

The goal of this ticket is to create an implementation of a custom data layer backed by a database such as Postgres or Redis. You can see an example of a custom data layer in this test, although it doesn't connect to a database.

hayescode commented 7 months ago

The only existing version (using Chainlit version 0.7.0) can be found here.

Will Chainlit Devs assist in this effort or will this be 100% community-driven?

tpatel commented 7 months ago

@hayescode thanks for sharing this repo. They decided to rebuild a GraphQL API. With the custom data layer feature, you should be able to store data in any database without having to build this GraphQL API, thanks to the interface.

A custom data layer needs:

  1. One class that inherits from chainlit.data.BaseDataLayer
  2. In your Chainlit app, you need:
import chainlit.data as cl_data

cl_data._data_layer = MyDataLayer()

This removes the need to stay compatible with the GraphQL API and the need to maintain a server reachable from your Chainlit app. It also enables a direct connection to your DB.
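As a rough illustration of the two steps above, here is a minimal sketch. To keep it runnable without Chainlit installed, `BaseDataLayer` below is a stand-in reduced to two methods; in a real app you would inherit from `chainlit.data.BaseDataLayer` and override its actual methods. `MyDataLayer`, the SQLite table and the method signatures shown are assumptions made for illustration, not the project's reference implementation.

```python
import asyncio
import sqlite3


class BaseDataLayer:
    """Stand-in for chainlit.data.BaseDataLayer (assumption: reduced to two methods)."""

    async def get_thread_author(self, thread_id: str) -> str:
        return ""

    async def delete_thread(self, thread_id: str) -> None:
        pass


class MyDataLayer(BaseDataLayer):
    """Hypothetical data layer that talks to the database directly."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # Hypothetical schema: one row per thread.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS threads (id TEXT PRIMARY KEY, author TEXT)"
        )

    async def get_thread_author(self, thread_id: str) -> str:
        row = self.conn.execute(
            "SELECT author FROM threads WHERE id = ?", (thread_id,)
        ).fetchone()
        return row[0] if row else ""

    async def delete_thread(self, thread_id: str) -> None:
        self.conn.execute("DELETE FROM threads WHERE id = ?", (thread_id,))
        self.conn.commit()


# In a Chainlit app, the instance would be registered as described above:
#   import chainlit.data as cl_data
#   cl_data._data_layer = MyDataLayer()
layer = MyDataLayer()
layer.conn.execute("INSERT INTO threads VALUES ('t1', 'alice')")
author = asyncio.run(layer.get_thread_author("t1"))
```

No GraphQL server sits between the app and the database here: each overridden method issues its query directly.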

I'm looking into other chainlit tasks at the moment, but I'll be around to review PRs and help!

AndreasMarcec commented 7 months ago

I'm currently working on my own Data Layer which is based on the BaseDataLayer. Could you please specify which of the features would be mandatory in order to create a valid PR?

Rajatkhanna801 commented 6 months ago

Hi, I want to add MSAL authentication in a custom data layer. Can anyone help me with that?

sandangel commented 6 months ago

Hi, I'm implementing this PR: https://github.com/Chainlit/chainlit/pull/796 Could you point me to where create_thread happens? I could not find it in the code.

tpatel commented 6 months ago

> I'm currently working on my own Data Layer which is based on the BaseDataLayer. Could you please specify which of the features would be mandatory in order to create a valid PR?

@AndreasMarcec The best would be a complete implementation that overrides all methods from the BaseDataLayer (https://github.com/Chainlit/chainlit/blob/main/backend/chainlit/data/__init__.py#L53). You could start with a partial implementation though, and get help if you're stuck on anything.

tpatel commented 6 months ago

> Hi, I want to add MSAL authentication in a custom data layer. Can anyone help me with that?

@Rajatkhanna801 the best would be to start with the Authentication callback, which is configured on a per-app basis. You don't need a custom data layer if you only need MSAL authentication. I'm not familiar with MSAL, so feel free to create another issue if you run into any problems.

Rajatkhanna801 commented 6 months ago

@tpatel I'm already done with MSAL authentication and it is working pretty well. Now I am creating a custom data layer to store data in a SQLite database.

Rajatkhanna801 commented 6 months ago

@tpatel I need some help: the SQLite database needs a user table. Is there already a predefined model structure for the user table?

hayescode commented 6 months ago

> Hi, I'm implementing this PR: https://github.com/Chainlit/chainlit/pull/796 Could you point me to where create_thread happens? I could not find it in the code.

@sandangel the update_thread function is an upsert. I agree it's weird that everything else has create/update/delete but threads don't.
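To make the upsert behaviour concrete, here is a minimal sketch of what an `update_thread` that both creates and updates could look like against SQLite. The `threads` table, its columns and the function signature are assumptions for illustration and are not taken from Chainlit; the `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only.
conn.execute("CREATE TABLE threads (id TEXT PRIMARY KEY, name TEXT, user_id TEXT)")


def update_thread(thread_id, name=None, user_id=None):
    """Create the thread if it is missing, otherwise update only the given fields."""
    conn.execute(
        """
        INSERT INTO threads (id, name, user_id) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            name = COALESCE(excluded.name, threads.name),
            user_id = COALESCE(excluded.user_id, threads.user_id)
        """,
        (thread_id, name, user_id),
    )
    conn.commit()


update_thread("t1", name="First chat")  # no row yet: acts as a create
update_thread("t1", user_id="u42")      # row exists: acts as an update
rows = conn.execute("SELECT id, name, user_id FROM threads").fetchall()
```

With this shape there is no separate create call to look for: the first update for an unknown id inserts the row.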

hayescode commented 6 months ago

@tpatel if you could share the DDL for the backend tables I think it would speed up each of our developments. Thanks!

sandangel commented 6 months ago

@hayescode I updated the code to mimic the literalai client instead. It's working now; it just needs a few updates to the filters.

hayescode commented 6 months ago

@sandangel why did you do that?

sandangel commented 6 months ago

@hayescode I explained in the PR.

hayescode commented 6 months ago

@willydouhard @tpatel @constantinidan implementing a custom data layer is proving difficult because literalai is intertwined with chainlit/data. As it stands, literalai is effectively a dependency even when it isn't used, which will make long-term support more difficult. For example, Chainlit expects pagination, thread filter, and other types from literalai in order to work. These would ideally live in Chainlit.

Will chainlit be refactored to natively support Chainlit functionality?

hayescode commented 6 months ago

I just opened a PR to add a Postgres custom data layer with ADLS support -> #825

@tpatel @willydouhard

tjroamer commented 6 months ago

Just opened a PR to add a simple file-based SQLite data layer -> #832

No need to set up an extra database. By default, the data is persisted in chainlit.db in the working directory. The user can use any SQLite database tool to view the DB.

@tpatel @willydouhard

hayescode commented 6 months ago

@tpatel @willydouhard Here's my PR for adding a dialect-agnostic SQLAlchemy custom data layer. We've been getting more community contributions on this lately. Can we move this to 'In Progress' or 'In Review'?

https://github.com/Chainlit/chainlit/pull/836

wfjt commented 4 months ago

Something I'd like to raise is the coupling of session handling. If Chainlit used a session service/adapter interface and didn't spread session handling everywhere and force a stateful architecture, one could implement a Redis session adapter for example and run Chainlit on k8s like a normal stateless service. A backend should NOT be a monolithic stateful in-memory-state-keeping system. It's fine to have it as a default for simple use-cases, but should not prevent plugging a more conventional session management system in.

The whole notion of a data layer without control over sessions leaves it coupled to the in-memory stateful architecture. I should be able to run parallel backends without sticky sessions. I should be able to run Chainlit backend on spot compute and simply drain and fail-over as needed without impacting user experience.

This issue alone made me drop Chainlit from the short-list. I can't run it in production with this sort of software architecture, if you can even call it that.
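One way to picture the session adapter idea raised in this comment is the sketch below. Everything in it is hypothetical: `SessionStore`, `InMemorySessionStore` and `handle_message` are illustrative names and are not part of Chainlit. The point is only that a handler which reads and writes all state through an interface could swap the in-memory default for, say, a Redis-backed store.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical adapter interface; not part of Chainlit."""

    def load(self, session_id: str) -> Optional[dict]: ...
    def save(self, session_id: str, state: dict) -> None: ...
    def delete(self, session_id: str) -> None: ...


class InMemorySessionStore:
    """Default for simple use cases: state lives inside the process."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def load(self, session_id: str) -> Optional[dict]:
        state = self._sessions.get(session_id)
        return dict(state) if state is not None else None

    def save(self, session_id: str, state: dict) -> None:
        self._sessions[session_id] = dict(state)

    def delete(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)


def handle_message(store: SessionStore, session_id: str, text: str) -> int:
    """Stateless handler: all session state goes through the store."""
    state = store.load(session_id) or {"messages": []}
    state["messages"] = state["messages"] + [text]
    store.save(session_id, state)
    return len(state["messages"])


store = InMemorySessionStore()
handle_message(store, "s1", "hello")
count = handle_message(store, "s1", "again")
```

A store backed by an external service would implement the same three methods, so any backend replica could serve any request without sticky sessions.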

nileshtrivedi commented 4 months ago

I posted this in the discord server as well.

Can the CustomDataLayer support user attributes (such as role/team/organization_id) so that the tools or prompts available to the AI chat can be customized for each user? This is a crucial requirement if Chainlit is to be used in SaaS apps with multiple isolated customers.

hayescode commented 4 months ago

@nileshtrivedi yes, in the user.metadata field.
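A minimal sketch of how per-user attributes stored in `user.metadata` could drive which tools a chat exposes. The `User` dataclass is a stand-in so the example runs without Chainlit; the `role` and `organization_id` keys, the `TOOLS_BY_ROLE` mapping and `tools_for` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class User:
    """Stand-in for a user object with an identifier and free-form metadata."""

    identifier: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical mapping from role to the tools exposed to the chat.
TOOLS_BY_ROLE: Dict[str, List[str]] = {
    "admin": ["search", "sql_query", "export"],
    "member": ["search"],
}


def tools_for(user: User) -> List[str]:
    """Pick the tool list from the user's role, falling back to 'member'."""
    role = user.metadata.get("role", "member")
    return TOOLS_BY_ROLE.get(role, TOOLS_BY_ROLE["member"])


alice = User("alice", {"role": "admin", "organization_id": "org-1"})
bob = User("bob", {"organization_id": "org-2"})
```

Here `tools_for(alice)` yields the admin tool list, while `bob`, who has no role set, gets the member default.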