Chainlit / chainlit

Build Conversational AI in minutes ⚡️
https://docs.chainlit.io
Apache License 2.0

Replace custom (o)auth by well-supported libraries #1240

Status: Open. dokterbob opened this issue 3 months ago.

dokterbob commented 3 months ago

Is your feature request related to a problem?

The current auth implementation causes a lot of issues. But auth is far from being chainlit's USP, and arguably it never will be.

In addition, rolling your own auth brings a plethora of security risks. Security is hard. In short, unless you're a security expert, if you roll your own authn/authz, sooner or later you will fail.

Describe the solution you'd like

But we don't have to fail! There are great, well-supported libraries for handling (o)auth.

For one, FastAPI has built-in OAuth2 support for acting as a server (https://fastapi.tiangolo.com/tutorial/security/). In addition, there's fastapi-oauth2, which seems to provide first-class support for a wide range of OAuth providers as a client (https://github.com/pysnippet/fastapi-oauth2).

The combination should allow us to ensure:

  1. Better UX (e.g. no quirky redirects, working logout).
  2. Significantly better security (consider that we might be handling confidential conversations!).
  3. Better provider support (have a look: https://github.com/python-social-auth/social-core/tree/master/social_core/backends).
  4. Less code and so less maintenance.
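To give a sense of what even the first step of a custom OAuth login involves, here is a minimal standard-library sketch of building an authorization-code request with PKCE (RFC 7636, S256 method). It is not chainlit's current code: the endpoint, client ID, redirect URI and scope are placeholders. A maintained client library would additionally handle the callback, the `state` check, the token exchange and refresh, which is exactly the code we would no longer have to maintain.

```python
import base64
import hashlib
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def s256_challenge(code_verifier: str) -> str:
    """PKCE S256: BASE64URL(SHA256(ASCII(code_verifier))) without '=' padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorize_url(authorize_endpoint: str, client_id: str,
                        redirect_uri: str, scope: str) -> tuple[str, str, str]:
    """Return (url, state, code_verifier).

    state and code_verifier must be stored server-side (e.g. in the session):
    state to validate the provider's callback, the verifier to redeem the code.
    """
    state = secrets.token_urlsafe(16)          # CSRF protection for the callback
    code_verifier = secrets.token_urlsafe(32)  # 43 URL-safe characters
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": s256_challenge(code_verifier),
        "code_challenge_method": "S256",
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state, code_verifier


# Placeholder values, not a real provider configuration.
url, state, verifier = build_authorize_url(
    "https://auth.example.com/authorize", "my-client-id",
    "https://chat.example.com/auth/callback", "openid profile email")
query = parse_qs(urlparse(url).query)
print(query["code_challenge_method"])  # ['S256']
```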

Plan of action

We'd have to lay out a clear roadmap: this would deprecate a lot of code and break a lot of UX. Arguably, that is the biggest challenge of such a change.

At this point, the issue merely serves as a 'test balloon'. Do maintainers/devs and community members want/need this? Is it feasible? Is there something I glossed over?

stephenrs commented 3 months ago

Quoting the issue description:

  "But auth is not by far the USP for chainlit and arguably, it will never be. In addition, rolling your own auth yields a plethora of security risks. Security is hard. In short, unless you're a security expert, if you roll your own authn/authz, sooner or later, you will fail."

@dokterbob Just found this after writing my dissertation on #1265. I couldn't agree more.

stephenrs commented 3 months ago

I must be having a crisis of imagination, because I'm still having trouble understanding why auth should be part of CL at all. So can you help me by making a case for why it should be? I'm guessing I might be missing something.

For example, who are the main target users/use cases? What kinds of things are they building and how are they using CL? Why do they want/need auth-in-a-box, particularly when spinning up a secured Flask/FastAPI/Django/etc app is so relatively easy, well-supported, and flexible?

Why shouldn't CL at the lower levels be thought of as a microservice behind an app with more "meat" and security awareness?

As you've pointed out, taking on the burden of security is not easy, so make a good case!

stephenrs commented 3 months ago

I thought it might be helpful to give you a better sense of my particular use case. CL will be fulfilling two roles in the domain of customer support:

  1. The full app version with chat history is already deployed internally as a trial/experiment to assist human customer support agents in crafting robust responses to customers - and they/we love it.
  2. A Copilot version will be first deployed as an in-app assistant for existing users, then if all goes well, it will be released to our public website for anyone to use.

Under the hood, 2 separate instances of CL will be running on separate ports because I’ve split the CL app target into separate modules that have slightly different behaviors but share a common core. So, for example, they both have blocks that look something like this:


import chainlit as cl
from chainlit.element import Element

import core  # the shared module holding the common behaviour

@cl.on_message
async def on_message(message: cl.Message):
    await core.on_message(message)

@cl.action_callback("followup")
async def on_action(action: cl.Action):
    await core.on_action(action)

@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.AudioChunk):
    await core.on_audio_chunk(chunk)

@cl.on_audio_end
async def on_audio_end(elements: list[Element]):
    await core.on_audio_end(elements)

(This feels a bit clunky, but as a first pass it seemed the easiest way to get separation without duplicating code or upsetting CL, given my limited understanding of how CL manages and scopes context/session. A fuller API, or the ability to subclass a CL class rather than rely on wrapper hooks, might be more convenient, as you mention.)
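One possible variation, sketched under assumptions: give the shared core an explicit per-instance configuration object, so the two entry modules differ only in the settings they pass in. The `Profile` fields, the echo reply and all names below are hypothetical, and the chainlit calls are left out so the sketch runs on its own.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical per-instance settings: what differs between the two CL apps."""
    name: str
    offer_followups: bool  # e.g. the internal tool offers "followup" actions, the Copilot does not


class Core:
    """Behaviour shared by both chainlit entry modules.

    One Core object serves every chat session in its process, so it holds only
    immutable settings; per-user state belongs in cl.user_session instead.
    """

    def __init__(self, profile: Profile) -> None:
        self.profile = profile

    async def reply(self, text: str) -> str:
        # Placeholder for the real, shared LLM call.
        return f"[{self.profile.name}] {text}"

    def action_names(self) -> list[str]:
        # Only instances configured for it attach follow-up actions to answers.
        return ["followup"] if self.profile.offer_followups else []


internal = Core(Profile(name="internal", offer_followups=True))
copilot = Core(Profile(name="copilot", offer_followups=False))
print(asyncio.run(copilot.reply("hello")))  # [copilot] hello
```

Each entry module would then create its own `core = Core(Profile(...))` and keep its hooks as one-line delegations, e.g. `await cl.Message(content=await core.reply(message.content)).send()` inside `@cl.on_message`. Keeping `Core` free of per-user state matters because a module-level object is shared by every user connected to that instance.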

The CL instances are running on an infrastructure that goes: Load Balancer -> Firewall -> Servers -> nginx -> Flask/uwsgi -> CL.

The Flask parent app is secured by Auth0.

chadhat commented 1 month ago

@stephenrs how did you integrate chainlit into a Flask app? I currently have a Flask app with authentication implemented, but I am not sure how to seamlessly redirect to chainlit and pass along some user information after successful authentication.

stephenrs commented 1 month ago

@chadhat After trying and failing to get a more elegant solution to work, I ended up displaying chainlit in an iframe inside a Flask/jinja template. It's hacky, but it works.

The Flask route is secured and looks something like this:

from flask import make_response, render_template

# Secured by the parent app's Auth0 login (route decorator omitted); get_access_token() is the app's own helper.
def chainlit_route():
    token = get_access_token()   # returns a token that contains the attributes chainlit expects for creating a User object
    resp = make_response(render_template('chainlit-template.html'))
    resp.set_cookie('chainlit_access_token', token, max_age=60*15, httponly=True)  # a shorter expiration is more secure
    return resp
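`get_access_token()` isn't shown here. Assuming it issues the HS256 JWT that the `header_auth_callback` further down decodes with PyJWT, a minimal standard-library sketch of the signing step could look like this. With PyJWT installed, `jwt.encode(payload, secret_key, algorithm="HS256")` produces the same kind of token in one call; the hand-rolled version only makes the format visible. The function name `make_access_token`, its parameters and the user dict are hypothetical; in the Flask route, `get_access_token()` would call it with the logged-in user's details and the secret shared with the chainlit app.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without '=' padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_access_token(user: dict, secret_key: str, ttl_seconds: int = 15 * 60) -> str:
    """Hypothetical: sign an HS256 JWT carrying the fields the chainlit callback reads."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "identifier": user["identifier"],
        "display_name": user.get("display_name"),
        "metadata": user.get("metadata", {}),
        "iat": now,
        "exp": now + ttl_seconds,  # keep in step with the cookie's max_age
    }
    segments = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret_key.encode("utf-8"), signing_input, hashlib.sha256).digest()
    return ".".join(segments + [_b64url(signature)])


token = make_access_token({"identifier": "alice@example.com", "metadata": {"role": "agent"}},
                          secret_key="change-me")
print(token.count("."))  # 2
```

The 15-minute `exp` mirrors the cookie's `max_age`, so a leaked cookie value stops working after that window; PyJWT's `jwt.decode()` in the callback rejects tokens whose `exp` has passed.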

The "chainlit-template.html" looks something like this:

<!DOCTYPE html>
<html lang="en">
  <head>
  ...
  </head>
  <body>
        <iframe src="/chainlit"></iframe> <!-- "/chainlit" is the ROOT PATH -->
  </body>
</html>

I use the Header Auth approach to get the user information into chainlit (see the chainlit docs), so my chainlit target app implements the required callback that looks something like this:

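from typing import Dict, Optional

import chainlit as cl
import jwt  # PyJWT

@cl.header_auth_callback
def header_auth_callback(headers: Dict) -> Optional[cl.User]:
    # get_cookie() is a helper (not shown) that reads a single cookie from the Cookie header
    token = get_cookie('chainlit_access_token', headers)

    if token:
        try:
            # "secret_key" must be the same secret key used to create the access token in "get_access_token()" above;
            # jwt.decode() also rejects the token once its "exp" claim (if present) has passed
            user = jwt.decode(token, secret_key, algorithms=['HS256'])
            return cl.User(identifier=user['identifier'], display_name=user.get('display_name'), metadata=user['metadata'])
        except (jwt.InvalidTokenError, KeyError):
            # invalid, expired or incomplete token: fall through and deny access
            pass

    return None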
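The callback relies on a `get_cookie()` helper that isn't shown. A minimal standard-library sketch, assuming `headers` behaves like a mapping that contains the raw `Cookie` request header, could parse it with `http.cookies.SimpleCookie`. This helper is hypothetical, not part of chainlit's API; it checks both spellings of the header name because Starlette's `Headers` is case-insensitive while a plain dict may use either.

```python
from http.cookies import CookieError, SimpleCookie
from typing import Mapping, Optional


def get_cookie(name: str, headers: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: return cookie `name` from the Cookie header, or None."""
    raw = headers.get("cookie") or headers.get("Cookie")
    if not raw:
        return None
    jar = SimpleCookie()
    try:
        jar.load(raw)
    except CookieError:
        # Malformed header: treat it as "no token" so the callback denies access.
        return None
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


print(get_cookie("chainlit_access_token",
                 {"cookie": "theme=dark; chainlit_access_token=aaa.bbb.ccc"}))  # aaa.bbb.ccc
```

Returning `None` on any parsing problem keeps the callback fail-closed: no readable token means no `cl.User`.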

I hope this helps, but let me know if something is missing.

dokterbob commented 3 weeks ago

Just found this; it seems to be a well-supported OAuth library:

https://frankie567.github.io/httpx-oauth/fastapi/

dokterbob commented 3 weeks ago

Quoting @stephenrs:

  "I must be having a crisis of imagination, because I'm still having trouble understanding why auth should be part of CL at all. So can you help me by making a case for why it should be? I'm guessing I might be missing something.

  For example, who are the main target users/use cases? What kinds of things are they building and how are they using CL? Why do they want/need auth-in-a-box, particularly when spinning up a secured Flask/FastAPI/Django/etc app is so relatively easy, well-supported, and flexible?

  Why shouldn't CL at the lower levels be thought of as a microservice behind an app with more 'meat' and security awareness?

  As you've pointed out, taking on the burden of security is not easy, so make a good case!"

One of the things I personally love (I am not in a position to answer your questions authoritatively; these are just my own ideas) is that chainlit allows building custom LLM agents with a very low barrier to entry: within 15 minutes, people with virtually no coding experience can launch a custom LLM frontend.

Where I think we want to be is offering a seamless path from that initial point to user-facing LLMs with agents and the whole shebang in production. My personal belief is that we're not there just yet.

I do think auth hooks should be part of chainlit (e.g. we want access to user data, and to be able to use auth tokens to access third-party APIs). I like that I can deploy chainlit, with authentication, in a container within hours.

But I agree that having the actual auth implementation be part of this repo at all is a very, very bad idea. I'm still spec'ing this out, and I've often found myself held back trying to balance outside contributions, internal roadmaps (factoring out non-core functionality into a community-maintained and community-led repo) and a somewhat worrying level of technical debt (though we're catching up, IMHO).

My rough ideas at this point are:

Sorry it took me a while to get back to this. As it stands, I am only available 2 days a week to support Chainlit's development.

I'm looking forward to your feedback and suggestions.

@stephenrs Sorry if I got a bit stingy at times; please be aware that we're doing the absolute best we can with very limited capacity. We barely have time to review all the PRs we're getting while balancing competing needs.