OoriData / OgbujiPT

Client-side toolkit for using large language models, including where self-hosted
Apache License 2.0

Encapsulate LLM connections #39

Closed: uogbuji closed this issue 1 year ago

uogbuji commented 1 year ago

Right now, still based on our initial, long-obsolete Langchain orientation, we manage the OpenAI API connection globally. To be fair, the openai library encourages such bad habits as well.

Beyond just having cleaner code, we also need to support multiple LLMs; for example, someone could use different LLMs for different parts of an agent/tool interaction.

Enable all this by encapsulating LLM connections in objects. This will also come in handy as we add support for connections not made via the OpenAI API (e.g. LLMs loaded within a local process).
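
A rough sketch of the shape this could take (class and method names here are illustrative only, not the final OgbujiPT interface):

# Hypothetical sketch of OO-encapsulated LLM connections
class llm_connection:
    'Base class: one instance per LLM endpoint; no global/module-level state'
    def __init__(self, model, **connect_params):
        self.model = model
        self.connect_params = connect_params  # e.g. api_base, api_key, timeouts

    async def complete(self, messages, **kwargs):
        raise NotImplementedError

class openai_api_connection(llm_connection):
    'Wraps one OpenAI-style HTTP endpoint (hosted or self-hosted)'
    async def complete(self, messages, **kwargs):
        # Call the endpoint using self.connect_params rather than globals
        ...

class in_process_connection(llm_connection):
    'Wraps an LLM loaded within the local process'
    async def complete(self, messages, **kwargs):
        # Invoke the in-memory model directly
        ...

# Different parts of an agent/tool interaction can each hold their own connection:
#     summarizer = openai_api_connection('gpt-3.5-turbo', api_key='...')
#     extractor = in_process_connection(my_locally_loaded_model)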

krrishdholakia commented 1 year ago

Hey @uogbuji, I think we might be able to help here: https://github.com/BerriAI/litellm

I'm the maintainer of litellm, a drop-in replacement for the openai-python SDK that handles API calls for anthropic, azure, huggingface, togetherai, replicate, etc.

uogbuji commented 1 year ago

Hi @krrishdholakia thanks for your interest in what we're doing here! I like the bias to simplicity in litellm. I'd have to dig in a lot more, but just at first glance I was struck by this snippet:

response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

I'd expect that to be an async for (or some equivalent construct); otherwise it's not really streaming, I think, or at least not in a way that supports concurrency.
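
To illustrate what I mean by supporting concurrency, here is a minimal, LLM-free asyncio sketch (the names are toy examples, not litellm or OgbujiPT APIs):

import asyncio

async def fake_stream(name, nchunks=3):
    # Stand-in for a streaming LLM response; each chunk arrives after a delay
    for i in range(nchunks):
        await asyncio.sleep(0.1)
        yield f'{name} chunk {i}'

async def consume(name):
    # async for yields to the event loop between chunks,
    # so multiple streams can make progress at the same time
    async for chunk in fake_stream(name):
        print(chunk)

async def main():
    # Two streaming "completions" interleave rather than running back to back
    await asyncio.gather(consume('A'), consume('B'))

asyncio.run(main())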

One of the biggest reasons we're rewrapping so much of this is to get true concurrency and (reasonable) isolation right. I'm definitely open to collaborations, so as I say, I'll try to get a chance to dig more into litellm to make sure it would suit our architectural imperatives. Unfortunately I'm heading into a block of travel. I'll be scarce in the 1st & 3rd weeks of October & scrambling to keep up in between, so if you don't hear back in a little while, that's probably why.

krrishdholakia commented 1 year ago

Hey @uogbuji, we support async streaming as well: https://docs.litellm.ai/docs/completion/stream#async-streaming

from litellm import completion
import asyncio
import time
import traceback

def logger_fn(model_call_object: dict):
    print(f"LOGGER FUNCTION: {model_call_object}")

user_message = "Hello, how are you?"
messages = [{"content": user_message, "role": "user"}]

async def completion_call():
    try:
        response = completion(
            model="gpt-3.5-turbo", messages=messages, stream=True, logger_fn=logger_fn
        )
        print(f"response: {response}")
        complete_response = ""
        start_time = time.time()
        # Consume the stream with async for rather than a plain for loop
        async for chunk in response:
            chunk_time = time.time()
            print(f"time since initial request: {chunk_time - start_time:.5f}")
            print(chunk["choices"][0]["delta"])
            complete_response += chunk["choices"][0]["delta"]["content"]
        if complete_response == "":
            raise Exception("Empty response received")
    except Exception:
        print(f"error occurred: {traceback.format_exc()}")

asyncio.run(completion_call())

Let me know if this solves your problem.

Also open to suggestions on where in the docs you were looking for this.

uogbuji commented 1 year ago

With the latest changes:

from ctransformers import AutoModelForCausalLM
from ogbujipt.llm_wrapper import ctrans_wrapper

MY_MODELS = '/Users/uche/.local/share/models'  # Salt to taste

# Load a GGUF model in-process via ctransformers, offloading layers to the GPU
model = AutoModelForCausalLM.from_pretrained(
        f'{MY_MODELS}/TheBloke_LlongOrca-13B-16K-GGUF',
        model_file='llongorca-13b-16k.Q5_K_M.gguf',
        model_type="llama",
        gpu_layers=50)

# Wrap the loaded model in OgbujiPT's ctrans_wrapper and generate a completion
oapi = ctrans_wrapper(model=model)
print(oapi('The quick brown fox'))

Built ctransformers for my Mac as follows:

CT_METAL=1 pip install "ctransformers>=0.2.24" --no-binary ctransformers

krrishdholakia commented 1 year ago

I'm confused. This looks like you're calling local models. I thought the issue was about OpenAI API calls?

uogbuji commented 1 year ago

Hi @krrishdholakia, I'm sure I could be offering more clarity on all this. I mentioned my upcoming travel. This particular burst of work addresses an issue for a client, and I want to get some previously planned moves in place before I leave on Weds. Some of this work was partly documented in internal repositories you won't have seen.

That said, this ticket is about encapsulating LLM capability in general; OpenAI APIs are but one means of working with LLMs. We've always intended to support a selection of in-memory LLM loaders as well. This commit brings some work from a separate repository into OgbujiPT.

Thanks for answering my question about async support in litellm. I do plan to have a look, but again, as a priority I need to get some pre-discussed bits in place for my colleagues before my trip. I think you asked a question about litellm docs. I went purely by the README. Haven't had a chance to peruse the docs.

I'll definitely add some more context to the corresponding PR, needed for the changelog anyway.