AnswerDotAI / cosette

Claudette's sister, a helper for OpenAI GPT
https://answerdotai.github.io/cosette/
Apache License 2.0

Small tweaks for cosette to work with self hosted llama 3.1 (8b) #8

Open MaximeRivest opened 1 week ago

MaximeRivest commented 1 week ago

Hello,

I figured out that by running the code below at the beginning of a notebook, it lets a person work with cosette (without tools and images) with a vLLM llama backend instead of OpenAI. I am using it with a self-hosted llama server running vLLM. I was wondering if you have any interest in creating a third sister to cosette and claudette? I was thinking of "cria" (a baby llama...). If so, would you want to host it on AnswerDotAI alongside the other two?

import openai
from cosette import *
cli = openai.OpenAI(base_url="http://XX.XXX.XXX.XXX:5000/v1", api_key="EMPTY")
cli = Client(model="meta-llama/Meta-Llama-3.1-8B-Instruct", cli=cli)
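For anyone following along: the patches below just build the standard OpenAI-style chat message list by hand before hitting the vLLM server. Here's a minimal sketch of that shape using only the stdlib; `make_msg` is a simplified stand-in for cosette's `mk_msg`, not the real implementation:

```python
def make_msg(content, role="user"):
    "Simplified stand-in for cosette's mk_msg: wrap content in an OpenAI-style message dict."
    return {"role": role, "content": content}

def build_msgs(msgs, sp=""):
    "Prepend a system prompt (if given) to the dialog, as the patched Client.__call__ does."
    msgs = [make_msg(m) if isinstance(m, str) else m for m in msgs]
    return ([make_msg(sp, "system")] if sp else []) + msgs

payload = build_msgs(["Hello!"], sp="You are a helpful assistant.")
```

This is exactly the `messages` list any OpenAI-compatible endpoint (vLLM included) expects, which is why prepending the system prompt here works where passing it some other way did not.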

from fastcore.utils import patch
from fastcore.meta import delegates
from openai.resources.chat import Completions

@patch
@delegates(Completions.create)
def __call__(self:Client,
             msgs:list, # List of messages in the dialog
             sp:str='', # System prompt
             stream:bool=False, # Stream response?
             **kwargs):
    "Make a call to LLM."
    if stream: kwargs['stream_options'] = {"include_usage": True}
    if sp: msgs = [mk_msg(sp, 'system')] + list(msgs)
    r = self.c.create(
        model=self.model, messages=msgs, stream=stream, **kwargs)
    if not stream: return self._r(r)
    else: return get_stream(map(self._r, r))

@patch
@delegates(Completions.create)
def __call__(self:Chat,
             pr=None,  # Prompt / message
             stream:bool=False, # Stream response?
             **kwargs):
    "Add prompt `pr` to dialog and get a response"
    if isinstance(pr,str): pr = pr.strip()
    if pr: self.h.append(mk_msg(pr))
    res = self.c(self.h, sp=self.sp, stream=stream, **kwargs)
    self.h += mk_toolres(res, ns=self.tools, obj=self)
    self.h[-1] = mk_msg(self.h[-1])
    return res
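The `Chat` patch above keeps the dialog history itself: append the user prompt, call the client, then append the assistant reply. Here's a minimal stdlib sketch of that append-call-append flow with a stub in place of the real server (the class names and stub reply are illustrative, not cosette's API):

```python
class StubClient:
    "Stand-in for the patched Client: returns a canned assistant reply."
    def __call__(self, msgs, sp="", stream=False):
        return {"role": "assistant", "content": f"You sent {len(msgs)} message(s)."}

class MiniChat:
    "Sketch of the patched Chat.__call__ history handling."
    def __init__(self, c, sp=""): self.c, self.sp, self.h = c, sp, []
    def __call__(self, pr=None):
        if isinstance(pr, str): pr = pr.strip()
        if pr: self.h.append({"role": "user", "content": pr})  # add prompt to dialog
        res = self.c(self.h, sp=self.sp)  # system prompt passed separately, never stored in h
        self.h.append(res)                # keep the assistant reply in the history
        return res

chat = MiniChat(StubClient(), sp="Be brief.")
r1 = chat("Hello!")
r2 = chat("And again.")
```

Note the system prompt is re-sent on every call rather than stored in the history, so the growing `h` list only ever contains user and assistant turns.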

Best, Maxime