adamkarvonen / chess_gpt_eval

A repo to evaluate various LLMs' chess-playing abilities.

LiteLLM instead of manual abstraction. #1

Open jordantgh opened 1 year ago

jordantgh commented 1 year ago

I'd be willing to work on this but would like your opinion before submitting any PR.

From a quick scan of your `gpt_query.py`, it looks like you're doing more work than you need to in order to support different APIs. LiteLLM provides a nice abstraction layer (I guess there may be others out there). You would just write `response = completion(model="gpt-4", messages=message_input)` for OpenAI or `response = completion(model="openrouter/openai/gpt-4", messages=message_input)` for OpenRouter, etc. (Azure, Anthropic, Cohere, and HF models are also supported), and set the appropriate environment variables for the chosen model. We could provide a `.env_example` with all the API keys that the user can copy.
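To make the idea concrete, here's a minimal sketch of what the unified call could look like. The helper names (`build_messages`, `query_model`) are illustrative, not from the repo, and the exact response shape is the standard OpenAI-style one LiteLLM returns:

```python
def build_messages(prompt: str) -> list[dict]:
    # OpenAI-style message list; the same shape works across LiteLLM backends.
    return [{"role": "user", "content": prompt}]

def query_model(model: str, messages: list[dict]) -> str:
    # litellm.completion() is a uniform entry point: "gpt-4" hits OpenAI,
    # "openrouter/openai/gpt-4" hits OpenRouter, etc. API keys are read from
    # environment variables (e.g. OPENAI_API_KEY), so a .env_example listing
    # them is all the user needs to copy.
    from litellm import completion  # deferred import; assumes `pip install litellm`
    response = completion(model=model, messages=messages)
    return response.choices[0].message.content
```

The model string alone selects the provider, so switching backends becomes a config change rather than a code change.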

jordantgh commented 1 year ago

Also part of the reason I was thinking about this is it would possibly make it easier to implement #2.

adamkarvonen commented 1 year ago

LiteLLM looks great. The manual abstraction is definitely annoying, and that looks like a better solution. I also tested some Llama base models on Replicate and am going to push some of the results later today, and it looks like LiteLLM supports Replicate as well.

jordantgh commented 1 year ago

Nice, I'll probably be able to look at submitting a PR in the coming weekend.

ishaan-jaff commented 1 year ago

Hi, I'm the maintainer of LiteLLM - happy to make the PR too

jordantgh commented 1 year ago

> Hi, I'm the maintainer of LiteLLM - happy to make the PR too

It certainly won't bother me haha, and I'd guess it'll take you about 5% of the time it would take me 😁

ishaan-jaff commented 1 year ago

On it. Would you be open to hopping on a quick call (10 mins)?

I'd love to understand how LiteLLM can be better for you + chess_gpt_eval

sharing my calendly here for your convenience: https://calendly.com/ishaan-berri/30min?month=2023-09

jordantgh commented 1 year ago

> On it. Would you be open to hopping on a quick call (10 mins)?
>
> I'd love to understand how LiteLLM can be better for you + chess_gpt_eval
>
> sharing my calendly here for your convenience: https://calendly.com/ishaan-berri/30min?month=2023-09

Not the maintainer of this repo btw, although we did happen to chat on a repo of mine 😁 In my case I don't have much to say beyond what I wrote already.

jordantgh commented 1 year ago

@adamkarvonen I haven't had time this weekend to do anything on this. I took a look at the code and it is hard for me to untangle. If you want to discuss it we can; otherwise it'll take longer.

adamkarvonen commented 1 year ago

Sure. To replace my manual abstraction with LiteLLM, you should be able to replace these lines: https://github.com/adamkarvonen/chess_gpt_eval/blob/local_llama/gpt_query.py#L68-L79

with a call to LiteLLM. Looking above, it seems that could be something like `response = completion(model="gpt-4", messages=message_input)`.

The messages list is currently set as a list of OpenAI-style input dicts, where the role can be 'system', 'user', or 'assistant'. Because there isn't a back-and-forth with the model, this is somewhat unnecessary; I was just reusing some code from an earlier project.

Base/completion models don't need a system message, but chat models like Llama2-chat or GPT-4 need one to get the proper output format: https://github.com/adamkarvonen/chess_gpt_eval/blob/local_llama/gpt_query.py#L23
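The two prompt shapes described above could be sketched like this (the system prompt text here is a hypothetical placeholder, not the repo's actual one):

```python
SYSTEM_PROMPT = "You are a chess engine; respond only with your next move."  # placeholder

def build_chat_messages(pgn: str) -> list[dict]:
    # Chat models (GPT-4, Llama2-chat) need a system message to keep the
    # output format constrained to a bare move.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": pgn},
    ]

def build_base_prompt(pgn: str) -> str:
    # Base/completion models just continue the raw PGN text; no system
    # message is needed.
    return pgn
```

Either output can then be handed to the provider-appropriate LiteLLM call.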

Does that help? I can take a look at this later as well; currently I'm trying to add support for the nanoGPT repository. I'll document my changes and clean up the code once Llama and nanoGPT support is added and functioning.