c0sogi / llama-api

An OpenAI-like LLaMA inference API
MIT License

Proxy to openAI #9

Open kreolsky opened 11 months ago

kreolsky commented 11 months ago

Hi! I have a strange suggestion :) Add a proxy object that will forward requests to OpenAI when openai_replacement_models specifies openai_proxy (or something like it).

For example: openai_replacement_models = {"gpt-3.5-turbo": "my_ggml", "gpt-4": "openai_proxy", "llama": "another_ggml"}. If the user calls gpt-3.5-turbo, the API server uses my_ggml; if the user calls gpt-4, the request is sent to OpenAI. This would make it easy to use a local LLaMA and OpenAI at the same time.
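
A rough sketch of the routing logic being proposed (purely illustrative; OPENAI_PROXY and resolve_backend are hypothetical names, not part of llama-api):

# proxy_routing_sketch.py -- hypothetical dispatch for the proposed feature
OPENAI_PROXY = "openai_proxy"  # sentinel value in the replacement map

openai_replacement_models = {
    "gpt-3.5-turbo": "my_ggml",   # served by a local GGML model
    "gpt-4": OPENAI_PROXY,        # forwarded to api.openai.com
    "llama": "another_ggml",      # another local model
}

def resolve_backend(requested_model: str) -> str:
    """Return a local model name, or the sentinel telling the server to proxy to OpenAI."""
    return openai_replacement_models.get(requested_model, requested_model)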

PS: Thanks so much for the LangChain example!

c0sogi commented 11 months ago
# my_model_def.py
from llama_api.schemas.models import ExllamaModel, LlamaCppModel, ReverseProxyModel

# Forward requests for this model to the OpenAI API instead of a local backend.
gpt35 = ReverseProxyModel(model_path="https://api.openai.com")
# Map the model name clients send ("gpt-3.5-turbo") to the proxy definition above.
openai_replacement_models = {"gpt-3.5-turbo": "gpt35"}
# test.py
import requests

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello there!"}],
    "max_tokens": 30,
    "top_p": 0.9,
    "temperature": 0.9,
    "stop": ["\n"],
}
headers = {
    # Your OpenAI API key, used for the proxied gpt-3.5-turbo request.
    "Authorization": "Bearer YOUR_OPENAI_KEY"
}
response = requests.post(url, json=payload, headers=headers)
print(response.json())

# output:
# {'id': 'chatcmpl-7wVt6byZm6S8ybQcijoUGNN1Jt2cC', 'object': 'chat.completion', 'created': 1694179956, 'model': 'gpt-3.5-turbo-0613', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 10, 'completion_tokens': 9, 'total_tokens': 19}}

I think this will work.
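
For reference, the same request should also go through the openai Python package if you point its base URL at the local server (a sketch assuming the pre-1.0 openai client; the key is handled the same way as in the requests example above):

# openai_client_test.py -- illustrative only
import openai

openai.api_base = "http://localhost:8000/v1"  # llama-api server instead of api.openai.com
openai.api_key = "YOUR_OPENAI_KEY"            # forwarded to OpenAI for proxied models

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello there!"}],
    max_tokens=30,
)
print(response["choices"][0]["message"]["content"])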

delta-whiplash commented 8 months ago

@c0sogi If I want to proxy to a self-hosted Whisper API (fast-whisper or OpenAI Whisper), how can I do that?
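
One possible direction, extrapolating from the ReverseProxyModel example above (untested; it would only work if the reverse proxy forwards arbitrary paths such as /v1/audio/transcriptions to the configured base URL):

# my_model_def.py -- speculative sketch, not verified against llama-api
from llama_api.schemas.models import ReverseProxyModel

# Point the proxy at a self-hosted Whisper-compatible server instead of api.openai.com.
whisper_proxy = ReverseProxyModel(model_path="http://localhost:9000")

# Map the model name clients request to the proxy definition above.
openai_replacement_models = {"whisper-1": "whisper_proxy"}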