Open kreolsky opened 11 months ago
```python
# my_model_def.py
from llama_api.schemas.models import ExllamaModel, LlamaCppModel, ReverseProxyModel

gpt35 = ReverseProxyModel(model_path="https://api.openai.com")
openai_replacement_models = {"gpt-3.5-turbo": "gpt35"}
```
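For anyone reading along: the replacement table appears to map the OpenAI model name to the *name* of the variable holding the model definition. A minimal sketch of that convention (the `ReverseProxyModel` stub and the `resolve_by_name` helper using `globals()` are my own illustration, not llama_api's actual lookup code):

```python
# Hypothetical sketch: the dict value "gpt35" is a *string* naming the model
# variable defined in the same module. ReverseProxyModel is a stand-in stub,
# and resolve_by_name()/globals() are assumptions about how the lookup works.

class ReverseProxyModel:
    def __init__(self, model_path: str):
        self.model_path = model_path

gpt35 = ReverseProxyModel(model_path="https://api.openai.com")
openai_replacement_models = {"gpt-3.5-turbo": "gpt35"}

def resolve_by_name(requested: str) -> ReverseProxyModel:
    # Map the OpenAI model name to the variable name, then fetch the object.
    return globals()[openai_replacement_models[requested]]

print(resolve_by_name("gpt-3.5-turbo").model_path)  # https://api.openai.com
```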
```python
# test.py
import requests

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello there!"}],
    "max_tokens": 30,
    "top_p": 0.9,
    "temperature": 0.9,
    "stop": ["\n"],
}
headers = {"Authorization": "Bearer YOUR_OPENAI_KEY"}

response = requests.post(url, json=payload, headers=headers)
print(response.json())

# output:
# {'id': 'chatcmpl-7wVt6byZm6S8ybQcijoUGNN1Jt2cC', 'object': 'chat.completion', 'created': 1694179956, 'model': 'gpt-3.5-turbo-0613', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 10, 'completion_tokens': 9, 'total_tokens': 19}}
```
I think this will work.
@c0sogi If I want to proxy to a self-hosted Whisper (faster-whisper or OpenAI Whisper) API, how can I do it?
Hi! I have a strange suggestion :) Add a proxy object that will send requests to OpenAI when `openai_replacement_models` maps a model to `openai_proxy` (or something like it).
For example: `openai_replacement_models = {"gpt-3.5-turbo": "my_ggml", "gpt-4": "openai_proxy", "llama": "another_ggml"}`. If the user calls gpt-3.5-turbo, the API server will use my_ggml; if the user calls gpt-4, it will forward the request to OpenAI. This would make it easy to use both local llama models and OpenAI at the same time.
PS: Thanks so much for the example with LangChain!
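The routing idea above could be sketched roughly like this (the `OPENAI_PROXY` sentinel and the `resolve_model` helper are illustrative assumptions, not an existing llama_api API):

```python
# Rough sketch of the proposed routing: requests for models mapped to the
# "openai_proxy" sentinel are forwarded to OpenAI; everything else is served
# by the named local model. All names here are assumptions for illustration.

OPENAI_PROXY = "openai_proxy"

openai_replacement_models = {
    "gpt-3.5-turbo": "my_ggml",
    "gpt-4": OPENAI_PROXY,
    "llama": "another_ggml",
}

def resolve_model(requested: str) -> tuple[str, str]:
    """Return ("proxy", upstream_url) or ("local", model_name)."""
    target = openai_replacement_models.get(requested, requested)
    if target == OPENAI_PROXY:
        return ("proxy", "https://api.openai.com")
    return ("local", target)

print(resolve_model("gpt-4"))          # forwarded to OpenAI
print(resolve_model("gpt-3.5-turbo"))  # served locally by my_ggml
```

A server could call `resolve_model()` once per request and either proxy the request body upstream unchanged or dispatch to the local model, which keeps local and OpenAI models usable side by side.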