I came across this comment that highlights LiteLLM as an excellent way to call any LLM through the OpenAI format. If you agree, I am happy to submit a PR for it. Here is an example of how it looks on this feature branch:
from litellm import completion

def query_litellm(prompt, context=''):
    # Ask the model to answer the (context-prefixed) prompt
    messages = [{"content": context + prompt, "role": "user"}]
    response = completion(model="gpt-3.5-turbo", messages=messages)
    answer = response['choices'][0]['message']['content'].strip()

    # Ask the model for a plausible follow-up question to the answer
    followup_prompt = "What is a likely follow-up question or request? Return just the text of the question or request."
    followup_messages = [{"content": answer, "role": "assistant"}, {"content": followup_prompt, "role": "user"}]
    followup_response = completion(model="gpt-3.5-turbo", messages=followup_messages)
    followup = followup_response['choices'][0]['message']['content'].strip()

    return answer, followup
In my tests, it works pretty well and allows for faster training data generation by querying cloud-hosted LLMs like GPT-3.5-Turbo, Claude 2, etc. It can be an especially attractive option on MacBooks where local models run too slowly.
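To illustrate the provider-switching point, here is a minimal sketch (not from the branch) of how the same helper could take the model as a parameter; the model strings and the API-key environment variables follow LiteLLM's conventions, and the function name and default are just assumptions for illustration:

from litellm import completion

# Hypothetical wrapper: pass any LiteLLM-supported model string,
# e.g. "gpt-3.5-turbo" (needs OPENAI_API_KEY) or "claude-2" (needs ANTHROPIC_API_KEY).
def query_model(prompt, model="gpt-3.5-turbo", context=''):
    messages = [{"content": context + prompt, "role": "user"}]
    response = completion(model=model, messages=messages)
    return response['choices'][0]['message']['content'].strip()

# Same call shape works across cloud providers:
# answer = query_model("Summarize the dataset format.", model="claude-2")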