I came across this comment that highlights LiteLLM as an excellent way to call any LLM through the OpenAI format. If you agree, I am happy to submit a PR for it. Here is an example of how it looks on this feature branch:
from litellm import completion

def query_litellm(prompt, context=''):
    # Ask the model to answer the (context-prefixed) prompt
    messages = [{"content": context + prompt, "role": "user"}]
    response = completion(model="gpt-3.5-turbo", messages=messages)
    answer = response['choices'][0]['message']['content'].strip()

    # Ask the model for a plausible follow-up question to the answer
    followup_prompt = "What is a likely follow-up question or request? Return just the text of the question or request."
    followup_messages = [{"content": answer, "role": "assistant"}, {"content": followup_prompt, "role": "user"}]
    followup_response = completion(model="gpt-3.5-turbo", messages=followup_messages)
    followup = followup_response['choices'][0]['message']['content'].strip()

    return answer, followup
In my tests, it works pretty well and allows for faster training data generation by querying cloud-hosted LLMs like GPT-3.5-Turbo, Claude 2, etc. It can be an especially attractive option on MacBooks where local models run too slowly.
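To illustrate the provider-switching point, here is a minimal sketch (not from the branch) of how the same helper could take the model as a parameter; the model strings and the API-key environment variables follow LiteLLM's conventions, and the function name and default are just assumptions for illustration:

from litellm import completion

# Hypothetical wrapper: pass any LiteLLM-supported model string,
# e.g. "gpt-3.5-turbo" (needs OPENAI_API_KEY) or "claude-2" (needs ANTHROPIC_API_KEY).
def query_model(prompt, model="gpt-3.5-turbo", context=''):
    messages = [{"content": context + prompt, "role": "user"}]
    response = completion(model=model, messages=messages)
    return response['choices'][0]['message']['content'].strip()

# Same call shape works across cloud providers:
# answer = query_model("Summarize the dataset format.", model="claude-2")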