
[LLM] Support LLM routing through notdiamond #4184

Open xingyaoww opened 2 days ago

xingyaoww commented 2 days ago

What problem or use case are you trying to solve?

Not Diamond intelligently identifies which LLM is best-suited to respond to any given query. We want to implement a mechanism in OpenHands to support this type of "LLM" selector.

Describe the UX of the solution you'd like

Ideally, the user should define an "LLMRouter" as a special type of LLM with some special configs (e.g., multiple keys for different providers). The user can then just put in their keys, select that router, and OpenHands will automatically use it going forward.
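
As a sketch of what that config could look like (all names here - LLMRouterConfig, provider_keys, candidate_models - are hypothetical, loosely following the dataclass pattern of the existing configs under openhands/core/config):

from dataclasses import dataclass, field

@dataclass
class LLMRouterConfig:
    """Hypothetical config for a routing 'meta-LLM'; all field names are illustrative."""

    router_backend: str = 'notdiamond'  # which routing service to use
    notdiamond_api_key: str | None = None  # key for the router itself
    # One API key per candidate provider, e.g. {'openai': 'sk-...', 'anthropic': '...'}
    provider_keys: dict[str, str] = field(default_factory=dict)
    # Candidate models the router may choose between
    candidate_models: list[str] = field(default_factory=list)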

Do you have thoughts on the technical implementation?

Modify https://github.com/All-Hands-AI/OpenHands/blob/main/openhands/llm/llm.py, as well as config related files under https://github.com/All-Hands-AI/OpenHands/tree/main/openhands/core/config.

You should probably use model_select (from the notdiamond API) rather than create, to stay compatible with existing LiteLLM calls.
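
A minimal sketch of that flow: take the model recommended by model_select (documented below) and hand it to a plain litellm.completion call. The provider.provider attribute and the provider/model string mapping onto LiteLLM's model-name format are assumptions to verify:

import litellm
from notdiamond import NotDiamond

client = NotDiamond()  # reads NOTDIAMOND_API_KEY from the environment

messages = [{"role": "user", "content": "Concisely explain merge sort."}]

# Ask NotDiamond which model to use, without letting it make the LLM call itself.
session_id, provider = client.chat.completions.model_select(
    messages=messages,
    model=['openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620'],
)

# Hand the recommendation to the existing LiteLLM code path.
response = litellm.completion(
    model=f"{provider.provider}/{provider.model}",
    messages=messages,
)
print(response.choices[0].message.content)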

Describe alternatives you've considered

Additional context

Here's the documentation from NotDiamond

# -*- coding: utf-8 -*-
"""Getting started with Not Diamond

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/1Ao-YhYF_S6QP5UGp_kYhgKps_Sw3a2RO

# **Setting up**
"""

!pip install -q notdiamond[create] --upgrade

import os

os.environ["NOTDIAMOND_API_KEY"] = 'YOUR_NOTDIAMOND_API_KEY'
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'
os.environ["ANTHROPIC_API_KEY"] = 'YOUR_ANTHROPIC_API_KEY'
os.environ["PPLX_API_KEY"] = 'YOUR_PERPLEXITY_API_KEY'

"""# **Automatic routing and model calling**

We'll start by defining the routing `NotDiamond` client, which will function like a 'meta-LLM' that ensembles together the best of multiple models.

We then add specific LLM targets for routing. Not Diamond works with any LLM in [this list](https://notdiamond.readme.io/docs/llm-models).
"""

from notdiamond import NotDiamond

client = NotDiamond()

llm_providers = [
    'openai/gpt-4o',
    'anthropic/claude-3-5-sonnet-20240620',
    'openai/gpt-4o-mini',
    'perplexity/llama-3.1-sonar-large-128k-online'
]

"""Next, let's call the client by passing in an array of messages and our target models:"""

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}  # Adjust as desired
    ],
    model=llm_providers
)

print("LLM called: \n", provider.model)
print("\nLLM output: \n", result.content)

"""If we pass in a different question, `NotDiamond` will recommend a different model:"""

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather like in SF today?"}
    ],
    model=llm_providers
)

print("LLM called: \n", provider.model)
print("\nLLM output: \n", result.content)

"""# **Defining tradeoffs**

We can also define tradeoffs for cost or latency to adjust routing to our preferences:
"""

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}
    ],
    model=llm_providers,
    tradeoff="cost" # Consider cheaper models when quality loss is negligible. Alternatively, you can use "latency".
)

print("LLM called: \n", provider.model)
print("\nLLM output: \n", result.content)

"""# **Routing recommendations with `model_select`**

Finally, we can also use `model_select` to return a recommended LLM for your prompt. You can then invoke that LLM using your own custom logic.
"""

session_id, provider = client.chat.completions.model_select(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}
    ],
    model=llm_providers
)

## Add more application logic here. e.g.:

print(f"LLM called: \n{provider.model}\n", )

match provider.model:
    case "claude-3-5-sonnet-20240620":
        print("Running custom_sonnet_3-5_invoke()...")
        # custom_sonnet_35_invoke()
    case 'gpt-4o':
        print("Running custom_gpt_4o_invoke()...")
        # custom_gpt_4o_invoke()
    case 'llama-3.1-sonar-large-128k-online':
        print("Running custom_pplx_llama3_invoke()...")
        # custom_pplx_llama3_invoke()
    case 'gpt-4o-mini':
        print("Running custom_gpt_4o_mini_invoke()...")
        # custom_gpt_4o_mini_invoke()

github-actions[bot] commented 2 days ago

OpenHands started fixing the issue! You can monitor the progress here.

tobitege commented 2 days ago

@xingyaoww - also see #4109, where litellm's Router is being incorporated, along with a config structure that could maybe be used here.
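
For reference, litellm's Router is configured with a list of deployments along these lines (a minimal sketch of the documented Router API; keys and model names are placeholders):

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",  # alias used at call time
            "litellm_params": {"model": "openai/gpt-4o", "api_key": "sk-..."},
        },
        {
            "model_name": "claude-3-5-sonnet",
            "litellm_params": {
                "model": "anthropic/claude-3-5-sonnet-20240620",
                "api_key": "sk-ant-...",
            },
        },
    ],
)

response = router.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Concisely explain merge sort."}],
)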

github-actions[bot] commented 2 days ago

OpenHands started fixing the issue! You can monitor the progress here.

github-actions[bot] commented 2 days ago

An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named 'openhands-fix-issue-4184' has been created with the attempted changes. You can view the branch here. Manual intervention may be required.

neubig commented 2 days ago

Quick point of discussion: do we want to implement this within OpenHands? Or should we host a server with the router, like we host our proxy server for All Hands AI?

Personally I think the latter might be better. Doing this on the client side means that users have to acquire several different API keys and somehow configure them. This seems like a pain UI-wise, especially given that currently our configuration behavior is hard to understand: https://github.com/All-Hands-AI/OpenHands/issues/3220

xingyaoww commented 2 days ago

Good point - but another thing is that it might be tricky to calculate costs for the router (especially with all the prompt caching and stuff) :(.

Another potential idea is to do this with LiteLLM router 🤔 https://docs.litellm.ai/docs/routing#advanced---routing-strategies-%EF%B8%8F

neubig commented 2 days ago

Yeah, maybe NotDiamond could be implemented as a custom routing strategy within the LiteLLM proxy?
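
Roughly, that could look like the sketch below: a custom strategy that calls NotDiamond's model_select to pick among the Router's deployments. This assumes litellm's custom-routing hook (CustomRoutingStrategyBase / set_custom_routing_strategy from its routing docs); the exact names and signatures should be double-checked:

from litellm.router import CustomRoutingStrategyBase
from notdiamond import NotDiamond


class NotDiamondRoutingStrategy(CustomRoutingStrategyBase):
    """Sketch: choose a litellm deployment via NotDiamond's model_select."""

    def __init__(self, router, llm_providers):
        self.router = router
        self.llm_providers = llm_providers  # e.g. ['openai/gpt-4o', ...]
        self.nd_client = NotDiamond()

    async def async_get_available_deployment(
        self, model, messages=None, input=None,
        specific_deployment=None, request_kwargs=None,
    ):
        # Ask NotDiamond which candidate best fits this prompt...
        _, provider = self.nd_client.chat.completions.model_select(
            messages=messages, model=self.llm_providers,
        )
        # ...then return the matching deployment from the router's model list.
        for deployment in self.router.model_list:
            if provider.model in deployment["litellm_params"]["model"]:
                return deployment
        return self.router.model_list[0]  # fallback: first deployment

# router.set_custom_routing_strategy(NotDiamondRoutingStrategy(router, llm_providers))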

xingyaoww commented 2 days ago

Yeah, that seems like a better approach (if we can get the cost propagation to work correctly). I'll close this for now then.

acompa commented 1 day ago

Hi @xingyaoww @neubig, just caught this issue.

While our LLMConfigs accept prices, they only help tune cost tradeoffs. You won't have to provide that parameter for public models - we track prices for every model we support.
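
For illustration, a custom-price entry looks roughly like this (a sketch; the import path and the is_custom / input_price / output_price parameter names here are from memory and worth double-checking against our docs):

from notdiamond import LLMConfig

# Prices (USD per million tokens) are only needed for private/custom models;
# public models are priced automatically by Not Diamond.
custom_model = LLMConfig(
    provider="openai",
    model="my-finetuned-gpt-4o",  # hypothetical private model
    is_custom=True,
    input_price=2.50,
    output_price=10.00,
)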

Beyond this, we're also happy to help you set up a routing integration with Not Diamond's API. Just let me know if that interests you.

As for LiteLLM, we've actually been discussing an integration with them since July! While waiting on their feedback, we've also implemented a simple integration in our Python client which might help you.

neubig commented 1 day ago

Thanks @acompa , I do think we'd be interested in at least running an evaluation where we use NotDiamond as a backend and see if the results are better/cheaper than what we get now. If your API offers OpenAI compatible endpoints it should be pretty easy (we haven't looked super-carefully yet).

acompa commented 1 day ago

> Thanks @acompa , I do think we'd be interested in at least running an evaluation where we use NotDiamond as a backend and see if the results are better/cheaper than what we get now. If your API offers OpenAI compatible endpoints it should be pretty easy (we haven't looked super-carefully yet).

We do accept OpenAI-style requests with messages at our model_select endpoint. We're not a proxy, though, so at the moment we only support create via a Langchain integration.

neubig commented 1 day ago

Cool, thanks! I'll re-open this as I think that whatever way we implement it'd be interesting to see if model routing helps.

acompa commented 19 hours ago

Excellent. As you begin your evaluation, note that we offer two approaches to AI model routing:

First, our out-of-the-box router has been trained on generalist, cross-domain data (including coding and non-coding tasks) to provide a strong "multi-model", multidisciplinary experience.

Second, since OpenHands focuses on development applications, you might benefit from specialized routing trained on the distribution of your proprietary data. We also offer custom routing to serve these domain-targeted use cases as a higher-performance option beyond out-of-the-box routing.

We're happy to answer questions or support you in whichever of these approaches you evaluate.

tobitege commented 1 hour ago

@neubig we could also look into the https://github.com/Not-Diamond/RoRF/ repo (pair-wise routing) to start with?