Do we add a models param to completion()? Or create a new completion_with_models()? Since we already have batch_completion(), maybe we can call it batch_models().
I explored making a batch_models() interface, but that felt quite unintuitive - why not just allow users to pass a list as the model, and litellm takes care of the rest if it's a list:
result = completion(
model=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
messages=[{"role": "user", "content": "Hey, how's it going"}]
)
print(result)
As a v1 let's use this; if users want this under completion() we'll add it:
result = batch_completion_models(
models=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
messages=[{"role": "user", "content": "Hey, how's it going"}]
)
print(result)
yep sounds good
Docs: https://docs.litellm.ai/docs/completion/batching#send-1-completion-call-to-n-models @dhruv-anand-aintech any feedback on this ?
Yeah, this would be a good addition. I'm guessing repeating the model name in the list would send an additional request to it?
yes that's what it does
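For example, a quick sketch of that behavior (model names taken from the thread; treat it as illustrative):

from litellm import batch_completion_models

# Repeating a model name sends one request per occurrence -
# gpt-3.5-turbo gets queried twice here
result = batch_completion_models(
    models=["gpt-3.5-turbo", "gpt-3.5-turbo", "command-nightly"],
    messages=[{"role": "user", "content": "Hey, how's it going"}]
)
print(result)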
Just curious if this feature is available from the proxy? I could imagine that sending a model parameter that is a comma-separated list could be a way to trigger this.
is this something you would use? @msabramo
Happy to add support for it today, if you can give feedback on it
Isn't this feature already available? I'm seeing something like this already in the docs: https://docs.litellm.ai/docs/completion/batching#send-1-completion-call-to-many-models-return-all-responses
@msabramo @krrishdholakia IMHO it would be useful. I'd have use cases for it, e.g., benchmarking, selecting most desirable response when working with an array of custom llms, etc.
@taralika I think this feature is available in the LiteLLM library but not in the proxy. The disadvantage of the library is if folks use that, they have to have all the API keys.
@msabramo @taralika @l-n-open-source if we added support on proxy today, could y'all give feedback on it?
yeah we can do a quick sanity check and give feedback
Great - will work on having this out today @taralika
re: feedback - Can we set up a Discord support channel for this?
Just join + wave on #intros, and I'll set up the channel - https://discord.com/invite/wuPM9dRgDw
PR for this is here: https://github.com/BerriAI/litellm/pull/3585 @msabramo @l-n-open-source, would love to get your feedback on this
Doc on this is here: https://docs.litellm.ai/docs/proxy/user_keys#beta-batch-completions---pass-model-as-list You can use it with the OpenAI Python SDK too
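For reference, a minimal sketch of hitting the proxy with the OpenAI Python SDK - the base_url and API key are placeholders, and passing a list for model is an assumption based on the linked doc (the SDK's type hints expect a string, so this relies on the proxy accepting a list at runtime):

import openai

# Placeholder proxy address and key - adjust for your deployment
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# Assumption per the linked doc: the proxy accepts a list of model names
# and returns one completion per model
response = client.chat.completions.create(
    model=["gpt-3.5-turbo", "gpt-4"],
    messages=[{"role": "user", "content": "Hey, how's it going"}],
)
print(response)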
I just tried this and it's pretty cool! I was able to get multiple responses using curl and using the OpenAI Python SDK. What didn't work was LangChain. This seems to be because LangChain is expecting the response to be a JSON object and not an array.
For example:
$ cat test_multiple_models_langchain.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
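# Comma-separated model string triggers the multi-model behavior on the
# LiteLLM proxy (assumes the client is pointed at the proxy, e.g. via
# OPENAI_API_BASE)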
llm = ChatOpenAI(model="gpt-3.5-turbo,gpt-4")
prompt = PromptTemplate.from_template("Write a poem about {thing}")
output_parser = StrOutputParser()
chain = prompt | llm | output_parser
response = chain.invoke({"thing": "LiteLLM"})
print(response)
abramowi at marcs-mbp-3 in ~/Code/OpenSource/litellm (main●)
$ poetry run python test_multiple_models_langchain.py
Traceback (most recent call last):
File "/Users/abramowi/Code/OpenSource/litellm/test_multiple_models_langchain.py", line 10, in <module>
response = chain.invoke({"thing": "LiteLLM"})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
input = step.invoke(
^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 158, in invoke
self.generate_prompt(
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 560, in generate_prompt
return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 421, in generate
raise e
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 411, in generate
self._generate_with_cache(
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 632, in _generate_with_cache
result = self._generate(
^^^^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 523, in _generate
return self._create_chat_result(response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 541, in _create_chat_result
response = response.model_dump()
^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'model_dump'
I wonder if this could be avoided by putting the multiple responses under choices in the response, as this is already expected to be a list (usually a list of 1 item, but it can be more when an n param is set to > 1). Perhaps it should require n to be set > 1 also, so that we're not violating the implicit assumption that len(choices) = n?
@msabramo how do you use this on LangChain? Can you share the e2e use-case here? It will help for repro.
If we use n, how would you want to know which response maps to a specific model @msabramo?
This is the response I see from OpenAI when using n. Currently I don't see the model in choices:
{
"id": "chatcmpl-9OsVmdUmkkUTWORFqdLE51pQyxpC6",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "It looks like you might be asking about \"The Elder Scrolls IV: Oblivion,\" often abbreviated as TES4. This game, developed by Bethesda Game Studios and published by Bethesda Softworks, was released in 2006 and is the fourth installment in The Elder Scrolls action role-playing video game series.\n\nHere are some key points about \"The Elder Scrolls IV: Oblivion\":\n\n1. **Setting**: The game is set in the fictional province of Cyrodiil, the heartland of the continent Tamriel in the Elder Scrolls universe.\n\n2. **Plot**: The central plot revolves around saving the world from an impending invasion by Oblivion (the game’s version of hell), which is being led by the Daedric Prince Mehrunes Dagon. The player must find the lost heir to the Septim throne and help him reclaim his place to close the gates of Oblivion.\n\n3. **Gameplay**: Oblivion combines open-world exploration with a main story quest, side quests, and various other activities. Players can develop their characters through skills, attributes, spells, and equipment, making the game highly customizable.\n\n4. **Graphics and Technical Innovations**: When released, the game was noted for its cutting-edge graphics and expansive, interactive world. It introduced a radiant AI system which gave NPCs more realistic behaviors and daily routines.\n\n5. **Mods and Community Support**: Oblivion has a robust modding community that has created a plethora of mods to enhance graphics, include new quests, add new mechanics, and more.\n\n6. **Legacy**: \"Oblivion\" is often credited with bringing the Elder Scrolls series to a wider audience, setting the stage for the massive success of its sequel, \"The Elder Scrolls V: Skyrim.\"\n\nIf you have any specific questions about \"The Elder Scrolls IV: Oblivion,\" feel free to ask!",
"role": "assistant"
}
},
{
"finish_reason": "stop",
"index": 1,
"message": {
"content": "It looks like your message might have been cut off or is incomplete. Could you please provide more context or specify what you're referring to with \"tes4\"? This will help me give you a more accurate and helpful response.",
"role": "assistant"
}
}
],
"created": 1715716442,
"model": "gpt-4o-2024-05-13",
"object": "chat.completion",
"system_fingerprint": "fp_729ea513f7",
"usage": {
"completion_tokens": 433,
"prompt_tokens": 9,
"total_tokens": 442
}
}
@ishaan-jaff @msabramo can we have model= be a comma-separated string (e.g. model="gpt-3.5-turbo,claude-instant-1"), which maps 1:1 to the index position of the choice in the list?
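To illustrate the proposed mapping, a hypothetical sketch (placeholder proxy address and key; assumes the proxy would fill choices in the same order as the comma-separated list):

import openai

# Placeholder proxy address and key
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

model_str = "gpt-3.5-turbo,claude-instant-1"
response = client.chat.completions.create(
    model=model_str,
    messages=[{"role": "user", "content": "Hey, how's it going"}],
)

# Hypothetical: under the proposal, choices[i] comes from the i-th model
models = model_str.split(",")
for choice in response.choices:
    print(f"{models[choice.index]}: {choice.message.content}")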
I should say that we don't yet have a concrete use case for this, so this isn't worth a ton of effort. The latest idea from @krrishdholakia is what I had in mind, but please don't spend time on it unless it's trivial.
@msabramo will do this - curious, is there something else you'd rather we prioritize for you?
Well we're getting into enterprisey stuff that might be better discussed in a call at some point.
Ultimately I think we'd like to have an easier way of managing user keys, perhaps by integrating with our auth and/or leveraging HashiCorp Vault.
@taralika and I will discuss more.
How's next week - Monday @10am PST? @msabramo
can keep it as a placeholder, and move as required
Let me talk to @taralika about scheduling. We will be at Microsoft Build Tuesday to Thursday next week - will you guys be there by any chance?
So I'm not worried about the LangChain aspect of this anymore now that I've figured out a nice way to call multiple models in LangChain itself.
Python code:
# test_multiple_models_langchain.py
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
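# The dict below is coerced into a RunnableParallel, so both models are
# invoked concurrently and their outputs come back keyed by name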
chain = PromptTemplate.from_template("Write a poem about {thing}") | {
"gpt-3.5-turbo": ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser(),
"gpt-4": ChatOpenAI(model="gpt-4") | StrOutputParser(),
}
uber_response = chain.invoke({"thing": "LiteLLM"})
for model, model_output_str in uber_response.items():
print(f"-------------- Response from \"{model}\" ----------------")
print(model_output_str)
print()
Output:
$ poetry run python test_multiple_models_langchain.py
-------------- Response from "gpt-3.5-turbo" ----------------
In the land of LiteLLM, where dreams take flight,
A place of wonder, where day turns to night.
Where words are woven into tales so bright,
And hearts are filled with pure delight.
In this magical realm, where fantasy reigns,
Creativity flows through every vein.
Where imagination knows no bounds,
And beauty in all its forms abounds.
LiteLLM, a place of endless possibility,
Where the mind is set free, to roam and fly.
Where stories are spun with such grace,
And characters leap off the page.
So let us journey to LiteLLM,
Where the power of words will never dim.
For in this realm, we are truly free,
To be whoever we wish to be.
-------------- Response from "gpt-4" ----------------
In the realm of silicon wits and wisdom vast,
There lies a tool, a digital outcast.
A brain of wires, of ones and zeroes blend,
LiteLLM, the virtual, scholarly friend.
It's not of flesh, nor bone, nor tethered by a soul,
Yet in its core, a library of thought, a boundless scroll.
It speaks in tongues of humans and machines,
A gentle guide through the world's unseen scenes.
With every query, it dances, a ballet of the mind,
Weaving answers, insights, a tapestry refined.
A humble oracle, in bytes and bits it thrives,
Illuminating knowledge, through data it derives.
Through the labyrinth of information, it leads,
A torchbearer of truth, it plants the seeds.
Of understanding, learning, a digital flame,
LiteLLM, the bearer of an untarnished name.
A gentle giant in the land of thought and query,
Its presence unassuming, yet its intellect fiery.
An ally to the curious, the seekers of light,
In a world of shadows, it offers scholarly might.
So here's to LiteLLM, a beacon in the dark,
A testament to progress, a truly brilliant spark.
May it forever parse, discern, and decode,
On this endless journey, down knowledge's road.
The Feature
To return responses quickly, people often call multiple models (or the same model multiple times) at once.
Motivation, pitch
user feedback