Do we add a models param to completion()? Or create a new completion_with_models()? Since we already have batch_completion(), maybe we can call it batch_models().
I explored making a batch_models() interface, but that felt quite unintuitive - why not just allow users to pass a list as the model, and litellm takes care of the rest if it's a list:
result = completion(
model=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
messages=[{"role": "user", "content": "Hey, how's it going"}]
)
print(result)
As a v1 let's use this; if users want this under completion() we'll add it:
result = batch_completion_models(
models=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
messages=[{"role": "user", "content": "Hey, how's it going"}]
)
print(result)
yep sounds good
Docs: https://docs.litellm.ai/docs/completion/batching#send-1-completion-call-to-n-models @dhruv-anand-aintech any feedback on this ?
Yeah, this would be a good addition. I'm guessing repeating the model name in the list would send an additional request to it?
yes that's what it does
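For example, a quick sketch of that behavior (model names taken from the thread; treat it as illustrative):

from litellm import batch_completion_models

# Repeating a model name sends one request per occurrence -
# gpt-3.5-turbo gets queried twice here
result = batch_completion_models(
    models=["gpt-3.5-turbo", "gpt-3.5-turbo", "command-nightly"],
    messages=[{"role": "user", "content": "Hey, how's it going"}]
)
print(result)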
Just curious if this feature is available from the proxy? I could imagine that sending a model parameter that is a comma-separated list could be a way to trigger this.
is this something you would use? @msabramo
Happy to add support for it today, if you can give feedback on it
Isn't this feature already available? I'm seeing something like this already in the docs: https://docs.litellm.ai/docs/completion/batching#send-1-completion-call-to-many-models-return-all-responses
@msabramo @krrishdholakia IMHO it would be useful. I'd have use cases for it, e.g., benchmarking, selecting most desirable response when working with an array of custom llms, etc.
@taralika I think this feature is available in the LiteLLM library but not in the proxy. The disadvantage of the library is if folks use that, they have to have all the API keys.
@msabramo @taralika @l-n-open-source if we added support on proxy today, could y'all give feedback on it?
yeah we can do a quick sanity check and give feedback
Great - will work on having this out today @taralika
re: feedback - Can we set up a Discord support channel for this?
Just join + wave on #intros, and I'll set up the channel - https://discord.com/invite/wuPM9dRgDw
PR for this is here: https://github.com/BerriAI/litellm/pull/3585 @msabramo @l-n-open-source, would love to get your feedback on this
Doc on this is here: https://docs.litellm.ai/docs/proxy/user_keys#beta-batch-completions---pass-model-as-list You can use it with the OpenAI Python SDK too
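For reference, a minimal sketch of hitting the proxy with the OpenAI Python SDK - the base_url and API key are placeholders, and passing a list for model is an assumption based on the linked doc (the SDK's type hints expect a string, so this relies on the proxy accepting a list at runtime):

import openai

# Placeholder proxy address and key - adjust for your deployment
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# Assumption per the linked doc: the proxy accepts a list of model names
# and returns one completion per model
response = client.chat.completions.create(
    model=["gpt-3.5-turbo", "gpt-4"],
    messages=[{"role": "user", "content": "Hey, how's it going"}],
)
print(response)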
I just tried this and it's pretty cool! I was able to get multiple responses using curl and using the OpenAI Python SDK. What didn't work was LangChain. This seems to be because LangChain is expecting the response to be a JSON object and not an array.
For example:
$ cat test_multiple_models_langchain.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
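# Comma-separated model string triggers the multi-model behavior on the
# LiteLLM proxy (assumes the client is pointed at the proxy, e.g. via
# OPENAI_API_BASE)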
llm = ChatOpenAI(model="gpt-3.5-turbo,gpt-4")
prompt = PromptTemplate.from_template("Write a poem about {thing}")
output_parser = StrOutputParser()
chain = prompt | llm | output_parser
response = chain.invoke({"thing": "LiteLLM"})
print(response)
abramowi at marcs-mbp-3 in ~/Code/OpenSource/litellm (main●)
$ poetry run python test_multiple_models_langchain.py
Traceback (most recent call last):
File "/Users/abramowi/Code/OpenSource/litellm/test_multiple_models_langchain.py", line 10, in <module>
response = chain.invoke({"thing": "LiteLLM"})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
input = step.invoke(
^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 158, in invoke
self.generate_prompt(
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 560, in generate_prompt
return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 421, in generate
raise e
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 411, in generate
self._generate_with_cache(
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 632, in _generate_with_cache
result = self._generate(
^^^^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 523, in _generate
return self._create_chat_result(response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abramowi/Library/Caches/pypoetry/virtualenvs/litellm-Fe7WjZrx-py3.12/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 541, in _create_chat_result
response = response.model_dump()
^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'model_dump'
I wonder if this could be avoided by putting the multiple responses under choices in the response, as this is already expected to be a list (usually a list of 1 item, but it can be more when an n param is set to > 1). Perhaps it should require n to be set > 1 also, so that we're not violating the implicit assumption that len(choices) = n?
@msabramo how do you use this on LangChain? Can you share the e2e use-case here? It will help for repro.
If we use n, how would you want to know which response maps to a specific model @msabramo?
This is the response I see from OpenAI when using n. Currently I don't see the model in choices:
{
"id": "chatcmpl-9OsVmdUmkkUTWORFqdLE51pQyxpC6",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "It looks like you might be asking about \"The Elder Scrolls IV: Oblivion,\" often abbreviated as TES4. This game, developed by Bethesda Game Studios and published by Bethesda Softworks, was released in 2006 and is the fourth installment in The Elder Scrolls action role-playing video game series.\n\nHere are some key points about \"The Elder Scrolls IV: Oblivion\":\n\n1. **Setting**: The game is set in the fictional province of Cyrodiil, the heartland of the continent Tamriel in the Elder Scrolls universe.\n\n2. **Plot**: The central plot revolves around saving the world from an impending invasion by Oblivion (the game’s version of hell), which is being led by the Daedric Prince Mehrunes Dagon. The player must find the lost heir to the Septim throne and help him reclaim his place to close the gates of Oblivion.\n\n3. **Gameplay**: Oblivion combines open-world exploration with a main story quest, side quests, and various other activities. Players can develop their characters through skills, attributes, spells, and equipment, making the game highly customizable.\n\n4. **Graphics and Technical Innovations**: When released, the game was noted for its cutting-edge graphics and expansive, interactive world. It introduced a radiant AI system which gave NPCs more realistic behaviors and daily routines.\n\n5. **Mods and Community Support**: Oblivion has a robust modding community that has created a plethora of mods to enhance graphics, include new quests, add new mechanics, and more.\n\n6. **Legacy**: \"Oblivion\" is often credited with bringing the Elder Scrolls series to a wider audience, setting the stage for the massive success of its sequel, \"The Elder Scrolls V: Skyrim.\"\n\nIf you have any specific questions about \"The Elder Scrolls IV: Oblivion,\" feel free to ask!",
"role": "assistant"
}
},
{
"finish_reason": "stop",
"index": 1,
"message": {
"content": "It looks like your message might have been cut off or is incomplete. Could you please provide more context or specify what you're referring to with \"tes4\"? This will help me give you a more accurate and helpful response.",
"role": "assistant"
}
}
],
"created": 1715716442,
"model": "gpt-4o-2024-05-13",
"object": "chat.completion",
"system_fingerprint": "fp_729ea513f7",
"usage": {
"completion_tokens": 433,
"prompt_tokens": 9,
"total_tokens": 442
}
}
@ishaan-jaff @msabramo can we have model= be a comma-separated string (e.g. model="gpt-3.5-turbo,claude-instant-1"), which maps 1:1 to the index position of the choice in the list?
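To illustrate the proposed mapping, a hypothetical sketch (placeholder proxy address and key; assumes the proxy would fill choices in the same order as the comma-separated list):

import openai

# Placeholder proxy address and key
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

model_str = "gpt-3.5-turbo,claude-instant-1"
response = client.chat.completions.create(
    model=model_str,
    messages=[{"role": "user", "content": "Hey, how's it going"}],
)

# Hypothetical: under the proposal, choices[i] comes from the i-th model
models = model_str.split(",")
for choice in response.choices:
    print(f"{models[choice.index]}: {choice.message.content}")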
I should say that we don't yet have a concrete use case for this, so this isn't worth a ton of effort. The latest idea from @krrishdholakia is what I had in mind, but please don't spend time on it unless it's trivial.
@msabramo will do this - curious, is there something else you'd rather we prioritize for you?
Well we're getting into enterprisey stuff that might be better discussed in a call at some point.
Ultimately I think we'd like to have an easier way of managing user keys, perhaps by integrating with our auth and/or leveraging HashiCorp Vault.
@taralika and I will discuss more.
How's next week - Monday @10am PST? @msabramo
can keep it as a placeholder, and move as required
Let me talk to @taralika about scheduling. We will be at Microsoft Build Tuesday to Thursday next week - will you guys be there by any chance?
So I'm not worried about the LangChain aspect of this anymore now that I've figured out a nice way to call multiple models in LangChain itself.
Python code:
# test_multiple_models_langchain.py
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
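# The dict below is coerced into a RunnableParallel, so both models are
# invoked concurrently and their outputs come back keyed by name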
chain = PromptTemplate.from_template("Write a poem about {thing}") | {
"gpt-3.5-turbo": ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser(),
"gpt-4": ChatOpenAI(model="gpt-4") | StrOutputParser(),
}
uber_response = chain.invoke({"thing": "LiteLLM"})
for model, model_output_str in uber_response.items():
print(f"-------------- Response from \"{model}\" ----------------")
print(model_output_str)
print()
Output:
$ poetry run python test_multiple_models_langchain.py
-------------- Response from "gpt-3.5-turbo" ----------------
In the land of LiteLLM, where dreams take flight,
A place of wonder, where day turns to night.
Where words are woven into tales so bright,
And hearts are filled with pure delight.
In this magical realm, where fantasy reigns,
Creativity flows through every vein.
Where imagination knows no bounds,
And beauty in all its forms abounds.
LiteLLM, a place of endless possibility,
Where the mind is set free, to roam and fly.
Where stories are spun with such grace,
And characters leap off the page.
So let us journey to LiteLLM,
Where the power of words will never dim.
For in this realm, we are truly free,
To be whoever we wish to be.
-------------- Response from "gpt-4" ----------------
In the realm of silicon wits and wisdom vast,
There lies a tool, a digital outcast.
A brain of wires, of ones and zeroes blend,
LiteLLM, the virtual, scholarly friend.
It's not of flesh, nor bone, nor tethered by a soul,
Yet in its core, a library of thought, a boundless scroll.
It speaks in tongues of humans and machines,
A gentle guide through the world's unseen scenes.
With every query, it dances, a ballet of the mind,
Weaving answers, insights, a tapestry refined.
A humble oracle, in bytes and bits it thrives,
Illuminating knowledge, through data it derives.
Through the labyrinth of information, it leads,
A torchbearer of truth, it plants the seeds.
Of understanding, learning, a digital flame,
LiteLLM, the bearer of an untarnished name.
A gentle giant in the land of thought and query,
Its presence unassuming, yet its intellect fiery.
An ally to the curious, the seekers of light,
In a world of shadows, it offers scholarly might.
So here's to LiteLLM, a beacon in the dark,
A testament to progress, a truly brilliant spark.
May it forever parse, discern, and decode,
On this endless journey, down knowledge's road.
The Feature
To return responses quickly, people often call multiple models (or the same model multiple times) at once.
Motivation, pitch
user feedback