BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Retrieve models characteristics, including the completion mode attribute (also for discovery). #435

Closed solyarisoftware closed 1 year ago

solyarisoftware commented 1 year ago

The Feature

To use a completion model, in the design/dev phase, I would like to have a liteLLM function returning some basic attributes of a given model, for example:

Let's focus on OpenAI COMPLETION models (OpenAI just as an example of an LLM provider). Consider the list of models: https://platform.openai.com/docs/models/gpt-3-5

Unfortunately (and weirdly, IMHO) the OpenAI documentation does not explicitly specify the completion mode attribute for each model. Worse, in the OpenAI docs there is no single point of reference for all the attributes of a model. BTW, nor does the Model.list() endpoint https://platform.openai.com/docs/api-reference/models/object return these details.

The first idea that comes to mind is to maintain a model characteristics "database" in liteLLM. This DB could be just a static YAML file, as in this draft example:

llm_provider: OpenAI
models:
  - model_name: gpt-3.5-turbo
    description: Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration 2 weeks after it is released.
    max_tokens: 4097
    training_data: Up to Sep 2021
    completion_mode: chat
    cost_per_token: ...
    legacy: false

  - model_name: gpt-3.5-turbo-16k
    description: Same capabilities as the standard gpt-3.5-turbo model but with 4 times the context.
    max_tokens: 16385
    training_data: Up to Sep 2021
    completion_mode: chat
    cost_per_token: ...
    legacy: false

  - model_name: gpt-3.5-turbo-instruct
    description: Similar capabilities as text-davinci-003 but compatible with legacy Completions endpoint and not Chat Completions.
    max_tokens: 4097
    training_data: Up to Sep 2021
    completion_mode: text
    cost_per_token: ...
    legacy: false

  - model_name: gpt-3.5-turbo-0613
    description: Snapshot of gpt-3.5-turbo from June 13th 2023 with function calling data. Unlike gpt-3.5-turbo, this model will not receive updates and will be deprecated 3 months after a new version is released.
    max_tokens: 4097
    training_data: Up to Sep 2021
    completion_mode: chat
    cost_per_token: ...
    legacy: true

  - model_name: gpt-3.5-turbo-16k-0613
    description: Snapshot of gpt-3.5-turbo-16k from June 13th 2023. Unlike gpt-3.5-turbo-16k, this model will not receive updates and will be deprecated 3 months after a new version is released.
    max_tokens: 16385
    training_data: Up to Sep 2021
    completion_mode: chat
    cost_per_token: ...
    legacy: true

  - model_name: gpt-3.5-turbo-0301
    description: Snapshot of gpt-3.5-turbo from March 1st 2023. Unlike gpt-3.5-turbo, this model will not receive updates and will be deprecated on June 13th 2024 at the earliest.
    max_tokens: 4097
    training_data: Up to Sep 2021
    completion_mode: chat
    cost_per_token: ...
    legacy: true

  - model_name: text-davinci-003
    description: Can do any language task with better quality, longer output, and consistent instruction-following than the curie, babbage, or ada models. Also supports some additional features such as inserting text.
    max_tokens: 4097
    training_data: Up to Jun 2021
    completion_mode: text
    cost_per_token: ...
    legacy: true

  - model_name: text-davinci-002
    description: Similar capabilities to text-davinci-003 but trained with supervised fine-tuning instead of reinforcement learning.
    max_tokens: 4097
    training_data: Up to Jun 2021
    completion_mode: text
    cost_per_token: ...
    legacy: true

  - model_name: code-davinci-002
    description: Optimized for code-completion tasks.
    max_tokens: 8001
    training_data: Up to Jun 2021
    completion_mode: text
    cost_per_token: ...
    legacy: true

Maintaining a static database is maybe not the best idea; to be evaluated. But it has the advantage of being simple for maintainers to read and update (when integrating new models).

By the way, things become more complex with fine-tuned models. In this case, a fine-tuned model named my_finetuned_text-davinci-002 could be derived from text-davinci-002, so maybe it could be considered a "subclass" inheriting the characteristics of its superclass.
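The "subclass" idea above could be a name-based fallback lookup. A minimal self-contained sketch, under the assumption (mine, not litellm's behavior) that fine-tuned model names end with the base model name:

```python
# Sketch: resolve a fine-tuned model to its base model's characteristics.
# Assumption: fine-tuned names embed the base name as a suffix,
# e.g. "my_finetuned_text-davinci-002" -> "text-davinci-002".
BASE_MODELS = {
    "text-davinci-002": {"completion_mode": "text", "max_tokens": 4097},
}

def resolve_characteristics(model: str) -> dict:
    """Look up a model directly, falling back to a base-name suffix match."""
    if model in BASE_MODELS:
        return BASE_MODELS[model]
    for base, attrs in BASE_MODELS.items():
        if model.endswith(base):
            return attrs  # inherit the base model's characteristics
    raise KeyError(f"unknown model: {model}")

print(resolve_characteristics("my_finetuned_text-davinci-002"))
# {'completion_mode': 'text', 'max_tokens': 4097}
```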

Also, Azure OpenAI models present another problem: again, you don't know the deployment-to-model association a priori.

Motivation, pitch

My specific motivation for having this characteristics info is related to the completion mode attribute.


Now, having the goal to use LiteLLM in my plugin, this problem still exists: in my application I always want a TEXT completion, and I currently use CHAT completion models as pseudo-text completion models (inserting the input prompt in the system role). By the way, LiteLLM already implemented my request to use a chat completion model as if it were a text model, through the text_completion endpoint, see: https://docs.litellm.ai/docs/tutorials/text_completion#using-litellm-in-the-text-completion-format

Nevertheless, currently you can't know whether a given model is a CHAT completion model or a TEXT completion model. This is the basic reason it would be nice to have a LiteLLM function like

characteristics(model='gpt-3.5-turbo')

returns:

        {
            'model_name': 'gpt-3.5-turbo',
            'description': 'Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration 2 weeks after it is released.',
            'max_tokens': 4097,
            'training_data': 'Up to Sep 2021',
            'completion_mode': 'chat',
            'cost_per_token': '...',
            'legacy': False
        }

More in general, this (completion) model characteristics database could power a "discovery" feature in LiteLLM, where, for example, a command-line tool could query the database with some criteria/filters.

Twitter / LinkedIn details

twitter: @solyarisoftware linkedin: www.linkedin.com/in/giorgiorobino

krrishdholakia commented 1 year ago

@solyarisoftware I believe this is solved -

We maintain a model map, which contains max tokens, completion mode, cost per token etc.:

https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json

Here's how you can access it in liteLLM: https://docs.litellm.ai/docs/token_usage#5-model_cost

krrishdholakia commented 1 year ago

Closing for now. Please reopen the issue if this doesn't solve your problem.

solyarisoftware commented 1 year ago

Oh sorry, I didn't realize you had already foreseen the JSON database. As a minor request, it could still be useful to have a function that, given a model name, returns its data, e.g.

characteristics(model="text-curie-001")
# =>
# {
#     "max_tokens": 2049,
#     "input_cost_per_token": 0.000002,
#     "output_cost_per_token": 0.000002,
#     "litellm_provider": "text-completion-openai",
#     "mode": "completion"
# }

Maybe also updating the page: https://docs.litellm.ai/docs/token_usage#5-model_cost
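Such a function could be a thin wrapper over the existing JSON map. A minimal self-contained sketch (the characteristics function name is my proposal; the sample record mirrors the text-curie-001 entry above):

```python
# Hypothetical characteristics() helper; the sample record mirrors an
# entry from litellm's model_prices_and_context_window.json.
MODEL_DB = {
    "text-curie-001": {
        "max_tokens": 2049,
        "input_cost_per_token": 0.000002,
        "output_cost_per_token": 0.000002,
        "litellm_provider": "text-completion-openai",
        "mode": "completion",
    },
}

def characteristics(model: str) -> dict:
    """Return the attribute record for the given model name."""
    return MODEL_DB[model]

print(characteristics(model="text-curie-001")["mode"])  # completion
```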

Thanks

krrishdholakia commented 1 year ago

Hey @solyarisoftware, doesn't model_cost do exactly that? What do you feel is missing?

from litellm import model_cost 

print(model_cost) # {'gpt-3.5-turbo': {'max_tokens': 4000, 'input_cost_per_token': 1.5e-06, 'output_cost_per_token': 2e-06}, ...}
solyarisoftware commented 1 year ago

you are right, thanks:

from litellm import model_cost 
print(model_cost['gpt-3.5-turbo'])
#=> {'max_tokens': 4097, 'input_cost_per_token': 1.5e-06, 'output_cost_per_token': 2e-06, 'litellm_provider': 'openai', 'mode': 'chat'}
solyarisoftware commented 1 year ago

BTW, litellm is spreading: https://github.com/daveshap/Medical_Intake/discussions/6