BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

🎅 I WISH LITELLM HAD... #361

Open krrishdholakia opened 11 months ago

krrishdholakia commented 11 months ago

This is a ticket to track a wishlist of items you wish LiteLLM had.

COMMENT BELOW 👇

With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs

Respond with ❤️ to any request you would also like to see

P.S.: Come say hi 👋 on the Discord

bsu3338 commented 7 months ago

@ishaan-jaff I saw the pre-call hooks; is there any documentation from someone who has tried this? If not, I can send you some once I try it.

ishaan-jaff commented 7 months ago

@bsu3338 what's missing in the pre-call hooks docs? Anything that would be helpful

is there any documentation from someone who has tried this?

I've used it and helped some other users set it up

dannysemi commented 7 months ago

Compatibility with runpod serverless endpoints. https://doc.runpod.io/reference/runpod-apis I created my own very messy proxy and I hate it.

ishaan-jaff commented 7 months ago

@dannysemi Tracking your request here: https://github.com/BerriAI/litellm/issues/1777

Out of curiosity:

  • why did you create a proxy?
  • why did you hate it?

dannysemi commented 7 months ago
  • why did you create a proxy?

Because those endpoints aren't plug-and-play with OpenAI requests.

  • why did you hate it?

Because it doesn't handle errors very well. If an error occurs on the worker and I don't manually terminate the worker, it could run forever, wasting my money.

dannysemi commented 7 months ago

Never mind, I think those are Hugging Face TGI-compatible endpoints.

krrishdholakia commented 7 months ago

yea they are - it should work - https://docs.litellm.ai/docs/providers/huggingface

krrishdholakia commented 7 months ago

@dannysemi i believe 1.4.0+ huggingface tgi images are also openai-compatible with their new messages api - https://huggingface.co/docs/text-generation-inference/messages_api

bsu3338 commented 6 months ago

Please add the redisvl module to requirements.txt for semantic Redis caching, so I do not have to build a custom Docker container. Thank you, and thanks for adding the feature! I just noticed in the commit history that it was added and then removed. Will this be coming back?

nivibilla commented 6 months ago

sglang support pretty please!

ranjancse26 commented 6 months ago

It would be great if you could provide support for Groq. Essentially, Groq provides an OpenAI-compatible interface.
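
In the meantime, a rough sketch of what this could look like, assuming Groq's OpenAI-compatible endpoint can be reached through litellm's generic openai/ provider (the base URL and model name below are my assumptions, not anything litellm ships today):

import litellm

# Assumption: Groq exposes an OpenAI-compatible endpoint at this base URL.
response = litellm.completion(
    model="openai/mixtral-8x7b-32768",          # hypothetical Groq model name
    api_base="https://api.groq.com/openai/v1",  # assumed Groq base URL
    api_key="my-groq-api-key",
    messages=[{"role": "user", "content": "Hello from litellm"}],
)
print(response.choices[0].message.content)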

hlohaus commented 6 months ago

Is support for the g4f package planned? If you'd like, I can create a pull request.

s-jse commented 5 months ago

I wish LiteLLM would show a progress bar for batch_completion(). It is nice to have when working with large batch jobs.
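
As a stopgap, a rough sketch of the kind of progress reporting I mean, driving individual completion calls through a thread pool with tqdm rather than litellm's built-in batch_completion (the model name and prompts below are illustrative):

from concurrent.futures import ThreadPoolExecutor

import litellm
from tqdm import tqdm

prompts = [f"Summarize item {i}" for i in range(100)]  # illustrative workload

def run_one(prompt: str):
    return litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )

# tqdm wraps the iterator, so the bar advances as each completion finishes.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(tqdm(pool.map(run_one, prompts), total=len(prompts)))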

rlippmann commented 5 months ago

Not sure if this is already implemented, but...

Proactive routing. Instead of trying to route, failing, and falling back, maybe track each model's max tokens so the router can tell beforehand whether the inference would fail anyway.

Also, perhaps a max parallelism setting for the number of requests that can be sent to an endpoint simultaneously. That way the router could round-robin across idle endpoints instead of overloading one endpoint and failing over.

andaldanar commented 5 months ago

I wish LiteLLM could support Cohere's Rerank API endpoint - thank you!

https://docs.cohere.com/docs/reranking

krrishdholakia commented 5 months ago

@rlippmann

pre-call checks for max tokens is live - https://docs.litellm.ai/docs/routing#pre-call-checks-context-window

max parallelism for number of requests -> can you explain how this might work? Do you want to set a maximum number of parallel requests per endpoint?
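
For reference on the pre-call checks, a minimal sketch of the context-window check from the docs linked above (deployment names below are illustrative):

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",  # one deployment group, two context sizes
            "litellm_params": {"model": "gpt-3.5-turbo"},
        },
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {"model": "gpt-3.5-turbo-16k"},
        },
    ],
    enable_pre_call_checks=True,  # skip deployments whose context window is too small
)

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "a long prompt..."}],
)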

K-J-VV commented 5 months ago

Plans to add Private-GPT's API? https://github.com/zylon-ai/private-gpt

nileshtrivedi commented 5 months ago

I wish LiteLLM had a client library for Elixir, removing the need for me to run a separate proxy server.

RobertLiu0905 commented 4 months ago

I wish LiteLLM had a simple serverless option; some proxy services are not used continuously

ishaan-jaff commented 4 months ago

@RobertLiu0905 Cloudflare Python Workers are here, and we have an active issue tracking litellm support on Cloudflare Workers: https://github.com/cloudflare/workerd/discussions/1943

Is this what you wanted? Open to suggestions on other approaches.

cc @TranquilMarmot

meetzuber commented 4 months ago

I wish LiteLLM had support for IBM watsonx.ai. https://ibm.github.io/watsonx-ai-python-sdk/fm_model.html

Thanks

nbaav1 commented 4 months ago

I wish the LiteLLM Proxy server had a config setting for proxy_base_url, e.g. hosting the server at http://0.0.0.0:4000/<proxy_base_url> or http://0.0.0.0:4000/abc/xyz. Then I could do something like litellm --model gpt-3.5-turbo --proxy_base_url abc/xyz and then:

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000/abc/xyz"
)

response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

This would simplify our infrastructure in AWS and still comply with company policies. Thanks!

twardoch commented 4 months ago

WISH: Expand batching

The Google AI Studio API for Gemini Pro 1.5 has very harsh restrictions on RPM & TPM (https://ai.google.dev/pricing), but you get a FREE or $7+$21/M (1M+8k) LLM API.

The NEW OpenAI Batch API is 50% cheaper than the normal API, so for GPT-4 Turbo that's $5+$15/M (128k+4k), but it schedules processing, processes it very asynchronously on their end, and delivers results "later".

https://help.openai.com/en/articles/9197833-batch-api-faq

It would be great to create an OpenAI-compatible Batch API abstraction which, for OpenAI, uses their Batch API directly, but for other models uses local batching, pooling, RPM & TPM limiting, etc., and works in a similar way.

I imagine that other API providers may follow suit with their native, cheaper batch API, so an abstraction would be highly desirable.

I know LiteLLM has its own batching already (which is slightly different in concept), so my request might be an extension to that.

Why?

Well, many of us have use cases for MASS LLM processing: translation, summarization, rewriting (like coreference resolution, NER, etc.). We don't need "ASAP async" for those, but cheaper is always better 😃
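
To make the request concrete, a purely hypothetical sketch of the kind of unified batch interface described above (none of these names exist in LiteLLM today):

from dataclasses import dataclass

@dataclass
class BatchJob:
    """Hypothetical handle for a scheduled batch of chat requests."""
    id: str
    status: str  # "queued" | "running" | "done"

def submit_batch(model: str, requests: list[list[dict]]) -> BatchJob:
    """Hypothetical: use the provider's native batch API where one exists
    (e.g. OpenAI's Batch API); otherwise fall back to local batching that
    respects the provider's RPM/TPM limits."""
    raise NotImplementedError

def fetch_results(job: BatchJob) -> list[dict]:
    """Hypothetical: poll until the batch completes and return responses
    in the same order as the submitted requests."""
    raise NotImplementedError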

motin commented 4 months ago

I wish it were possible to specify which callbacks LiteLLM uses on a per-request basis (i.e. without modifying global state)

andersskog commented 4 months ago

I wish the LiteLLM logger supported JSON logging, with a more succinct message and the longer strings in extra fields. Logging of requests to LLM providers is especially long and unformatted.

andersskog commented 4 months ago

I wish LiteLLM would implement stronger typing for methods.

As an example, when I call:

response = await litellm.acompletion(stream=True, **kwargs)

I need to do the following assertions:

assert isinstance(response, litellm.CustomStreamWrapper)
async for chunk in response:
    assert isinstance(chunk, litellm.ModelResponse)
    assert isinstance(chunk.choices[0], litellm.utils.StreamingChoices)

since I'm working in a typed codebase enforced with pyright.

krrishdholakia commented 4 months ago

Hey @andersskog just pushed the v1 for json logging - https://github.com/BerriAI/litellm/commit/b46db8b89135a6b17f5b0797fdf20ec34735f8b0

You can enable it with litellm.json_logs = True. It currently just logs the raw request sent by litellm. Open to feedback on this.
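
A minimal sketch of turning it on (the flag is the one mentioned above; the completion call is just illustrative):

import litellm

litellm.json_logs = True  # log the raw request litellm sends, as JSON

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)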

zhaoninge commented 4 months ago

I wish litellm had an API to check available models from providers in real time.

QwertyJack commented 4 months ago

I wish LiteLLM had support for Sambaverse. https://docs.sambanova.ai/sambaverse/latest/index.html

Thanks

horahoradev commented 4 months ago

Discord alerting would be nice

ggallotti commented 3 months ago

A wildcard for the model_name property in model_list:

model_list:
  - model_name: "vertex_ai/*"
    litellm_params:
      model: "vertex_ai/*"
      vertex_project: os.environ/VERTEXAI_PROJECT
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: os.environ/ANTHROPIC_API_KEY      
  - model_name: "gemini/*"
    litellm_params:
      model: "gemini/*"
      api_key: os.environ/GEMINI_API_KEY

krrishdholakia commented 3 months ago

@ggallotti would that be similar to how we do it for openai today?

https://docs.litellm.ai/docs/providers/openai#2-start-the-proxy

ggallotti commented 3 months ago

@ggallotti would that be similar to how we do it for openai today?

https://docs.litellm.ai/docs/providers/openai#2-start-the-proxy

Thanks for the response. But that configuration does not work, as it will force the OpenAI API key onto the other models.

ducnvu commented 3 months ago

A streamlined way to call vision and non-vision models would be great. Being LLM-agnostic is a big reason why I use the package, but I currently still have to handle a different request format depending on which model the call goes to.

For example, when calling GPT-4 Vision, messages.content is an array. Using the same code to call Azure's Command R+ results in:

litellm.exceptions.APIError: OpenAIException - Error code: 400 - {'message': 'invalid type: parameter messages.content is of type array but should be of type string.'}

I'm aware this is on the model provider's side, but GPT's non-vision models, for example, support both formats.
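
For context, a minimal workaround sketch: flattening list-style content to a plain string before calling providers that reject arrays (the helper below is hypothetical, not part of litellm):

def flatten_content(messages: list[dict]) -> list[dict]:
    """Collapse OpenAI-style content arrays into plain strings,
    keeping only the text parts, for providers that expect strings."""
    flattened = []
    for message in messages:
        content = message["content"]
        if isinstance(content, list):
            content = "\n".join(
                part["text"] for part in content if part.get("type") == "text"
            )
        flattened.append({**message, "content": content})
    return flattened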

krrishdholakia commented 3 months ago

@ducnvu seems like something we need to fix - can you share the command r call?

ducnvu commented 3 months ago

@krrishdholakia Thanks for the prompt response; the call is something like this. I don't have access to all models supported by litellm to test, but so far OpenAI models work with both a string messages.content and the format below; Command R is where I first encountered this error. All my calls go through Azure.

request_kwargs = {'temperature': 0.7, 'n': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'messages': [{'role': 'system', 'content': [{'type': 'text', 'text': "You are Command R Plus, answer as concisely as possible (e.g. don't be verbose). When writing code, specify the language as per the markdown format."}]}, {'role': 'user', 'content': [{'type': 'text', 'text': 'hi'}]}], 'timeout': 600, 'stream': True, 'model': 'azure/command-r-plus', 'api_base': BASE, 'api_key': KEY}

await litellm.acompletion(**request_kwargs)

guiramos commented 3 months ago

Hi guys, I am trying to use open interpreter with gemini 1.5 flash and getting this error:

raise APIConnectionError( litellm.exceptions.APIConnectionError: gemini does not support parameters: {'functions': [{'name': 'execute', 'description': "Executes code on the user's machine in the users local environment and returns the output", 'parameters': {'type': 'object', 'properties': {'language': {'type': 'string', 'description': 'The programming language (required parameter to the execute function)', 'enum': ['ruby', 'python', 'shell', 'javascript', 'html', 'applescript', 'r', 'powershell', 'react']}, 'code': {'type': 'string', 'description': 'The code to execute (required)'}}, 'required': ['language', 'code']}}]}, for model=gemini-1.5-flash-latest. To drop these, set litellm.drop_params=True or for proxy:

By default, Open Interpreter uses functions, and it seems to fail.

Does Google Gemini 1.5 via litellm support functions? Since which version?

If it doesn't, I wish litellm had this implemented...

guiramos commented 3 months ago

OK, functions/tools are definitely not working.

I am following this tutorial, and it works great when calling the Gemini API directly: https://ai.google.dev/gemini-api/docs/function-calling/tutorial?lang=python

However, passing the same set of commands to litellm gives this error:

litellm.exceptions.APIConnectionError: gemini does not support parameters: {'tools': [<function multiply at 0x14c684680>]}, for model=gemini-1.5-flash-latest. To drop these, set `litellm.drop_params=True` or for proxy:
`litellm_settings:
 drop_params: true`

I think part of the problem is the check at utils.py:6570, where the supported_params are returned:

 elif custom_llm_provider == "palm" or custom_llm_provider == "gemini":
        return ["temperature", "top_p", "stream", "n", "stop", "max_tokens"]

gemini supports way more than that. I am making a call like this:

 return litellm.completion(
        messages=messages,
        temperature=0.0,
        model=target_model,
        tools=tools,
        safety_settings=[
            {
                "category": "HARM_CATEGORY_HARASSMENT",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_HATE_SPEECH",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                "threshold": "BLOCK_NONE",
            },
        ]
    )

But the tools argument is triggering the exception.

Can we get this addressed one of these days, please?

@krrishdholakia

krrishdholakia commented 3 months ago

@guiramos got it - found the issue, we have it implemented for vertex ai, not google ai studio (which i think is what you're calling).

Can you try running this with

return litellm.completion(
       messages=messages,
       temperature=0.0,
       model="vertex_ai/gemini-1.5-pro",
       tools=tools,
       safety_settings=[
           {
               "category": "HARM_CATEGORY_HARASSMENT",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_HATE_SPEECH",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
               "threshold": "BLOCK_NONE",
           },
           {
               "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
               "threshold": "BLOCK_NONE",
           },
       ]
   )

and let me know if that works? - https://docs.litellm.ai/docs/providers/vertex

Also tracking the issue for gemini google ai studio - https://github.com/BerriAI/litellm/issues/3086

guiramos commented 3 months ago

@krrishdholakia I could not test with Vertex as I don't have an API key for that.

Also, I tried Google AI Studio and it did not work, even using the new version 1.40.2.

Do you have an estimated date for this? Please help.

danielflaherty commented 2 months ago

@krrishdholakia I could not test with Vertex as I don't have an API key for that.

Also, I tried Google AI Studio and it did not work, even using the new version 1.40.2.

Do you have an estimated date for this? Please help.

+1. It would be great to have an estimate for when 1.5 Pro with tools is supported using AI Studio.

krrishdholakia commented 2 months ago

hey @danielflaherty @guiramos this should be fixed by end of week

guiramos commented 2 months ago

@krrishdholakia really appreciate this! Thank you!

ishaan-jaff commented 2 months ago

Discord alerting would be nice

@horahoradev This is live now https://docs.litellm.ai/docs/proxy/alerting#advanced---using-discord-webhooks

@horahoradev any chance we can hop on a call sometime this week? I'd love to learn how we can improve litellm for you

My linkedin if you prefer DMs: https://www.linkedin.com/in/reffajnaahsi/ Sharing a link to my cal for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

ishaan-jaff commented 2 months ago

Hi @nbaav1, we support this using the SERVER_ROOT_PATH env variable. Docs: https://docs.litellm.ai/docs/proxy/deploy#customization-of-the-server-root-path

@nbaav1 any chance we can hop on a call? I'd love to learn how we can improve litellm for you.

I wish the LiteLLM Proxy server had a config setting for proxy_base_url, e.g. hosting the server at http://0.0.0.0:4000/<proxy_base_url> or http://0.0.0.0:4000/abc/xyz. Then I could do something like litellm --model gpt-3.5-turbo --proxy_base_url abc/xyz [...]


danielchalef commented 2 months ago

Support for Redis Cluster. LiteLLM currently only supports standalone Redis nodes.

barakplasma commented 2 months ago

Support vision on local images, basically by adding support for local file URLs to https://github.com/BerriAI/litellm/blob/3a35a58859a145a4a568548316a1930340e7440a/litellm/llms/prompt_templates/factory.py#L624-L635
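
A workaround sketch in the meantime: base64-encode the local file into an OpenAI-style data URL (the model name and file path below are illustrative):

import base64

import litellm

def to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

response = litellm.completion(
    model="gpt-4-vision-preview",  # illustrative vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": to_data_url("cat.jpg")}},
        ],
    }],
)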

andresd95 commented 2 months ago

Support for custom models imported in Bedrock.

Use case: we have a fine-tuned model deployed in Bedrock. The tuned model is based on OpenOrca, so its start and end tokens are different from the Instruct version's.

If the provider is mistral, the prompt is built with the Instruct template rather than OpenOrca's:

  response = client.invoke_model(
      body={"prompt": "<s>[INST] hello, tell me a joke [/INST]\n", "max_tokens": 1024, "temperature": 0},
      modelId=<model_id>,
      accept=accept,
      contentType=contentType
  )

The tokens <|im_start|> and <|im_end|> should be used instead.

I tried using a custom provider as a workaround; however, the body comes out empty and the request fails:

  response = client.invoke_model(
      body={},
      modelId=<model_id>,
      accept=accept,
      contentType=contentType
  )

The only thing we need is for the prompt template configuration to be respected, as it is with the amazon or anthropic providers:

      model_id: "model_arn"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      bos_token: "<s>"
      eos_token: "<|im_end|>"

https://github.com/BerriAI/litellm/blob/3a35a58859a145a4a568548316a1930340e7440a/litellm/llms/bedrock.py#L743-L746
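
On the SDK side, a sketch of what I would hope for, assuming litellm's register_prompt_template helper were honored for Bedrock custom models too (that assumption is the whole point of this request; the ARN is a placeholder):

import litellm

# Assumption: this registration would be applied when calling the custom Bedrock model.
litellm.register_prompt_template(
    model="bedrock/<model_arn>",  # placeholder for the custom model ARN
    initial_prompt_value="<s>",
    roles={
        "system":    {"pre_message": "<|im_start|>system\n",    "post_message": "<|im_end|>"},
        "user":      {"pre_message": "<|im_start|>user\n",      "post_message": "<|im_end|>"},
        "assistant": {"pre_message": "<|im_start|>assistant\n", "post_message": "<|im_end|>"},
    },
    final_prompt_value="<|im_end|>",
)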

krrishdholakia commented 2 months ago

I wish it was possible to specify which callbacks LiteLLM would use on a per request basis (e.g. without modifying global state)

Hey @motin this is possible already

Proxy: https://docs.litellm.ai/docs/proxy/reliability#test---client-side-fallbacks

SDK: https://docs.litellm.ai/docs/completion/reliable_completions#fallbacks---switch-modelsapi-keysapi-bases

Taytay commented 2 months ago

First: we ❤️ LiteLLM. I wish it supported the new Gemini context caching: https://ai.google.dev/gemini-api/docs/caching?lang=python

I admit I haven't thought the API through well, since this is a feature that only one provider offers at this point (but it likely won't be the last).