BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

🎅 I WISH LITELLM HAD... #361

Open krrishdholakia opened 10 months ago

krrishdholakia commented 10 months ago

This is a ticket to track a wishlist of items you wish LiteLLM had.

COMMENT BELOW 👇

With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs

Respond with ❤️ to any request you would also like to see

P.S.: Come say hi 👋 on the Discord

krrishdholakia commented 10 months ago

[LiteLLM Client] Add new models via UI

Thinking aloud, it seems intuitive that you'd be able to add new models / remap completion calls to different models via the UI. Unsure of the exact problem this solves, though.

krrishdholakia commented 10 months ago

User / API Access Management

Different users have access to different models. It'd be helpful if there was a way to leverage the BudgetManager to gate access. E.g., GPT-4 is expensive; I don't want to expose that to my free users, but I do want my paid users to be able to use it.
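
A rough sketch of how this gating could look with LiteLLM's BudgetManager plus an app-side tier check (the tier tracking, model choices, and budget numbers are illustrative assumptions, not existing LiteLLM features):

from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="my_app")

def gated_completion(user, tier, messages):
    # app-side tier check: only paid users get the expensive model
    model = "gpt-4" if tier == "paid" else "gpt-3.5-turbo"
    if not budget_manager.is_valid_user(user):
        # example budgets: free users get $1, paid users $50
        budget_manager.create_budget(total_budget=1 if tier == "free" else 50, user=user)
    if budget_manager.get_current_cost(user=user) >= budget_manager.get_total_budget(user):
        raise RuntimeError(f"{user} has exhausted their budget")
    response = completion(model=model, messages=messages)
    budget_manager.update_cost(completion_obj=response, user=user)
    return response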

krrishdholakia commented 10 months ago

cc: @yujonglee @WilliamEspegren @zakhar-kogan @ishaan-jaff @PhucTranThanh feel free to add any requests / ideas here.

ishaan-jaff commented 10 months ago

[Spend Dashboard] View analytics for spend per LLM and per user

ishaan-jaff commented 10 months ago

Auto-select the best LLM for a given task

If it's a simple task like responding to "hello", LiteLLM should auto-select a cheaper but faster LLM like j2-light.
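
A naive illustration of the idea (the length heuristic and model names are arbitrary examples, not a real router):

from litellm import completion

def route_completion(messages):
    # crude heuristic: short prompts go to a cheap, fast model
    prompt_chars = sum(len(m["content"]) for m in messages)
    model = "j2-light" if prompt_chars < 50 else "gpt-4"
    return completion(model=model, messages=messages)

print(route_completion([{"role": "user", "content": "hello"}]))

A real implementation would likely classify task difficulty with a small model rather than keying off prompt length.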

Pipboyguy commented 10 months ago

Integration with NLP Cloud

krrishdholakia commented 10 months ago

That's awesome @Pipboyguy - DM'ing you on LinkedIn to learn more!

krrishdholakia commented 9 months ago

@ishaan-jaff check out this truncate param in the Cohere API

This looks super interesting - similar to your token trimmer. If the prompt exceeds the context window, it trims it in a particular manner.

(screenshot: Cohere API truncate parameter docs)

I would maybe only run trimming on user/assistant messages and not touch the system prompt (this works for RAG scenarios as well).
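
A minimal sketch of that trimming strategy, using a crude character budget in place of real token counting:

def trim_messages(messages, max_chars=4000):
    # keep the system prompt intact; drop the oldest user/assistant turns
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(len(m["content"]) for m in system + rest) > max_chars:
        rest.pop(0)
    return system + rest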

haseeb-heaven commented 9 months ago

Option to use Inference API so we can use any model from Hugging Face 🤗

krrishdholakia commented 9 months ago

@haseeb-heaven you can already do this - https://github.com/BerriAI/litellm/blob/a63784d5b376c22d6203fed62f26c3ec5f92e5d1/litellm/llms/huggingface_restapi.py#L53

from litellm import completion 
response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response) 
haseeb-heaven commented 9 months ago

Wow, great, thanks - it's working. Nice feature!

smig23 commented 9 months ago

Support for inference using models hosted on Petals swarms (https://github.com/bigscience-workshop/petals), both public and private.

ishaan-jaff commented 9 months ago

@smig23 what are you trying to use Petals for? We found it to be quite unstable and it would not consistently pass our tests.

shauryr commented 9 months ago

Fine-tuning wrapper for OpenAI, Hugging Face, etc.

krrishdholakia commented 9 months ago

@shauryr I created an issue to track this - feel free to add any missing details here.

smig23 commented 9 months ago

Specifically, I'm running a private swarm as an experiment, with a view to deploying it within a private organization that has idle but distributed GPU resources. The initial target would be inference, and if LiteLLM were able to be the abstraction layer, it would give us the flexibility to take hosting in another direction in the future.

ranjancse26 commented 9 months ago

I wish LiteLLM had direct support for fine-tuning models. Based on the blog post below, I understand that in order to fine-tune, one needs a specific understanding of the LLM provider, and must then follow their instructions or library for fine-tuning the model. Why not have LiteLLM do all the abstraction and handle the fine-tuning aspects as well?

https://docs.litellm.ai/docs/tutorials/finetuned_chat_gpt
https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset

ranjancse26 commented 9 months ago

I wish LiteLLM had support for open-source embeddings like sentence-transformers, hkunlp/instructor-large, etc.

Sorry - based on the documentation below, it seems there's only support for OpenAI embeddings.

https://docs.litellm.ai/docs/embedding/supported_embedding
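
In the meantime, open-source embeddings can be computed directly with the sentence-transformers library (the model name below is just an example):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hello world", "LiteLLM wishlist"])
print(embeddings.shape)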

ranjancse26 commented 9 months ago

I wish LiteLLM had an integration with the Cerebrium platform. Please check the link below for their prebuilt models.

https://docs.cerebrium.ai/cerebrium/prebuilt-models

ishaan-jaff commented 9 months ago

@ranjancse26 what models on Cerebrium do you want to use with LiteLLM?

ranjancse26 commented 9 months ago

@ishaan-jaff Cerebrium has a lot of pre-built models. The focus should be on consuming the open-source models first, e.g. Llama 2, GPT4All, Falcon, FlanT5, etc. I mention this as a first step. However, it's also a good idea to have LiteLLM take care of the communication with custom-built models, based on the API that Cerebrium exposes.

ishaan-jaff commented 9 months ago

@smig23 We've added support for Petals to LiteLLM - https://docs.litellm.ai/docs/providers/petals

ranjancse26 commented 9 months ago

I wish LiteLLM had built-in support for the majority of provider operations rather than targeting text generation alone. Consider Cohere as an example: the endpoint below allows users to have conversations with a large language model (LLM) from Cohere.

https://docs.cohere.com/reference/post_chat

ranjancse26 commented 9 months ago

I wish LiteLLM had extensive support and examples for users developing apps with the RAG pattern. Following the standard best practices is practically mandatory, and we would all like that same support.

ranjancse26 commented 9 months ago

I wish LiteLLM had use-case-driven examples for beginners. Keeping the day-to-day use cases in mind, it would be a good idea to come up with a great sample that covers the following aspects.

ranjancse26 commented 9 months ago

I wish LiteLLM supported the various well-known or popular vector DBs. Here are a couple of them to begin with.

ranjancse26 commented 9 months ago

I wish LiteLLM had built-in support for web scraping, or for getting real-time data using a known provider like SerpApi. It would be helpful for users building custom AI models, or integrating with LLMs to perform retrieval-augmented generation.

https://serpapi.com/blog/llms-vs-serpapi/#serpapi-google-local-results-parser https://colab.research.google.com/drive/1Q9VvVzjZJja7_y2Ls8qBkE_NApbLiqly?usp=sharing
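
A rough sketch of that flow using the google-search-results package alongside litellm (the query, key handling, and result parsing are simplified for illustration):

from serpapi import GoogleSearch
from litellm import completion

search = GoogleSearch({"q": "latest LLM benchmarks", "api_key": "<YOUR SERPAPI KEY>"})
snippets = [r.get("snippet", "") for r in search.get_dict().get("organic_results", [])[:3]]

prompt = "Using this context:\n" + "\n".join(snippets) + "\n\nSummarize the latest LLM benchmarks."
print(completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]))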

krrishdholakia commented 9 months ago

Hey @ranjancse26, we have support for both LlamaIndex and LangChain, which have great vector DB support. Any reason those don't work for you?

ranjancse26 commented 9 months ago

@krrishdholakia @ishaan-jaff Could you please share detailed references for vector DB usage, with code samples showing how one could leverage them with LiteLLM?

krrishdholakia commented 9 months ago

Here's a sample code @ranjancse26

from litellm import completion

# `retrieved_context` and `user_question` are placeholders for data
# returned from your vector DB lookup
prompt = f"Context:\n{retrieved_context}\n\nQuestion: {user_question}"

messages = [{"role": "user", "content": prompt}]

response = completion(model="gpt-3.5-turbo", messages=messages)
print(response)

Is there some nuance here I'm missing? Our vector DB implementations usually involve stuffing the prompt with some additional context.

ranjancse26 commented 9 months ago

@krrishdholakia Sorry, that's not what I expected. Please take a look at this open-source project - https://github.com/abhishek-ch/VectorVerse

ranjancse26 commented 9 months ago

Regarding the auto-selection of models, OpenRouter has this option. I believe this would be an amazing feature to integrate into LiteLLM.

krrishdholakia commented 9 months ago

tracking this request here - @ranjancse26 https://github.com/BerriAI/litellm/issues/421

ranjancse26 commented 9 months ago

I wish LiteLLM had the "Enterprise Vision" to support multi-tenant requirements. Here's what happens with any organization that wishes to integrate or use LiteLLM:

  1. Small to mid-sized projects are OK. No issues with direct integration.
  2. For large-scale or enterprise apps, or even a SaaS-based platform for that matter, we wish LiteLLM had the capabilities to support secure key management; user, role & permission management; cost and token usage; and capturing all the metrics. This would allow organizations to easily plug and play. It's like a fully bundled package.

Apologies if I am expecting too much from the LiteLLM perspective.

krrishdholakia commented 9 months ago

@ranjancse26 We'd be really happy to support that scenario. Is this a current requirement for you?

ranjancse26 commented 9 months ago

@krrishdholakia Yes and that would be a great feature too.

krrishdholakia commented 9 months ago

@ranjancse26 I've created 2 issues to help track this

Please feel free to add additional details.

ranjancse26 commented 9 months ago

I wish LiteLLM had built-in support for toxic content classification. Detoxify is a generic solution that provides high-level categorical classifications, and one could wrap it around LiteLLM calls. It's quite similar to how moderations work, but doesn't depend on OpenAI.

https://github.com/unitaryai/detoxify
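
A sketch of wrapping a litellm call with Detoxify-based moderation (the threshold and model choice are arbitrary examples):

from detoxify import Detoxify
from litellm import completion

def moderated_completion(messages, threshold=0.8):
    # score the latest user message across Detoxify's toxicity categories
    scores = Detoxify("original").predict(messages[-1]["content"])
    if any(score > threshold for score in scores.values()):
        raise ValueError("input flagged as toxic")
    return completion(model="gpt-3.5-turbo", messages=messages)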

ranjancse26 commented 9 months ago

I wish LiteLLM had an integration with Psychic. Currently it supports LangChain, and I see there could be greater potential with LiteLLM support.

https://www.psychic.dev/

Psychic is an open source data integration platform for large language models (LLMs). Psychic includes full OAuth flows for 10+ data sources, transforms data from each source into vector store optimized Documents, and handles data syncs automatically. Psychic is designed to work with applications that use LangChain, but can integrate with most other tech stacks.

krrishdholakia commented 9 months ago

Hey @ranjancse26, re: toxic content - any reason you don't want to use the OpenAI moderations endpoint?

And - why does this matter to you?

ranjancse26 commented 9 months ago

@krrishdholakia OpenAI moderations are great; however, there's a hard dependency on OpenAI. How about a generic solution that works for any LLM provider? Detoxify is just an example of how we could do content moderation without having to depend on a single provider.

ranjancse26 commented 9 months ago

I wish LiteLLM had a configurable module for handling private or sensitive data before prompts are sent to the LLMs. Here's an idea that could be explored and integrated. Basically, there are pre- and post-processing aspects that need to be dealt with.

https://opaqueprompts.opaque.co/

Protect your sensitive data from model providers. Leverage LLMs, privately.

Pre-process LLM inputs to hide sensitive data in your prompts from LLM providers. Post-process LLM responses to replace all sanitized tokens with the original sensitive information.

krrishdholakia commented 9 months ago

oh - why can't you just clean this before using litellm?

from litellm import completion

prompt = scrub_sensitive_data(raw_prompt)  # scrub_sensitive_data is your own pre-processing step

response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}])

@ranjancse26

ranjancse26 commented 9 months ago

@krrishdholakia Apologies if things weren't fully clear. I'll try my best to explain the open security and privacy concerns, which are yet to be solved: the most common problems with sensitive data.

Generally, when dealing with private or sensitive data, it's natural for organizations to fear sending it directly to the public LLMs. Hence the need for a proxy or middleman that not only masks the sensitive info, but also substitutes the original data back once the LLM returns its response. That way, things will be more seamless for the consumers.

Wouldn't it be a nice service or add-on for the LiteLLM proxy to handle these cross-cutting concerns?

Please let me know your thoughts.
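
An illustration of the pre/post-processing idea (the regex and placeholder scheme are made up for the example; a real module would be configurable and far more thorough):

import re
from litellm import completion

def mask(text):
    replacements = {}
    for i, email in enumerate(re.findall(r"\S+@\S+\.\S+", text)):
        token = f"<EMAIL_{i}>"
        replacements[token] = email
        text = text.replace(email, token)
    return text, replacements

def unmask(text, replacements):
    for token, original in replacements.items():
        text = text.replace(token, original)
    return text

masked_prompt, mapping = mask("Contact jane@acme.com about the invoice.")
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": masked_prompt}])
print(unmask(response["choices"][0]["message"]["content"], mapping))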

shauryr commented 9 months ago

Support for DeepInfra. This is the easiest and cheapest way to get Llama 2 running for your system. They support the OpenAI API right now - https://deepinfra.com/meta-llama/Llama-2-70b-chat-hf/api?example=openai-python. I imagine calling it via LiteLLM would be better.

krrishdholakia commented 9 months ago

@shauryr isn't it just

from litellm import completion

messages = [{"role": "user", "content": "Hey"}]

response = completion(model="openai/meta-llama/Llama-2-70b-chat-hf", messages=messages, api_key="<YOUR DEEPINFRA TOKEN>", api_base="https://api.deepinfra.com/v1/openai")

print(response)

How can we help further?

shauryr commented 9 months ago

Yes. But I wanted to use it with llama-index via LiteLLM. Any thoughts on that?

krrishdholakia commented 9 months ago

Are you seeing an error with it? This should work without changes.

shauryr commented 9 months ago

https://github.com/jerryjliu/llama_index/issues/7824 - Have a look at this issue that I created.

abhinavkulkarni commented 9 months ago

Create a proxy service that acts as a translator for various backends - TGI, llama.cpp, etc. - and returns responses that are OpenAI API-compatible. A user should be able to spin the service up locally. This would let users adopt various products and services by simply modifying OPENAI_API_BASE. A rough sketch follows below.
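
A minimal sketch of such a translator (not LiteLLM's actual proxy): a FastAPI app that accepts OpenAI-style chat requests and forwards them to any litellm-supported backend; the default model mapping is an arbitrary example.

from fastapi import FastAPI, Request
from litellm import completion

app = FastAPI()

@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    # map the requested model to a local backend, e.g. Ollama
    response = completion(
        model=body.get("model", "ollama/llama2"),
        messages=body["messages"],
    )
    # depending on the litellm version, explicit serialization may be
    # needed here, e.g. json.loads(response.json())
    return response

Clients could then point OPENAI_API_BASE at this app without any other code changes.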