BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate, Groq (100+ LLMs)
https://docs.litellm.ai/docs/

πŸŽ… I WISH LITELLM HAD... #361

Open krrishdholakia opened 10 months ago

krrishdholakia commented 10 months ago

This is a ticket to track a wishlist of items you wish LiteLLM had.

COMMENT BELOW πŸ‘‡

With your request πŸ”₯ - if we have any questions, we'll follow up in comments / via DMs

Respond with ❀️ to any request you would also like to see

P.S.: Come say hi πŸ‘‹ on the Discord

ishaan-jaff commented 10 months ago

just added support for deep-infra @shauryr

import os
from litellm import completion

os.environ['DEEPINFRA_API_KEY'] = ""
messages = [{"role": "user", "content": "Hello"}]
model_name = "deepinfra/meta-llama/Llama-2-70b-chat-hf"

response = completion(model=model_name, messages=messages)
# Add any assertions here to check the response
print(response)

Waiting for a new deploy of litellm - will let you know as soon as it's deployed

ishaan-jaff commented 10 months ago

litellm 0.1.758 has the update for deepinfra @shauryr @ranjancse26

I just verified this code snippet works:

import os
from litellm import completion

os.environ['DEEPINFRA_API_KEY'] = "" 
model_name = "deepinfra/meta-llama/Llama-2-70b-chat-hf"

response = completion(
    model=model_name, 
    messages=[{"role": "user", "content": "Hello"}]
)
# Add any assertions here to check the response
print(response)

gururise commented 9 months ago

Would love to see integration with Haystack.

Haystack is used a lot in production environments.

krrishdholakia commented 9 months ago

@gururise is this the api you want to integrate with - https://docs.haystack.deepset.ai/docs/rest_api#querying-the-haystack-rest-api

gururise commented 9 months ago

@gururise is this the api you want to integrate with - https://docs.haystack.deepset.ai/docs/rest_api#querying-the-haystack-rest-api

Would love to see LiteLLM integration into Haystack's PromptNode: https://docs.haystack.deepset.ai/docs/prompt_node

This would make LiteLLM a first class citizen in the Haystack library. Haystack is used a lot in production environments.

krrishdholakia commented 9 months ago

tracking this here @gururise https://github.com/BerriAI/litellm/issues/591

lin-calvin commented 9 months ago

Can it have an integration with ChatGLM2 at bigmodel.cn?

DanielLaberge commented 9 months ago

CORS support. Right now web frontends like TypingMind cannot talk to my local endpoint due to lack of CORS headers:

Browser error: Fetch API cannot load http://localhost:8000/chat/completions due to access control checks.

I'm using the docker image like so: docker run --name ollama -p 8000:8000 litellm/ollama
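
A minimal sketch of the kind of CORS support being requested, assuming the proxy's FastAPI app can be modified directly (illustrative only, not LiteLLM's actual code):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow browser frontends (e.g. TypingMind) to call the local endpoint
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],   # tighten to specific origins in production
    allow_methods=["*"],
    allow_headers=["*"],
)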

krrishdholakia commented 9 months ago

Hey @calvinweb isn't this already possible through our support for the huggingface inference endpoint format - https://huggingface.co/THUDM/chatglm-6b.

krrishdholakia commented 9 months ago

@DanielLaberge thanks for letting us know - i'll file an issue to track this and look into it today.

krrishdholakia commented 9 months ago

tracking here - https://github.com/BerriAI/litellm/issues/595 @DanielLaberge

lin-calvin commented 9 months ago

Hey @calvinweb isn't this already possible through our support for the huggingface inference endpoint format - https://huggingface.co/THUDM/chatglm-6b.

I means the commercial one at bigmodel.cn

alabrashJr commented 9 months ago

The Docker example shown in the documentation is built for the OpenAI proxy, not for routing. I wish there were a Docker-based API that does the routing. @krrishdholakia

krrishdholakia commented 9 months ago

Hey @alabrashJr what would that look like? Would you put all the deployments on the server and then just pass the server a name?

What's the ideal way to pass the list of deployments to the server?

Also @alabrashJr reached out via linkedin to understand this better. Let me know if you prefer that / discord for discussing further.

Linkedin: https://www.linkedin.com/in/krish-d/ Discord: Krrish#8748

alabrashJr commented 9 months ago

@krrishdholakia what I meant is a load-balancing API across multiple Azure/OpenAI deployments that picks and uses the deployment that is below its rate limit and has used the fewest tokens.
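
For reference, a rough sketch of this kind of load balancing using litellm's Router class; the deployment names, keys, and endpoints below are placeholders:

from litellm import Router

# Two deployments behind one logical model name; the router decides which
# deployment serves each request.
model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/my-azure-deployment",               # placeholder
            "api_key": "<azure-api-key>",
            "api_base": "https://<your-resource>.openai.azure.com/",
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "gpt-3.5-turbo",
            "api_key": "<openai-api-key>",
        },
    },
]

router = Router(model_list=model_list)

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response)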

Vinno97 commented 9 months ago

What: Add support for OpenAI's echo parameter.

Why: Frameworks like lm-evaluation-harness rely on the echo parameter to get the logprobs of the prompt tokens. This could allow a significant speed-up of lm-eval evaluation by using TGI's echo equivalent, decoder_input_details. Though not all backends support it, it could also enable much easier comparisons of different model providers!

Bonus: TGI also supports top_n_tokens, which can return the log prob of the most likely tokens at each timestep, a semi-equivalent of OpenAI's logprobs parameter.
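
For context, this is roughly the call pattern lm-evaluation-harness uses against OpenAI-format completion endpoints (a sketch with the legacy openai client; the model name is a placeholder and not every backend honors these parameters):

import openai

# echo=True returns the prompt tokens in the response, and max_tokens=0
# generates nothing new, so the call only scores the prompt.
response = openai.Completion.create(
    model="davinci-002",            # placeholder
    prompt="The quick brown fox",
    max_tokens=0,
    echo=True,
    logprobs=1,
)
token_logprobs = response["choices"][0]["logprobs"]["token_logprobs"]
print(token_logprobs)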

ishaan-jaff commented 9 months ago

@Vinno97 thanks for raising this issue, tracking it here: https://github.com/BerriAI/litellm/issues/699

krrishdholakia commented 9 months ago

@alabrashJr tracking the docker image issue here: https://github.com/BerriAI/litellm/issues/696

ranjancse26 commented 7 months ago

Integrate with pezzo.ai - Observability, Cost & Prompt Engineering Platform

Pezzo is a powerful open-source toolkit designed to streamline the process of AI development. It empowers developers and teams to leverage the full potential of AI models in their applications with ease.

rupurt commented 7 months ago

Support for Ollama embeddings

qrkourier commented 7 months ago

Auto-discover Ollama's list of models instead of requiring each one to be declared in the proxy config. As a workaround, I wrote this script to build the proxy config from the list.

import requests
import yaml
import copy

# Fetch the list of models
response = requests.get('http://ollama.private/api/tags')
models = [model['name'] for model in response.json()['models']]

# Define the template
template = {
  "model_name": "MODEL",
  "litellm_params": {
    "model": "MODEL",
    "api_base": "http://ollama:11434",
    "stream": False
  }
}

# Build the model_list
model_list = []
for model in models:
    new_item = copy.deepcopy(template)
    new_item['model_name'] = model
    new_item['litellm_params']['model'] = f"ollama/{model}"
    model_list.append(new_item)

litellm_config = {
    "model_list": model_list
}
# Print the result
print(yaml.dump(litellm_config))

krrishdholakia commented 7 months ago

Hey @rupurt we're planning on adding ollama embeddings this week - https://github.com/BerriAI/litellm/issues/1193

will update you once it's out.

krrishdholakia commented 7 months ago

Hey @qrkourier yes - this is a known issue - https://github.com/BerriAI/litellm/issues/979

@qrkourier DM'ed via linkedin to understand your scenario better

bufferoverflow commented 7 months ago

proxy with the option to specify which endpoints to expose (e.g. no key management, no write API)

krrishdholakia commented 7 months ago

Hey @bufferoverflow can you help me understand what 'expose' means here?

All the endpoints are protected behind user_api_key_auth, and certain routes (e.g. key management) can only be used with a master key - https://github.com/BerriAI/litellm/blob/c34246bdc8d9b633058d230ad945c1659e71f5ba/litellm/proxy/proxy_server.py#L218
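
For illustration, key management is one of those master-key-only routes; a hedged sketch of calling it (route name and payload follow the proxy docs, so treat the specifics as assumptions):

import requests

# Generate a proxy API key; only works when authenticated with the master key
resp = requests.post(
    "http://localhost:8000/key/generate",
    headers={"Authorization": "Bearer sk-<master-key>"},   # placeholder
    json={"duration": "1h"},
)
print(resp.json())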

bufferoverflow commented 7 months ago

thanks @krrishdholakia. Maybe this solves it already; we have to investigate.

adamrb commented 6 months ago

I wish there was support for AWS Bedrock Agents, so I can use LiteLLM to query a knowledge base.

krrishdholakia commented 6 months ago

@adamrb curious - why do you need litellm here?

adamrb commented 6 months ago

@adamrb curious - why do you need litellm here?

I use Bedrock's foundational models through LibreChat with LiteLLM acting as a proxy. There's no way to use a Bedrock knowledge base when accessed in this way.

guiramos commented 6 months ago

I wish LiteLLM had OpenAI image generation.

krrishdholakia commented 6 months ago

@guiramos we do - https://docs.litellm.ai/docs/image_generation

Where in docs did you look?
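
For anyone else looking, a minimal example along the lines of that docs page (a sketch; check the docs for current parameters):

import os
from litellm import image_generation

os.environ["OPENAI_API_KEY"] = ""

response = image_generation(
    model="dall-e-3",
    prompt="A watercolor painting of a lighthouse at dawn",
)
print(response)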

guiramos commented 6 months ago

Ahhh, didn't see that! Thank you very much @krrishdholakia

langgg0511 commented 6 months ago

I wish LiteLLM supported Alibaba's Tongyi Qwen and Baidu Qianfan models.

bsu3338 commented 6 months ago

I would like to see cache hits within Langfuse. Is this possible with tags? It would also be nice if Langfuse did not count cache hits toward cost. In addition, it would be nice to see how much money is saved by enabling the cache.

ishaan-jaff commented 6 months ago

@bsu3338 added on this PR: https://github.com/BerriAI/litellm/pull/1519

bsu3338 commented 6 months ago

@ishaan-jaff thank you so much, I will try it out later today.

giyaseddin commented 6 months ago

@krrishdholakia Have you ever thought about adding ready-made LangChain / llama_index callbacks to the OpenAI proxy, so it's easy for a developer to run a Docker container that has the desired agent or chain defined and served in the OpenAI style?

giyaseddin commented 6 months ago

Also I'd like to see compatibility with semantic caching like https://github.com/zilliztech/GPTCache

bsu3338 commented 6 months ago

A wrapper that would take an OpenAI API call for image generation and then make the request to Stable Diffusion with a custom set of parameters. All I know about is AUTOMATIC1111, but whatever you think is best would be good. Currently, Stable Diffusion access requires a plugin; this would remove the need for a plugin. Example: https://github.com/danny-avila/LibreChat/blob/main/api/app/clients/tools/StableDiffusion.js

Thinking of /sdapi/v1/txt2img https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/dev/modules/api/api.py

https://faun.pub/stable-diffusion-enabling-api-and-how-to-run-it-a-step-by-step-guide-7ebd63813c22
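
A rough sketch of what such a wrapper could look like, assuming an AUTOMATIC1111 server on 127.0.0.1:7860; the endpoint path comes from the links above, while the helper name and response shape here are illustrative:

import requests

def openai_style_txt2img(prompt: str, size: str = "512x512",
                         api_base: str = "http://127.0.0.1:7860"):
    """Translate an OpenAI-style image request into an AUTOMATIC1111
    /sdapi/v1/txt2img call (hypothetical helper, not part of LiteLLM)."""
    width, height = (int(x) for x in size.split("x"))
    payload = {
        "prompt": prompt,
        "negative_prompt": "",
        "sampler_name": "Euler a",
        "cfg_scale": 1,
        "steps": 1,
        "width": width,
        "height": height,
    }
    resp = requests.post(f"{api_base}/sdapi/v1/txt2img", json=payload, timeout=120)
    resp.raise_for_status()
    images = resp.json().get("images", [])  # base64-encoded PNGs
    # Mimic the OpenAI images response shape
    return {"data": [{"b64_json": img} for img in images]}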

bsu3338 commented 6 months ago

@ishaan-jaff thanks for the cache hits update. I did a PR on librechat documenting the feature: https://github.com/danny-avila/LibreChat/pull/1618/files

bgoosmanviz commented 6 months ago

I wish there were an arm64 build of ghcr.io/berriai/litellm:main

krrishdholakia commented 6 months ago

Hi @bsu3338 unable to find the api documentation for automatic1111, can you point me to it?

@giyaseddin discussing this here - https://github.com/BerriAI/litellm/issues/1541

@bgoosmanviz tracking this here - https://github.com/BerriAI/litellm/issues/1607. Would it be possible to support it in the normal Dockerfile, or would we need a separate Dockerfile for this?

bsu3338 commented 6 months ago

@krrishdholakia I have not been able to find great documentation either, except for the links I sent. I think the way they want you to access the API documentation is to run the Stable Diffusion web UI and then navigate to /docs. The best way I have seen to run it in Docker is: https://github.com/AbdBarho/stable-diffusion-webui-docker

git clone https://github.com/AbdBarho/stable-diffusion-webui-docker.git
cd stable-diffusion-webui-docker/
docker compose --profile download up --build
# For CPU use docker compose --profile auto-cpu up --build
# For Nvidia GPU use docker compose --profile auto up --build

# Notice: for me I had to comment out the line #  tty: true

cd data/models/Stable-diffusion
wget https://huggingface.co/stabilityai/sdxl-turbo/resolve/main/sd_xl_turbo_1.0_fp16.safetensors

cd ../../../
docker compose --profile auto up -d

https://github.com/AbdBarho/stable-diffusion-webui-docker/wiki/Usage

When I have a moment, I will see if I can create a sample API call to create an image through the API. What else would be helpful?

You can POST something like the below to create an image with SDXL Turbo:

{
  "prompt": "realistic image of a cow jumping over the moon",
  "negative_prompt": "",
  "sd_model_name": "sd_xl_turbo_1.0_fp16",
  "sampler_name": "Euler a",
  "cfg_scale": 1,
  "steps": 1,
  "width": 512,
  "height": 512
}

This should spin up the service on port 7860; then navigate to 127.0.0.1:7860/docs

/sdapi/v1/txt2img POST

{
  "prompt": "",
  "negative_prompt": "",
  "styles": [
    "string"
  ],
  "seed": -1,
  "subseed": -1,
  "subseed_strength": 0,
  "seed_resize_from_h": -1,
  "seed_resize_from_w": -1,
  "sampler_name": "string",
  "batch_size": 1,
  "n_iter": 1,
  "steps": 50,
  "cfg_scale": 7,
  "width": 512,
  "height": 512,
  "restore_faces": true,
  "tiling": true,
  "do_not_save_samples": false,
  "do_not_save_grid": false,
  "eta": 0,
  "denoising_strength": 0,
  "s_min_uncond": 0,
  "s_churn": 0,
  "s_tmax": 0,
  "s_tmin": 0,
  "s_noise": 0,
  "override_settings": {},
  "override_settings_restore_afterwards": true,
  "refiner_checkpoint": "string",
  "refiner_switch_at": 0,
  "disable_extra_networks": false,
  "comments": {},
  "enable_hr": false,
  "firstphase_width": 0,
  "firstphase_height": 0,
  "hr_scale": 2,
  "hr_upscaler": "string",
  "hr_second_pass_steps": 0,
  "hr_resize_x": 0,
  "hr_resize_y": 0,
  "hr_checkpoint_name": "string",
  "hr_sampler_name": "string",
  "hr_prompt": "",
  "hr_negative_prompt": "",
  "sampler_index": "Euler",
  "script_name": "string",
  "script_args": [],
  "send_images": true,
  "save_images": false,
  "alwayson_scripts": {}
}

bsu3338 commented 5 months ago

OK, I did not know I wanted it until you added Google SSO :) I would like OpenID Connect authentication so I can configure it with Azure AD.

ishaan-jaff commented 5 months ago

Tracking this here @bsu3338 https://github.com/BerriAI/litellm/issues/1658 for Azure AD

grugnog commented 5 months ago

It would be helpful if BudgetManager integrated with the proxy budget management functionality. From an API perspective the BudgetManager class is a nice lightweight way to manage budgets, but currently it seems you need to choose between either writing your own persistence API, or not using BudgetManager and instead calling the proxy directly to manage both user accounts and keys, which is a fair step up in complexity. Perhaps the neatest way of handling this would be allowing a client with the master API key to make requests directly on behalf of users (without a user API key). Then a BudgetManager class could handle managing the users and setting/fetching the budgets - update_cost would be a no-op of course, since the proxy handles that itself.
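
For context, the client-side flow being described looks roughly like this today (a sketch based on the BudgetManager docs; method names may differ across versions):

from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="demo_project")
user = "user@example.com"

# Create a budget for a new user
if not budget_manager.is_valid_user(user):
    budget_manager.create_budget(total_budget=10, user=user)  # $10 budget

# Only call the LLM if the user is under budget, then record the spend
if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
    )
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    response = "Sorry, you are over budget!"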

ishaan-jaff commented 5 months ago

@grugnog made an issue to discuss this; I had some follow-up questions.

bsu3338 commented 5 months ago

Wow, now I am starting to feel guilty asking, because I have already made a couple of requests. But would it be possible to route based on the prompt? If it is a coding prompt, it gets routed to CodeLlama; if it is SQL, to SQLCoder; if medical, then a medical model... This way the user does not have to switch models to get the desired results. If it is a vision question, go to LLaVA...

ishaan-jaff commented 5 months ago

@bsu3338 you should already be able to do this with pre-call hooks on the proxy: https://docs.litellm.ai/docs/proxy/call_hooks - let me know if this solves your problem?

(you'll need to define your own rules)
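
The routing rule itself could be roughly the function below; it only sketches the keyword logic, and the model names are placeholders (see the call-hooks docs above for the actual hook interface):

def route_by_prompt(data: dict) -> dict:
    """Rewrite data['model'] based on the request content (illustrative rule).
    `data` is assumed to be the OpenAI-format request dict a pre-call hook sees."""
    text = " ".join(
        m["content"] for m in data.get("messages", [])
        if isinstance(m.get("content"), str)
    ).lower()

    if any(word in text for word in ("code", "function", "python", "bug")):
        data["model"] = "ollama/codellama"   # placeholder model names
    elif any(word in text for word in ("sql", "select", "join")):
        data["model"] = "ollama/sqlcoder"
    elif any(isinstance(m.get("content"), list) for m in data.get("messages", [])):
        data["model"] = "ollama/llava"       # image content blocks go to a vision model
    return data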

krrishdholakia commented 5 months ago

@bsu3338 wouldn't this be the same as making a function call to gpt to pick the model based on the question?