krrishdholakia opened 10 months ago
just added support for deep-infra @shauryr
import os
from litellm import completion

os.environ['DEEPINFRA_API_KEY'] = ""
model_name = "deepinfra/meta-llama/Llama-2-70b-chat-hf"
messages = [{"role": "user", "content": "Hello"}]
try:
    response = completion(model=model_name, messages=messages)
    # Add any assertions here to check the response
    print(response)
except Exception as e:
    print(f"Error occurred: {e}")
Waiting for a new deploy of litellm - will let you know as soon as it's deployed.
litellm 0.1.758 has the update for deepinfra @shauryr @ranjancse26
I just verified this code snippet works:
import os
from litellm import completion
os.environ['DEEPINFRA_API_KEY'] = ""
model_name = "deepinfra/meta-llama/Llama-2-70b-chat-hf"
response = completion(
    model=model_name,
    messages=[{"role": "user", "content": "Hello"}]
)
# Add any assertions here to check the response
print(response)
Would love to see integration with Haystack.
Haystack is used a lot in production environments.
@gururise is this the api you want to integrate with - https://docs.haystack.deepset.ai/docs/rest_api#querying-the-haystack-rest-api
Would love to see LiteLLM integration into Haystack's PromptNode: https://docs.haystack.deepset.ai/docs/prompt_node
This would make LiteLLM a first class citizen in the Haystack library. Haystack is used a lot in production environments.
tracking this here @gururise https://github.com/BerriAI/litellm/issues/591
Can it have an integration with ChatGLM2 at bigmodels.cn?
CORS support. Right now web frontends like TypingMind cannot talk to my local endpoint due to lack of CORS headers:
Browser error:
Fetch API cannot load http://localhost:8000/chat/completions due to access control checks.
I'm using the docker image like so: docker run --name ollama -p 8000:8000 litellm/ollama
Hey @calvinweb isn't this already possible through our support for the huggingface inference endpoint format - https://huggingface.co/THUDM/chatglm-6b.
@DanielLaberge thanks for letting us know - I'll file an issue to track this and look into it today.
tracking here - https://github.com/BerriAI/litellm/issues/595 @DanielLaberge
I mean the commercial one at bigmodel.cn.
The Docker example that you showed in the documentation is built for openai-proxy, not for routing. I wish there were a Docker-based API that does routing. @krrishdholakia
Hey @alabrashJr what would that look like? Would you put all the deployments on the server and then just pass the server a name?
What's the ideal way to pass the list of deployments to the server?
Also @alabrashJr reached out via linkedin to understand this better. Let me know if you prefer that / discord for discussing further.
Linkedin: https://www.linkedin.com/in/krish-d/ Discord: Krrish#8748
@krrishdholakia what I meant is creating a load-balancing API across multiple Azure/OpenAI deployments that picks and uses the deployment that is below its rate limit and has used the fewest tokens.
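Roughly, a sketch of what that could look like with litellm's Router (the deployment names, keys, and rpm limits below are illustrative assumptions, and the usage-based strategy name should be checked against the current Router docs):

from litellm import Router

model_list = [
    {
        "model_name": "gpt-3.5-turbo",  # alias that clients call
        "litellm_params": {
            "model": "azure/chatgpt-v-2",                          # placeholder deployment
            "api_key": "...",
            "api_base": "https://my-endpoint.openai.azure.com",    # placeholder endpoint
            "rpm": 600,                                            # per-deployment rate limit
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "...", "rpm": 300},
    },
]

# pick the deployment with the lowest usage that is still under its limit
router = Router(model_list=model_list, routing_strategy="usage-based-routing")

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response)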
What: Add support for OpenAI's echo parameter.
Why: Frameworks like lm-evaluation-harness rely on the echo parameter to get the logprobs of the prompt tokens. This could allow for a significant speed-up of lm-eval evaluation by using TGI's echo equivalent, decoder_input_details. Though not all back-ends support it, it could also enable much easier comparisons of different model providers!
Bonus: TGI also supports top_n_tokens, which can return the log prob of the most likely tokens at each timestep, a semi-equivalent of OpenAI's logprobs parameter.
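For concreteness, this is the pattern such harnesses use against OpenAI's legacy completions endpoint to score a prompt (a sketch; it assumes the legacy /v1/completions API and a model that still allows echo together with logprobs, which varies by provider - exactly the gap this request is about):

from openai import OpenAI

client = OpenAI()

# Ask for zero new tokens, echo the prompt back, and request logprobs,
# so the response contains per-token logprobs of the prompt itself.
resp = client.completions.create(
    model="davinci-002",
    prompt="The quick brown fox jumps over the lazy dog",
    max_tokens=0,
    echo=True,
    logprobs=1,
)
print(resp.choices[0].logprobs.token_logprobs)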
@Vinno97 thanks for raising this issue, tracking it here: https://github.com/BerriAI/litellm/issues/699
@alabrashJr tracking the docker image issue here: https://github.com/BerriAI/litellm/issues/696
Integrate with pezzo.ai - Observability, Cost & Prompt Engineering Platform
Pezzo is a powerful open-source toolkit designed to streamline the process of AI development. It empowers developers and teams to leverage the full potential of AI models in their applications with ease.
Support for Ollama embeddings
Auto-discover Ollama's list of models instead of requiring each one to be declared in the proxy config. As a workaround, I wrote this script to build the proxy config from the list.
import requests
import yaml
import copy

# Fetch the list of models
response = requests.get('http://ollama.private/api/tags')
models = [model['name'] for model in response.json()['models']]

# Define the template
template = {
    "model_name": "MODEL",
    "litellm_params": {
        "model": "MODEL",
        "api_base": "http://ollama:11434",
        "stream": False
    }
}

# Build the model_list
model_list = []
for model in models:
    new_item = copy.deepcopy(template)
    new_item['model_name'] = model
    new_item['litellm_params']['model'] = f"ollama/{model}"
    model_list.append(new_item)

litellm_config = {
    "model_list": model_list
}

# Print the result
print(yaml.dump(litellm_config))
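If it helps anyone else, the output can be piped straight into a config file and handed to the proxy (assuming the script is saved as build_config.py and using the standard --config flag):

python build_config.py > litellm_config.yaml
litellm --config litellm_config.yaml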
Hey @rupurt we're planning on adding ollama embeddings this week - https://github.com/BerriAI/litellm/issues/1193
will update you once it's out.
Hey @qrkourier yes - this is a known issue - https://github.com/BerriAI/litellm/issues/979
@qrkourier DM'ed via linkedin to understand your scenario better
proxy with the option to specify which endpoints to expose (e.g. no key management, no write API)
Hey @bufferoverflow can you help me understand what 'expose' means here?
All the endpoints are protected behind user_api_key_auth, and certain routes (e.g. key management) can only be used with a master key - https://github.com/BerriAI/litellm/blob/c34246bdc8d9b633058d230ad945c1659e71f5ba/litellm/proxy/proxy_server.py#L218
Thanks @krrishdholakia. Maybe this solves that already; we have to investigate.
I wish there were support for AWS Bedrock Agents, so I could use LiteLLM to query a knowledge base.
@adamrb curious - why do you need litellm here?
I use Bedrock's foundational models through LibreChat with LiteLLM acting as a proxy. There's no way to use a Bedrock knowledge base when accessed in this way.
I wish LiteLLM had OpenAI image generation.
@guiramos we do - https://docs.litellm.ai/docs/image_generation
Where in docs did you look?
Ahhh didn't see that! Thank you very much @krrishdholakia
I wish LiteLLM supported Alibaba's Tongyi Qwen and Baidu Qianfan models.
I would like to see cache hits within Langfuse. Is this possible with tags? It would also be nice if Langfuse did not count cache hits towards cost. In addition, it would be nice to see how much money is saved by implementing caching.
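For anyone wiring this up, this is roughly the combination in question - Langfuse logging plus LiteLLM's cache (a minimal sketch; the in-memory cache default and the repeated-call-as-cache-hit behavior are assumptions to verify against the caching docs):

import os
import litellm
from litellm import completion
from litellm.caching import Cache

os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""

litellm.success_callback = ["langfuse"]  # log every call to Langfuse
litellm.cache = Cache()                  # in-memory cache by default

messages = [{"role": "user", "content": "Hello"}]
first = completion(model="gpt-3.5-turbo", messages=messages)
second = completion(model="gpt-3.5-turbo", messages=messages)  # expected cache hit
print(second)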
@bsu3338 added on this PR: https://github.com/BerriAI/litellm/pull/1519
@ishaan-jaff thank you so much, I will try it out later today.
@krrishdholakia Have you ever thought about adding ready-made LangChain / llama_index callbacks to the OpenAI proxy, so it's easy for a developer to run a Docker container that has the desired agent or chain defined and served in OpenAI style?
Also I'd like to see compatibility with semantic caching like https://github.com/zilliztech/GPTCache
A wrapper that would take an OpenAI API call for image generation and then make the request to Stable Diffusion with a custom set of parameters. All I know about is automatic1111, but whatever you think best would be good. Currently, Stable Diffusion access requires a plugin; this would remove the need for one. Example: https://github.com/danny-avila/LibreChat/blob/main/api/app/clients/tools/StableDiffusion.js
Thinking of /sdapi/v1/txt2img https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/dev/modules/api/api.py
https://faun.pub/stable-diffusion-enabling-api-and-how-to-run-it-a-step-by-step-guide-7ebd63813c22
@ishaan-jaff thanks for the cache hits update. I did a PR on librechat documenting the feature: https://github.com/danny-avila/LibreChat/pull/1618/files
I wish there were an arm64 build of ghcr.io/berriai/litellm:main
Hi @bsu3338, I'm unable to find the API documentation for automatic1111 - can you point me to it?
@giyaseddin discussing this here - https://github.com/BerriAI/litellm/issues/1541
@bgoosmanviz tracking this here - https://github.com/BerriAI/litellm/issues/1607, would it be possible to support in the normal dockerfile or would we need a separate dockerfile for this?
@krrishdholakia I have not been able to find great documentation either, except for the links I sent. I think the way they want you to access the API documentation is by running the Stable Diffusion web UI and then navigating to /docs. The best way I have seen to run it in Docker is: https://github.com/AbdBarho/stable-diffusion-webui-docker
git clone https://github.com/AbdBarho/stable-diffusion-webui-docker.git
cd stable-diffusion-webui-docker/
docker compose --profile download up --build
# For CPU use docker compose --profile auto-cpu up --build
# For Nvidia GPU use docker compose --profile auto up --build
# Notice: for me I had to comment out the line # tty: true
cd data/models/Stable-diffusion
wget https://huggingface.co/stabilityai/sdxl-turbo/resolve/main/sd_xl_turbo_1.0_fp16.safetensors
cd ../../../
docker compose --profile auto up -d
https://github.com/AbdBarho/stable-diffusion-webui-docker/wiki/Usage
When I have a moment, I will see if I can create a sample API call to create an image through the API. What else would be helpful?
You can post with something like the below to create an image with SDXL Turbo
{
  "prompt": "realistic image of a cow jumping over the moon",
  "negative_prompt": "",
  "sd_model_name": "sd_xl_turbo_1.0_fp16",
  "sampler_name": "Euler a",
  "cfg_scale": 1,
  "steps": 1,
  "width": 512,
  "height": 512
}
This should spin up the service on port 7860; then navigate to 127.0.0.1:7860/docs.
/sdapi/v1/txt2img POST
{
  "prompt": "",
  "negative_prompt": "",
  "styles": [
    "string"
  ],
  "seed": -1,
  "subseed": -1,
  "subseed_strength": 0,
  "seed_resize_from_h": -1,
  "seed_resize_from_w": -1,
  "sampler_name": "string",
  "batch_size": 1,
  "n_iter": 1,
  "steps": 50,
  "cfg_scale": 7,
  "width": 512,
  "height": 512,
  "restore_faces": true,
  "tiling": true,
  "do_not_save_samples": false,
  "do_not_save_grid": false,
  "eta": 0,
  "denoising_strength": 0,
  "s_min_uncond": 0,
  "s_churn": 0,
  "s_tmax": 0,
  "s_tmin": 0,
  "s_noise": 0,
  "override_settings": {},
  "override_settings_restore_afterwards": true,
  "refiner_checkpoint": "string",
  "refiner_switch_at": 0,
  "disable_extra_networks": false,
  "comments": {},
  "enable_hr": false,
  "firstphase_width": 0,
  "firstphase_height": 0,
  "hr_scale": 2,
  "hr_upscaler": "string",
  "hr_second_pass_steps": 0,
  "hr_resize_x": 0,
  "hr_resize_y": 0,
  "hr_checkpoint_name": "string",
  "hr_sampler_name": "string",
  "hr_prompt": "",
  "hr_negative_prompt": "",
  "sampler_index": "Euler",
  "script_name": "string",
  "script_args": [],
  "send_images": true,
  "save_images": false,
  "alwayson_scripts": {}
}
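In case it helps, a minimal Python sketch of what the wrapper's outbound call could look like - it posts the SDXL Turbo style payload from above to a local automatic1111 instance and assumes the response contains base64-encoded images under "images":

import base64
import requests

payload = {
    "prompt": "realistic image of a cow jumping over the moon",
    "negative_prompt": "",
    "sampler_name": "Euler a",
    "cfg_scale": 1,
    "steps": 1,
    "width": 512,
    "height": 512,
}

# Assumes the stable-diffusion-webui-docker stack above is running on 7860
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

# Decode each returned base64 image and write it to disk
for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))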
OK, did not know I wanted it until you added Google SSO :) I would like OpenID authentication so I can configure it with AzureAD.
Tracking this here @bsu3338 https://github.com/BerriAI/litellm/issues/1658 for Azure AD
It would be helpful if BudgetManager integrated with the proxy budget management functionality. From an API perspective the BudgetManager class is a nice lightweight way to manage budgets, but currently it seems you need to choose between either writing your own persistence API, or not using BudgetManager and instead calling the proxy directly to manage both user accounts and keys, which is a fair step up in complexity. Perhaps the neatest way of handling this would be allowing a client with the master API key to make requests directly on behalf of users (without a user API key). Then a BudgetManager class could handle managing the users and setting/fetching the budgets - update_cost would be a no-op of course, since the proxy handles that itself.
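For reference, this is the kind of lightweight client-side flow being described with BudgetManager (a sketch of the current client API from memory; the proxy-backed persistence being requested is exactly what doesn't exist yet):

from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="test_project")  # placeholder project name

user = "user@example.com"
budget_manager.create_budget(total_budget=10, user=user)  # $10 budget for this user

# Only call the model if the user is still under budget
if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
    )
    # Record the cost of this call against the user's budget
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    response = "Sorry - budget exceeded!"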
@grugnog made an issue to discuss this; I had some follow-up questions.
Wow, now I am starting to feel guilty asking, because I have already made a couple of requests. But would it be possible to route based on the prompt? If it is a coding prompt it gets routed to CodeLlama; if it is SQL, to SQLCoder; if medical, to a medical model... This way the user does not have to switch models to get the desired results. If it is a vision question, go to LLaVA...
@bsu3338 you should already be able to do this with Pre Call hooks on the proxy: https://docs.litellm.ai/docs/proxy/call_hooks Let me know if this solves your problem?
(you'll need to define your own rules)
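Roughly what those rules could look like as a pre-call hook (a sketch, not a tested config - the model names and keyword checks are placeholders; the hook shape follows the call-hooks docs linked above):

# custom_callbacks.py, referenced from the proxy config
from litellm.integrations.custom_logger import CustomLogger

class PromptRouterHandler(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        # Inspect the last user message and rewrite the target model
        last_msg = data.get("messages", [{}])[-1].get("content", "")
        if isinstance(last_msg, str):
            text = last_msg.lower()
            if "sql" in text:
                data["model"] = "ollama/sqlcoder"      # placeholder model names
            elif "def " in text or "python" in text:
                data["model"] = "ollama/codellama"
        return data

proxy_handler_instance = PromptRouterHandler()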
@bsu3338 wouldn't this be the same as making a function call to gpt to pick the model based on the question?
This is a ticket to track a wishlist of items you wish LiteLLM had.
COMMENT BELOW 👇
With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs
Respond with ❤️ to any request you would also like to see
P.S.: Come say hi 👋 on the Discord