BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Start LiteLLM Proxy with a defined budget per user #1677

Closed ishaan-jaff closed 7 months ago

ishaan-jaff commented 7 months ago

The Feature

It would be helpful if BudgetManager integrated with the proxy's budget management functionality. From an API perspective, the BudgetManager class is a nice lightweight way to manage budgets, but currently you have to choose between writing your own persistence API, or skipping BudgetManager and calling the proxy directly to manage both user accounts and keys, which is a fair step up in complexity. Perhaps the neatest way of handling this would be to allow a client with the master API key to make requests directly on behalf of users (without a user API key). A BudgetManager class could then handle creating users and setting/fetching budgets; update_cost would be a no-op, of course, since the proxy tracks cost itself.
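To make the shape of the idea concrete, here is a minimal sketch of a BudgetManager-style wrapper whose persistence is delegated to the proxy. The spend/budget lookups are injected as callables because the exact proxy endpoints and payloads are assumptions here, not the real API:

```python
# Hypothetical sketch only: a BudgetManager-like interface where the proxy
# owns all persistence. The two callables stand in for HTTP calls to the
# proxy (e.g. fetching a user's spend and configured budget).
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProxyBudgetManager:
    get_spend: Callable[[str], float]   # returns current spend for a user
    get_budget: Callable[[str], float]  # returns the user's configured budget

    def is_over_budget(self, user: str) -> bool:
        return self.get_spend(user) >= self.get_budget(user)

    def update_cost(self, user: str, cost: float) -> None:
        # No-op by design: the proxy tracks spend itself when `user`
        # is passed on each request.
        pass
```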

Motivation, pitch

request from @grugnog

Twitter / LinkedIn details

No response

ishaan-jaff commented 7 months ago

@grugnog I don't think I fully understood this

ishaan-jaff commented 7 months ago

bump @grugnog Can we get some more details? I can work on this today

grugnog commented 7 months ago

Thanks @ishaan-jaff

Yes, in our case the application manages budgets fully on behalf of users and really just needs tracking functionality. The users aren't using the API directly at all and creating/managing API keys would just add complexity without any real benefit.

I think the existing BudgetManager interface works pretty much exactly how we need. The main gap is just persistence - our app is otherwise stateless, and it would be nice to be able to use the proxy for storing the budget state, rather than adding another database or API with persistence just for this.

If we are able to use the proxy, the other benefit is that potentially it would allow multiple applications to share users and budgets, so that they could be managed in one place.

ishaan-jaff commented 7 months ago

@grugnog thanks for the response. Would this work?

Start LiteLLM Proxy with a defined budget per user


litellm_settings:
  # other litellm settings
  max_user_budget: 0 # (float) sets the max budget to $0 USD
  budget_duration: 30d # (str) frequency of reset - You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
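For illustration, the duration strings above could be interpreted roughly as follows; this is a sketch of the obvious reading, and the proxy's actual parsing may differ:

```python
# Sketch: interpret duration strings like "30s", "30m", "30h", "30d"
# as a number of seconds. Not the proxy's real implementation.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(duration: str) -> int:
    value, unit = int(duration[:-1]), duration[-1]
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration unit: {unit!r}")
    return value * UNIT_SECONDS[unit]
```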

send user in /chat/completion calls to LiteLLM Proxy

curl --location 'http://0.0.0.0:8000/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
    "user": "ishaan"
}'
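The same request can be built in Python. This is just a stdlib sketch assuming the proxy is running locally on port 8000 as in the curl example above (the actual send is left commented out):

```python
# Build the same /chat/completions request in Python using only the stdlib.
import json
from urllib.request import Request

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "what llm are you"}],
    "user": "ishaan",  # the proxy attributes spend to this user id
}

req = Request(
    "http://0.0.0.0:8000/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it; omitted here since it
# requires a running proxy.
```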

The LiteLLM proxy already tracks cost per user when it's passed as `user` to any endpoint.

Calls fail once the user has crossed their defined budget:

raise Exception("ishaan crossed their budget") 
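On the client side, a caller might want to distinguish this failure from other errors. The exact status code and message the proxy returns are assumptions in this sketch; it simply looks for a budget-related message in the error body:

```python
# Hypothetical client-side classification of a rejected request.
# The 400 status and "budget" substring are assumptions, not the
# proxy's documented contract.
def classify_failure(status: int, body: str) -> str:
    if status == 400 and "budget" in body.lower():
        return "over_budget"
    return "other_error"
```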
grugnog commented 7 months ago

@ishaan-jaff ahh, thank you! I think that should work - I am not sure this is covered in the documentation currently. Do users need to be created in advance, or is just keeping a consistent user key sufficient?

ishaan-jaff commented 7 months ago

They do not need to be created in advance - is that what you want?

ishaan-jaff commented 7 months ago

to clarify - we'll need to make some changes on our side to enable this, which I'll get done today

ishaan-jaff commented 7 months ago

PR here: https://github.com/BerriAI/litellm/pull/1859 cc @grugnog

ishaan-jaff commented 7 months ago

done @grugnog https://github.com/BerriAI/litellm/releases/tag/v1.22.10

Any chance we can hop on a quick call - would love to learn more about your use case - so we solve your problem well, sharing a link to my cal for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat?month=2024-02

grugnog commented 7 months ago

@ishaan-jaff thanks for looking at this - I have had a play with it. It does enable making requests "as" a user, and the budget/spend in /spend/users looks correct. However, it doesn't seem to give me an out-of-budget error even when the user is beyond the defined budget. I tried with the master key as well as a key provisioned via the API (since I noticed the test does the latter), but no difference.
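The reported behavior could be checked with a small repro harness. In this sketch the HTTP call is abstracted as `send(user) -> bool` (True if the proxy accepted the request), since the concrete request is the curl example earlier in the thread; with `max_user_budget: 0`, enforcement would mean a later call gets rejected:

```python
# Minimal repro sketch: with a $0 budget, repeated requests as the same
# user should not all succeed if enforcement works. `send` stands in for
# an HTTP POST to /chat/completions with a fixed "user" field.
def budget_enforced(send, user: str, attempts: int = 3) -> bool:
    results = [send(user) for _ in range(attempts)]
    # Enforcement means at least one call is rejected.
    return not all(results)
```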

I set up a time to chat tomorrow!

ishaan-jaff commented 7 months ago

Sounds good, will try debugging on my side before our call tomorrow. Thanks for setting up time.