BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Add commercial rate limits for models to model cost map #698

Open · krrishdholakia opened this issue 10 months ago

krrishdholakia commented 10 months ago

The Feature

Providers have known commercial rate limits, e.g. https://docs.perplexity.ai/docs/rate-limits

Add this to the model cost map for each model/provider.
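A minimal sketch of what an extended entry could look like, written as a Python dict mirroring the JSON schema. The `default_rpm` / `default_tpm` field names and the values are placeholders for illustration, not existing keys in the map:

```python
# Hypothetical sketch of a cost-map entry with rate-limit fields added.
# "default_rpm" / "default_tpm" are placeholder names (requests and tokens
# per minute); the values are illustrative, not Perplexity's real limits.
{
    "perplexity/pplx-70b-online": {
        "litellm_provider": "perplexity",
        "mode": "chat",
        # ... existing cost / context-window fields stay unchanged ...
        "default_rpm": 50,
        "default_tpm": None,  # None where a provider publishes no token limit
    }
}
```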

Motivation, pitch

We should add this to the model cost map to make this information easier to access and use.

Twitter / LinkedIn details

No response

krrishdholakia commented 10 months ago

It'll be important to allow users to override this though, since different users might have different rate limits with a given provider (e.g., our own OpenAI rate limit).

Maybe call it `default_rate_limit` or something like that.
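A sketch of what the override could look like via the existing `litellm.register_model` entry point, which overlays user-supplied entries onto the cost map. The `default_rpm` / `default_tpm` keys and the values are hypothetical:

```python
import litellm

# register_model overlays user-supplied entries onto the bundled cost map,
# so an account with negotiated limits could override the shipped defaults.
# "default_rpm" / "default_tpm" are hypothetical fields, not current schema.
litellm.register_model({
    "gpt-3.5-turbo": {
        "litellm_provider": "openai",
        "default_rpm": 10_000,     # illustrative: this account's own limit
        "default_tpm": 2_000_000,
    }
})
```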

krrishdholakia commented 10 months ago

cc @toniengelhardt / @jsherer: thoughts on this, since y'all use the model cost map?

krrishdholakia commented 10 months ago

For Anyscale: https://d3qavlo5goipcw.cloudfront.net/guides/models#rate-limiting-and-concurrent-queries

toniengelhardt commented 10 months ago

> cc @toniengelhardt / @jsherer: thoughts on this, since y'all use the model cost map?

Epic! From my end, the more info the better.

jsherer commented 10 months ago

I can imagine that being helpful so long as you leave the actual rate limiting to the users of the library.

Could it be useful to place the master copy of these JSON files in a separate repo (`litellm-data`?) so updates don't have to wait on the review process of cutting a new litellm version?
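On the first point, a minimal sketch of what user-side enforcement could look like if the map only publishes the numbers, again assuming the hypothetical `default_rpm` field:

```python
import time
import litellm

def pace_for(model: str) -> None:
    """Naive client-side pacing: spread calls evenly across the minute
    using the (hypothetical) default_rpm field from the cost map."""
    rpm = litellm.model_cost.get(model, {}).get("default_rpm")
    if rpm:
        time.sleep(60.0 / rpm)

pace_for("gpt-3.5-turbo")
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)
```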

krrishdholakia commented 10 months ago

@jsherer updating the model JSON is independent of publishing a new litellm version.

It's done by updating the .json directly (for contributors, via PR): https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json

Since this is already decoupled, was there something further you were hoping to achieve here?
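To illustrate the decoupling: the latest map can be pulled straight from the repo at runtime, with no package upgrade needed. A sketch, with the raw URL derived from the blob link above:

```python
import requests

# Fetch the current cost map directly from the repo; this picks up merged
# updates immediately, independent of the installed litellm version.
RAW_URL = (
    "https://raw.githubusercontent.com/BerriAI/litellm/main/"
    "model_prices_and_context_window.json"
)
cost_map = requests.get(RAW_URL, timeout=10).json()
print(cost_map.get("gpt-3.5-turbo"))
```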

krrishdholakia commented 10 months ago

I think this is a good breakdown of the info to surface per model:

https://github-com.translate.goog/maritaca-ai/maritalk-api?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp#aspectos-t%C3%A9cnicos