BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Add commercial rate limits for models to model cost map #698

Open · krrishdholakia opened this issue 10 months ago

krrishdholakia commented 10 months ago

The Feature

Providers have known commercial rate limits, e.g. https://docs.perplexity.ai/docs/rate-limits

Add this to the model cost map for each model/provider.
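A minimal sketch of what an extended entry could look like, written as a Python dict mirroring the JSON schema. The `default_rpm` / `default_tpm` field names and the values are placeholders for illustration, not existing keys in the map:

```python
# Hypothetical sketch of a cost-map entry with rate-limit fields added.
# "default_rpm" / "default_tpm" are placeholder names (requests and tokens
# per minute); the values are illustrative, not Perplexity's real limits.
{
    "perplexity/pplx-70b-online": {
        "litellm_provider": "perplexity",
        "mode": "chat",
        # ... existing cost / context-window fields stay unchanged ...
        "default_rpm": 50,
        "default_tpm": None,  # None where a provider publishes no token limit
    }
}
```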

Motivation, pitch

We should add this to the model cost map to make this information easier to access and use.

Twitter / LinkedIn details

No response

krrishdholakia commented 10 months ago

It'll be important to allow users to override this though, since different users might have different rate limits with a given provider (e.g., our own OpenAI rate limit).

Maybe call it `default_rate_limit` or something like that.
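A sketch of what the override could look like via the existing `litellm.register_model` entry point, which overlays user-supplied entries onto the cost map. The `default_rpm` / `default_tpm` keys and the values are hypothetical:

```python
import litellm

# register_model overlays user-supplied entries onto the bundled cost map,
# so an account with negotiated limits could override the shipped defaults.
# "default_rpm" / "default_tpm" are hypothetical fields, not current schema.
litellm.register_model({
    "gpt-3.5-turbo": {
        "litellm_provider": "openai",
        "default_rpm": 10_000,     # illustrative: this account's own limit
        "default_tpm": 2_000_000,
    }
})
```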

krrishdholakia commented 10 months ago

cc @toniengelhardt / @jsherer: thoughts on this, since y'all use the model cost map?

krrishdholakia commented 10 months ago

For Anyscale: https://d3qavlo5goipcw.cloudfront.net/guides/models#rate-limiting-and-concurrent-queries

toniengelhardt commented 10 months ago

> cc @toniengelhardt / @jsherer: thoughts on this, since y'all use the model cost map?

Epic! From my end, the more info the better.

jsherer commented 10 months ago

I can imagine that being helpful so long as you leave the actual rate limiting to the users of the library.

Could it be useful to place the master copy of these JSON files in a separate repo (`litellm-data`?) so updates don't have to wait on the review process of cutting a new litellm version?
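On the first point, a minimal sketch of what user-side enforcement could look like if the map only publishes the numbers, again assuming the hypothetical `default_rpm` field:

```python
import time
import litellm

def pace_for(model: str) -> None:
    """Naive client-side pacing: spread calls evenly across the minute
    using the (hypothetical) default_rpm field from the cost map."""
    rpm = litellm.model_cost.get(model, {}).get("default_rpm")
    if rpm:
        time.sleep(60.0 / rpm)

pace_for("gpt-3.5-turbo")
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)
```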

krrishdholakia commented 10 months ago

@jsherer updating the model JSON is independent of publishing a new litellm version.

It's done by updating the .json directly (for contributors, via PR): https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json

Since this is already decoupled, was there something further you were hoping to achieve here?
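To illustrate the decoupling: the latest map can be pulled straight from the repo at runtime, with no package upgrade needed. A sketch, with the raw URL derived from the blob link above:

```python
import requests

# Fetch the current cost map directly from the repo; this picks up merged
# updates immediately, independent of the installed litellm version.
RAW_URL = (
    "https://raw.githubusercontent.com/BerriAI/litellm/main/"
    "model_prices_and_context_window.json"
)
cost_map = requests.get(RAW_URL, timeout=10).json()
print(cost_map.get("gpt-3.5-turbo"))
```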

krrishdholakia commented 10 months ago

I think this is a good breakdown of the info to surface per model:

https://github-com.translate.goog/maritaca-ai/maritalk-api?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp#aspectos-t%C3%A9cnicos