Open mcavaliere opened 1 year ago
I am also facing this issue. Any resolutions?
I found this but it did not help very much.
@PicoCreator thoughts?
You can configure "provider rate limit" in the generated config file: https://github.com/PicoCreator/smol-dev-js/blob/ba496cb20440654a32287015645e3852615f5716/src/core/config.js#L38C7-L38C24
That should help mitigate the issue. Alternatively, switch to gpt3.5, which has a higher rate limit.
For the most part, as OpenAI clamps down further on gpt4 rate limits, this issue may not be resolvable while using gpt4.
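For anyone wondering what a client-side rate limit does in practice, here is a minimal Python sketch of the general idea. It is not the smol-dev-js implementation (the real behaviour is whatever the linked config.js does), and the names `MinIntervalThrottle` and `max_per_minute` are made up for illustration. It simply spaces outgoing calls so the provider sees them at a steady pace.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Hypothetical helper: keep calls at least 60 / max_per_minute seconds apart."""

    def __init__(
        self,
        max_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_minute <= 0:
            raise ValueError("max_per_minute must be positive")
        self.interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the following slot, measured from when this call starts.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

You would call `throttle.wait()` right before each request to the provider. This only smooths the request rate; it does not account for token-based limits, which providers may enforce separately.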
Hey @mcavaliere @nshmadhani @PicoCreator, I'm the maintainer of LiteLLM. Our OpenAI proxy supports fallbacks, which could help here: if gpt-4 rate limits are reached, it falls back to gpt-3.5-turbo. You can also use it to load balance across multiple Azure gpt-4 instances:
Step 1: Put your instances in a config.yaml
model_list:
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8001
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8002
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8003
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
      model: gpt-3.5-turbo-16k
      api_key: <my-openai-key>

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}]
Step 2: Install LiteLLM
$ pip install litellm
Step 3: Start litellm proxy w/ config.yaml
$ litellm --config /path/to/config.yaml
Docs: https://docs.litellm.ai/docs/simple_proxy
Would this help out in your scenario?
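To make the `num_retries` and `fallbacks` settings above a bit more concrete, here is a simplified Python model of the idea. It is only a sketch and not LiteLLM's actual code: `RateLimited`, `fallbacks_for` and `call_with_fallbacks` are hypothetical names, and it assumes "retry 3 times" means one initial attempt plus up to three retries per model name before moving on to the next fallback.

```python
from typing import Callable, Dict, List, Optional


class RateLimited(Exception):
    """Stand-in for a provider's rate-limit (HTTP 429) error."""


def fallbacks_for(model: str, fallbacks: List[Dict[str, List[str]]]) -> List[str]:
    """Look up fallbacks in the same list-of-dicts shape as the config above."""
    for entry in fallbacks:
        if model in entry:
            return entry[model]
    return []


def call_with_fallbacks(
    model: str,
    call: Callable[[str], str],
    fallbacks: List[Dict[str, List[str]]],
    num_retries: int = 3,
) -> str:
    """Try `model` (1 attempt + num_retries retries), then each fallback in order."""
    last_error: Optional[Exception] = None
    for candidate in [model, *fallbacks_for(model, fallbacks)]:
        for _ in range(1 + num_retries):
            try:
                return call(candidate)
            except RateLimited as err:
                last_error = err  # remember the failure and keep trying
    raise RuntimeError(f"all models failed for {model!r}") from last_error
```

Here `call` is whatever function actually sends the request for a given model name. With the config above, a rate-limited `zephyr-beta` request would end up being served by `gpt-3.5-turbo` under this model.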
Hello! Thanks for creating a cool port of a cool lib.
I'm trying to get it running and am hitting rate limits, see below. This seems to happen no matter what prompt I run.