-
When sending the command 'run 1', too many calls to the OpenAI API resulted in access being denied. The error is as follows:
openai.error.RateLimitError: Rate limit reached for default-text-embeddi…
-
### The Feature
The normal vLLM server supports these inputs. By supporting them, we can handle prompt formatting on the proxy instead of on each individual vLLM server.
### Motivation, pitch
Use…
-
### Is your feature request related to a problem? Please describe.
When running my model with vLLM: `interpreter --api_base https://xxxxxxxxxxxx-8000.proxy.runpod.net/v1 --model casperhansen/mixtral-i…`
-
I was getting a RateLimitReached error yesterday (after around the 80th generation; each prompt is around 10,000 tokens). My simple workaround is below, but is there a better way?
``` python
def …
```
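The snippet above is truncated; for reference, one common pattern for this situation (not necessarily the author's workaround) is to catch the rate-limit error and retry with exponential backoff. A minimal sketch, assuming the pre-1.0 `openai` Python SDK, where the exception lives at `openai.error.RateLimitError`:

```python
import time

import openai
from openai.error import RateLimitError


def completion_with_backoff(max_retries=6, **kwargs):
    """Call the OpenAI API, retrying with exponential backoff on rate limits."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
            delay *= 2  # wait 1s, 2s, 4s, ... between attempts


# Illustrative usage with a hypothetical long prompt:
# response = completion_with_backoff(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "…a ~10,000-token prompt…"}],
# )
```

Libraries such as `tenacity` or `backoff` provide the same behavior as a decorator, which avoids hand-rolling the retry loop.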
-
-
### Proposal
It would be nice to add Azure OpenAI Completions support to our project. Azure OpenAI is widely used in business, so supporting it could attract more users and keep us ahead as AI and ML become standard. Besides, imp…
-
### What happened?
I followed the [documentation](https://docs.litellm.ai/docs/proxy/deploy#quick-start) to deploy litellm locally and added Claude 3.5. Ordinary conversations run smoothly…
-
### What happened?
I set `litellm.set_verbose = False`, and before calling completion I have a log file set up to monitor other activities:
`logging.basicConfig(filename=f"{os.path.splitext(os.p…
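The `basicConfig` call above is truncated; a minimal sketch of this kind of setup, with a hypothetical log-file expression and an illustrative model name (neither taken from the original report):

```python
import logging
import os

import litellm
from litellm import completion

# Turn off LiteLLM's own verbose printing so only our logging remains.
litellm.set_verbose = False

# Hypothetical log file named after the running script.
log_path = f"{os.path.splitext(os.path.basename(__file__))[0]}.log"
logging.basicConfig(filename=log_path, level=logging.INFO)

logging.info("calling completion")
response = completion(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{"role": "user", "content": "Hello"}],
)
logging.info("completion returned")
```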
-
Hey @srush,
I saw that you're using manifest for making OpenAI/LLM calls instead of calling the API yourself - why is that?
https://github.com/srush/MiniChain/blob/b79ebc51bdedb836c9265eec2fcc21cd60b17327/min…
-
Also, check the request body for hyperparameters that LiteLLM / the providers support, and use them when producing the chat completions response.
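A minimal sketch of what that could look like, assuming the `litellm` Python SDK; the parameter list and helper function here are illustrative, not the project's actual implementation:

```python
import litellm

# Hyperparameters that chat-completions providers commonly accept.
SUPPORTED_HYPERPARAMS = {
    "temperature", "top_p", "max_tokens", "n",
    "stop", "presence_penalty", "frequency_penalty",
}


def forward_chat_completion(body: dict):
    """Forward only the hyperparameters found in the request body."""
    hyperparams = {k: v for k, v in body.items() if k in SUPPORTED_HYPERPARAMS}
    return litellm.completion(
        model=body["model"],
        messages=body["messages"],
        **hyperparams,
    )


# Example request body as a proxy might receive it.
body = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "max_tokens": 64,
}
response = forward_chat_completion(body)
print(response)
```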