gnovelli closed this issue 1 month ago.
The provided log indicates that the application is hitting the rate limit of the OpenAI API. Here's a breakdown of the error: while writing models.py, the request raised a RateLimitError. The application uses the tenacity library to retry the request when it fails; however, after multiple retries it still couldn't succeed due to the rate limit, leading to a RetryError.
Recommendations:
Rate Limit Handling: Implement better rate limit handling in the application. This can be done by backing off and retrying when a RateLimitError is returned (see the sketch after this list).
Optimize Requests: If possible, reduce the number of requests or the amount of data being sent in each request. This can help in staying within the rate limits.
Upgrade Plan: If the current rate limits are too restrictive for the application's needs, consider upgrading to a higher tier plan with OpenAI, if available.
Error Handling: Improve error handling to provide more informative messages to the end-users or developers. This can help in quickly identifying and resolving issues.
Monitoring and Alerts: Implement monitoring and alerting mechanisms to get notified when the application is nearing or has exceeded the rate limits. This can help in taking timely corrective actions.
Review Application Logic: Ensure that the application is not making unnecessary or redundant requests to the API. Optimizing the application logic can help in reducing the number of requests.
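As a concrete illustration of the back-off idea in the first recommendation, here is a minimal sketch using the tenacity library the project already depends on; the function name, model, and retry limits are illustrative assumptions, not MetaGPT's actual code:

import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

@retry(
    retry=retry_if_exception_type(openai.error.RateLimitError),  # retry only on throttling
    wait=wait_random_exponential(min=1, max=60),  # exponential back-off with jitter
    stop=stop_after_attempt(8),  # illustrative cap; surface the error after 8 tries
)
async def ask_with_backoff(prompt: str):
    # hypothetical helper; assumes the pre-1.0 openai client used in this thread
    return await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )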
Remember, rate limits are in place to ensure fair usage and prevent abuse. It's essential to respect these limits and design applications accordingly.
you can set the OPENAI_API_BASE: "https://openai-forward.metadl.com/v1"
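If you want to point the openai Python client at that base URL in code instead, something like this should work (a sketch for the pre-1.0 openai library this thread uses):

import openai

openai.api_base = "https://openai-forward.metadl.com/v1"  # route requests through the forward endpoint
openai.api_key = "sk-..."  # your own key, unchanged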
Great software, thank you for your effort!
But I have the same issue:

openai.error.RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-oDQRbTYsPHkQqj21ou6TcfMk on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.

2023-08-30 15:33:49.795 | INFO | metagpt.actions.write_code:run:77 - Writing tests.py..
Traceback (most recent call last):
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
result = await fn(*args, **kwargs)
File "C:\Users\ebudm\Documents\WebApp\metagpt\metagpt\actions\write_code.py", line 71, in write_code
code_rsp = await self._aask(prompt)
File "C:\Users\ebudm\Documents\WebApp\metagpt\metagpt\actions\action.py", line 47, in _aask
return await self.llm.aask(prompt, system_msgs)
File "C:\Users\ebudm\Documents\WebApp\metagpt\metagpt\provider\base_gpt_api.py", line 44, in aask
rsp = await self.acompletion_text(message, stream=True)
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
do = self.iter(retry_state=retry_state)
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\tenacity\__init__.py", line 314, in iter
return fut.result()
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\concurrent\futures\_base.py", line 449, in result
return self.__get_result()
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\concurrent\futures\_base.py", line 401, in __get_result
raise self._exception
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
result = await fn(*args, **kwargs)
File "C:\Users\ebudm\Documents\WebApp\metagpt\metagpt\provider\openai_api.py", line 222, in acompletion_text
return await self._achat_completion_stream(messages)
File "C:\Users\ebudm\Documents\WebApp\metagpt\metagpt\provider\openai_api.py", line 151, in _achat_completion_stream
response = await openai.ChatCompletion.acreate(**self._cons_kwargs(messages), stream=True)
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\openai\api_resources\chat_completion.py", line 45, in acreate
return await super().acreate(*args, **kwargs)
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 217, in acreate
response, _, api_key = await requestor.arequest(
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\openai\api_requestor.py", line 382, in arequest
resp, got_stream = await self._interpret_async_response(result, stream)
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\openai\api_requestor.py", line 726, in _interpret_async_response
self._interpret_response_line(
File "C:\Users\ebudm\anaconda3\envs\metagpt\Lib\site-packages\openai\api_requestor.py", line 763, in _interpret_response_line
raise self.handle_error_response(
openai.error.RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-oDQRbTYsPHkQqj21ou6TcfMk on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\ebudm\Documents\WebApp\metagpt\startup.py", line 42, in
Thank you. Have a great day!
> you can set the OPENAI_API_BASE: "https://openai-forward.metadl.com/v1"
How is this actually helping us? Will it just retry the request after the error?
> Remember, rate limits are in place to ensure fair usage and prevent abuse. It's essential to respect these limits and design applications accordingly.
We are actually paying for that service, there is no need to prevent abuse here!
I'm having this issue also. It's an unmodified install on docker using the snake example. Can I throttle the service calls in some way?
Instead of merely joining this discussion as another "me too" contributor, I aim to provide some additional insight. Based on my understanding of the code pertaining to rate limiting, it primarily measures requests per minute. This is evident here:
class RateLimiter:
    """Rate control class: each call goes through wait_if_needed, sleeping if rate control is needed."""

    def __init__(self, rpm):
        self.last_call_time = 0
        # Here 1.1 is used because even if the calls are made strictly according to time,
        # they will still be QOS'd; consider switching to simple error retry later
        self.interval = 1.1 * 60 / rpm
        self.rpm = rpm

    def split_batches(self, batch):
        return [batch[i : i + self.rpm] for i in range(0, len(batch), self.rpm)]

    async def wait_if_needed(self, num_requests):
        current_time = time.time()
        elapsed_time = current_time - self.last_call_time
        if elapsed_time < self.interval * num_requests:
            remaining_time = self.interval * num_requests - elapsed_time
            logger.info(f"sleep {remaining_time}")
            await asyncio.sleep(remaining_time)
        self.last_call_time = time.time()
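For illustration, here is a hypothetical usage sketch of this limiter: split_batches caps each batch at rpm requests, and wait_if_needed sleeps whenever calls arrive faster than the computed interval allows (the prompts and rpm value are made up for the example).

import asyncio

async def main():
    limiter = RateLimiter(rpm=10)  # interval = 1.1 * 60 / 10 = 6.6 seconds per request
    prompts = [f"request {i}" for i in range(25)]  # dummy payloads
    for batch in limiter.split_batches(prompts):  # batches of at most 10
        await limiter.wait_if_needed(len(batch))  # sleeps if we're ahead of the allowed pace
        print(f"dispatching {len(batch)} requests")

asyncio.run(main())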
We could adjust the RPM configuration variable to match our model. For GPT-4 with 8192 tokens (max) per request and a TPM of 40,000, the maximum RPM is 40000 / 8192 ≈ 4.88. Setting RPM = 4 resolves the issue. The GPT-4 API allows 200 RPM, which leaves at most 40000 / 200 = 200 tokens per request if you wanted to go ludicrous speed with short context windows. Ideally, RPM and TPM should both be checked with some math to allow for the most efficient processing (e.g., dynamically adjust requests but stay within bounds), but for now self-calculation is a workaround.
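To make that self-calculation concrete (the token figures are the commenter's assumptions about the GPT-4 tier, not values read from the API):

TPM_LIMIT = 40_000  # assumed tokens-per-minute budget for the organization
MAX_TOKENS_PER_REQUEST = 8_192  # worst case: a GPT-4 8K call using the full context

max_rpm = TPM_LIMIT / MAX_TOKENS_PER_REQUEST  # 40000 / 8192 = 4.88...
safe_rpm = int(max_rpm)  # round down, so RPM = 4 can never exceed the TPM cap
print(safe_rpm)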
Edit: Here is a hackish solution I made. Don't judge it; I know it's hackish, but I was focused on a much larger task and was merely trying to get around the TPM issue as quickly as possible. It is in base_gpt_api.py. All I did was add a try block with exponential back-off.
async def aask(self, msg: str, system_msgs: Optional[list[str]] = None) -> str:
    if system_msgs:
        message = self._system_msgs(system_msgs) + [self._user_msg(msg)]
    else:
        message = [self._default_system_msg(), self._user_msg(msg)]
    # try until success
    rsp = None
    n = 1
    while rsp is None:
        try:
            rsp = await self.acompletion_text(message, stream=True)
        except Exception as e:
            logger.error(e)
            logger.warning(f"Retrying after {n} seconds...")
            await asyncio.sleep(n)  # non-blocking sleep; time.sleep would stall the event loop
            # exponential backoff
            n *= 2
    return rsp
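A slightly tighter variant of the same idea would retry only on rate-limit errors and cap the wait, so genuine failures still propagate. A sketch (not the actual patch), assuming the same pre-1.0 openai client:

import asyncio
import logging
import openai

logger = logging.getLogger(__name__)

async def with_rate_limit_backoff(make_call, max_wait: int = 60):
    """Retry an async API call with capped exponential back-off on RateLimitError."""
    n = 1
    while True:
        try:
            return await make_call()
        except openai.error.RateLimitError as e:  # any other exception propagates
            logger.warning(f"Rate limited ({e}); retrying in {n} seconds...")
            await asyncio.sleep(n)  # non-blocking, unlike time.sleep
            n = min(n * 2, max_wait)  # exponential back-off with a ceiling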
I'm getting this error too
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/tenacity/_asyncio.py", line 50, in __call__
result = await fn(*args, **kwargs)
File "/app/metagpt/metagpt/actions/write_code.py", line 71, in write_code
code_rsp = await self._aask(prompt)
File "/app/metagpt/metagpt/actions/action.py", line 50, in _aask
return await self.llm.aask(prompt, system_msgs)
File "/app/metagpt/metagpt/provider/base_gpt_api.py", line 44, in aask
rsp = await self.acompletion_text(message, stream=True)
File "/usr/local/lib/python3.10/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/tenacity/_asyncio.py", line 47, in __call__
do = self.iter(retry_state=retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
return fut.result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/site-packages/tenacity/_asyncio.py", line 50, in __call__
result = await fn(*args, **kwargs)
File "/app/metagpt/metagpt/provider/openai_api.py", line 238, in acompletion_text
return await self._achat_completion_stream(messages)
File "/app/metagpt/metagpt/provider/openai_api.py", line 163, in _achat_completion_stream
response = await openai.ChatCompletion.acreate(**self._cons_kwargs(messages), stream=True)
File "/usr/local/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 45, in acreate
return await super().acreate(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 217, in acreate
response, _, api_key = await requestor.arequest(
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 382, in arequest
resp, got_stream = await self._interpret_async_response(result, stream)
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 726, in _interpret_async_response
self._interpret_response_line(
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
raise self.handle_error_response(
openai.error.RateLimitError: Rate limit reached for gpt-4 in organization org-LMNXm29ctg5WfXzqc11KVlgP on tokens per min. Limit: 10000 / min. Please try again in 6ms. Visit https://platform.openai.com/account/rate-limits to learn more.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/metagpt/startup.py", line 72, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/app/metagpt/startup.py", line 68, in main
asyncio.run(startup(idea, investment, n_round, code_review, run_tests, implement))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/app/metagpt/startup.py", line 47, in startup
await company.run(n_round=n_round)
File "/app/metagpt/metagpt/software_company.py", line 60, in run
await self.environment.run()
File "/app/metagpt/metagpt/environment.py", line 67, in run
await asyncio.gather(*futures)
File "/app/metagpt/metagpt/roles/role.py", line 240, in run
rsp = await self._react()
File "/app/metagpt/metagpt/roles/role.py", line 209, in _react
return await self._act()
File "/app/metagpt/metagpt/roles/engineer.py", line 211, in _act
return await self._act_sp_precision()
File "/app/metagpt/metagpt/roles/engineer.py", line 186, in _act_sp_precision
code = await WriteCode().run(context=context_str, filename=todo)
File "/app/metagpt/metagpt/actions/write_code.py", line 78, in run
code = await self.write_code(prompt)
File "/usr/local/lib/python3.10/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/tenacity/_asyncio.py", line 47, in __call__
do = self.iter(retry_state=retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f6058875ea0 state=finished raised RateLimitError>]
@absane @gnovelli @stellaHSR @ebudmada @adubinsky
I'm the maintainer of LiteLLM. We let you maximize your throughput and increase rate limits by load balancing between multiple deployments (Azure, OpenAI). I believe LiteLLM can be helpful here, and I'd love your feedback if we're missing something.
Here's how to use it. Docs: https://docs.litellm.ai/docs/routing
import os

from litellm import Router

model_list = [
    {  # list of model deployments
        "model_name": "gpt-3.5-turbo",  # model alias
        "litellm_params": {  # params for litellm completion/embedding call
            "model": "azure/chatgpt-v-2",  # actual model name
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/chatgpt-functioncalling",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "vllm/TheBloke/Marcoroni-70B-v1-AWQ",
            "api_key": os.getenv("OPENAI_API_KEY"),
        },
    },
]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)
Since no further responses are needed, we will close it. Please reopen it if necessary.
2023-08-22 07:53:00.873 | INFO | metagpt.actions.write_code:run:77 - Writing models.py..
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in __call__
result = await fn(*args, **kwargs)
File "/app/metagpt/metagpt/actions/write_code.py", line 71, in write_code
code_rsp = await self._aask(prompt)
File "/app/metagpt/metagpt/actions/action.py", line 47, in _aask
return await self.llm.aask(prompt, system_msgs)
File "/app/metagpt/metagpt/provider/base_gpt_api.py", line 44, in aask
rsp = await self.acompletion_text(message, stream=True)
File "/app/metagpt/metagpt/provider/openai_api.py", line 32, in wrapper
return await f(*args, **kwargs)
File "/app/metagpt/metagpt/provider/openai_api.py", line 218, in acompletion_text
return await self._achat_completion_stream(messages)
File "/app/metagpt/metagpt/provider/openai_api.py", line 151, in _achat_completion_stream
response = await openai.ChatCompletion.acreate(
File "/usr/local/lib/python3.9/site-packages/openai/api_resources/chat_completion.py", line 45, in acreate
return await super().acreate(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 217, in acreate
response, _, api_key = await requestor.arequest(
File "/usr/local/lib/python3.9/site-packages/openai/api_requestor.py", line 382, in arequest
resp, got_stream = await self._interpret_async_response(result, stream)
File "/usr/local/lib/python3.9/site-packages/openai/api_requestor.py", line 726, in _interpret_async_response
self._interpret_response_line(
File "/usr/local/lib/python3.9/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
raise self.handle_error_response(
openai.error.RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-tzt146yc7FwoDKlyH63f2AH7 on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/metagpt/startup.py", line 40, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/app/metagpt/startup.py", line 36, in main
asyncio.run(startup(idea, investment, n_round, code_review, run_tests))
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/app/metagpt/startup.py", line 24, in startup
await company.run(n_round=n_round)
File "/app/metagpt/metagpt/software_company.py", line 60, in run
await self.environment.run()
File "/app/metagpt/metagpt/environment.py", line 56, in run
await asyncio.gather(*futures)
File "/app/metagpt/metagpt/roles/role.py", line 240, in run
rsp = await self._react()
File "/app/metagpt/metagpt/roles/role.py", line 209, in _react
return await self._act()
File "/app/metagpt/metagpt/roles/engineer.py", line 207, in _act
return await self._act_sp()
File "/app/metagpt/metagpt/roles/engineer.py", line 133, in _act_sp
code = await WriteCode().run(
File "/app/metagpt/metagpt/actions/write_code.py", line 78, in run
code = await self.write_code(prompt)
File "/usr/local/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in __call__
do = self.iter(retry_state=retry_state)
File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f0b935c3a30 state=finished raised RateLimitError>]