OpenInterpreter / open-interpreter

A natural language interface for computers
http://openinterpreter.com/
GNU Affero General Public License v3.0

Rate limit error occurred #524

Closed: hoshins closed this issue 11 months ago

hoshins commented 1 year ago

Describe the bug

Errors occur as the conversation gets longer.

lib/python3.11/site-packages/openai/api_requestor.py", line 765, in _interpret_response_line raise self.handle_error_response( openai.error.RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-fe2ux6iXsX3l9HT45ONzU6wi on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.

Reproduce

  1. Run interpreter -y
  2. Ask it to analyze a local .csv data file
  3. The LLM tries to check the data file with function calling, and the conversation gets longer as it does so.
  4. The error occurs.

Expected behavior

No error.

Screenshots

No response

Open Interpreter version

0.1.15

Python version

3.11.5

Operating System name and version

Ubuntu 20.04

Additional context

No response

ericrallen commented 1 year ago

That error is from the OpenAI API, and it's saying that the rate limit of 10,000 tokens / minute was exceeded.

How large was the CSV file? This could happen with a large enough file being loaded into context.

You might want to try asking it to write code that analyzes the file locally, or code that takes a sample of the interesting/relevant data (based on whatever parameters you're trying to analyze) from the file and sends only that sample to the model for analysis instead.
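
For illustration, here is a minimal sketch of that approach in Python. The file name, sample size, and prompt are placeholders, not anything from this thread; it only shows the idea of sending the model a small slice plus summary statistics instead of the whole CSV.

import pandas as pd

# Load the CSV locally and keep only a small, representative slice for the model.
df = pd.read_csv("data.csv")  # hypothetical file name

# A random sample plus summary statistics is usually far fewer tokens than the raw rows.
sample = df.sample(n=min(50, len(df)), random_state=0)
summary = df.describe(include="all")

prompt = (
    "Here is a 50-row sample and summary statistics from a larger CSV. "
    "Please analyze the trends you can infer from them.\n\n"
    f"Sample:\n{sample.to_csv(index=False)}\n\n"
    f"Summary:\n{summary.to_string()}"
)
# `prompt` could then be passed to interpreter.chat(prompt) instead of loading the full file.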

firofame commented 1 year ago

I get the RateLimitError when running interpreter.chat("Please print hello world.")

ericrallen commented 1 year ago

Have you tried switching models? Does the same error occur?

You might also want to check your OpenAI Dashboard and make sure you haven’t hit any threshold you defined there.

You might also want to reach out to OpenAI support to see if your account got flagged for some more aggressive rate limit.

You can check your account’s rate limits and read more about OpenAI rate limits.
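
As a rough way to verify those limits yourself, the chat completions API includes x-ratelimit-* response headers (request/token limits, remaining quota, reset times) on successful calls. A small Python sketch, assuming the requests library and an OPENAI_API_KEY environment variable:

import os
import requests

# Make a tiny request and print only the rate-limit headers from the response.
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]},
    timeout=30,
)
for name, value in response.headers.items():
    if name.lower().startswith("x-ratelimit"):
        print(f"{name}: {value}")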

adubinsky commented 11 months ago

Here's the issue in 1.7. Nothing's amiss in OpenAI. My limit is 10,000 requests per minute, which should be sufficient for personal use of a terminal app. Clearly there's some runaway process.

Traceback (most recent call last):
  File "/usr/local/bin/interpreter", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/interpreter/core/core.py", line 21, in cli
    cli(self)
  File "/usr/local/lib/python3.11/site-packages/interpreter/cli/cli.py", line 168, in cli
    interpreter.chat()
  File "/usr/local/lib/python3.11/site-packages/interpreter/core/core.py", line 66, in chat
    for _ in self._streaming_chat(message=message, display=display):
  File "/usr/local/lib/python3.11/site-packages/interpreter/core/core.py", line 87, in _streaming_chat
    yield from terminal_interface(self, message)
  File "/usr/local/lib/python3.11/site-packages/interpreter/terminal_interface/terminal_interface.py", line 62, in terminal_interface
    for chunk in interpreter.chat(message, display=False, stream=True):
  File "/usr/local/lib/python3.11/site-packages/interpreter/core/core.py", line 95, in _streaming_chat
    yield from self._respond()
  File "/usr/local/lib/python3.11/site-packages/interpreter/core/core.py", line 121, in _respond
    yield from respond(self)
  File "/usr/local/lib/python3.11/site-packages/interpreter/core/respond.py", line 57, in respond
    for chunk in interpreter._llm(messages_for_llm):
  File "/usr/local/lib/python3.11/site-packages/interpreter/llm/setup_openai_coding_llm.py", line 94, in coding_llm
    response = litellm.completion(**params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 662, in wrapper
    raise e
  File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 621, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/timeout.py", line 44, in wrapper
    result = future.result(timeout=local_timeout_duration)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/litellm/timeout.py", line 33, in async_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 1112, in completion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 2661, in exception_type
    raise e
  File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 2098, in exception_type
    raise original_exception
  File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 392, in completion
    raise e
  File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 374, in completion
    response = openai.ChatCompletion.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 155, in create
    response, _, api_key = requestor.request(
                           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 299, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 710, in _interpret_response
    self._interpret_response_line(
  File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 775, in _interpret_response_line
    raise self.handle_error_response(
openai.error.RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-cX7SINyZ16C9mquk3hBk66ME on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.

ericrallen commented 11 months ago

@adubinsky What was the request that triggered that error?

Also, can you confirm that you can hit the API manually via curl with the same OPENAI_API_KEY environment variable and model (gpt-4/gpt-3.5-turbo)?

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

adubinsky commented 11 months ago

I was asking it to review code locally.

The issue that presents to the user is that it repeats the same output over and over. Sometimes there's marginal improvement over the loops. Most of the time it crashes with the error I provided.

A small percent of the time it will solve the problem after grinding. It's costly though. I spent $12.50 yesterday just playing with it on sample code like this. Nothing serious.

Here's a link to the kind of loop it gets stuck in:

https://vimeo.com/872533364?share=copy

API works fine:

$ curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
{
  "id": "chatcmpl-87jFrGJEbMxT7U4zucFpQGuSK09D2",
  "object": "chat.completion",
  "created": 1696852947,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 9,
    "total_tokens": 28
  }
}

ericrallen commented 11 months ago

I believe the “loop” it’s “getting stuck in” from your video is just how the output renders to the Terminal interface during streaming if you scroll back into the Terminal’s history.

It doesn’t overwrite the existing output, but writes new output with the new content for each chunk appended to it.

Mine looks the same when I scroll back, but I average about $2.00 USD per day of usage - and most of that comes from testing changes to the underlying system.

If you were to save the current messages to disk via the %save_message magic command, I’m pretty sure you’d see that only the expected number of messages have been sent and received.

However, if you’re having it load files into its context, and then generate new code for those files, that can quickly start to add up to a lot of tokens.

It’s helpful to remember that every previous message and its reply has to be sent to the model each time you make a request, in order to provide it with the conversation’s history and the context around what you’re talking about in a new message.

So, if you load in a file of code that is 1,000 tokens, you are sending that 1,000 tokens in addition to the initial system prompt (~600 tokens) and every previous message and reply with every request.
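
To make that concrete, here is a rough token-counting sketch using the tiktoken library. The system prompt and history shown are made-up placeholders, and this is not Open Interpreter's internal accounting; it only illustrates that the next request re-sends the system prompt plus every prior message.

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

system_prompt = "You are Open Interpreter..."  # ~600 tokens in the real system prompt
history = [
    {"role": "user", "content": "Analyze data.csv"},             # hypothetical turns
    {"role": "assistant", "content": "import pandas as pd ..."},
    {"role": "user", "content": "Now plot the first column"},
]

def tokens_for_next_request(system_prompt: str, history: list) -> int:
    # Approximate count of tokens that will be sent with the *next* request:
    # the system prompt plus the entire conversation so far.
    total = len(enc.encode(system_prompt))
    for message in history:
        total += len(enc.encode(message["content"]))
    return total

print(tokens_for_next_request(system_prompt, history))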

ericrallen commented 11 months ago

You might want to consider testing out the new %tokens magic command to get a sense of how many tokens are being used.

ericrallen commented 11 months ago

Going to close this one as stale.

Feel encouraged to reopen it if there’s still an Issue.

moymoussan commented 11 months ago

It would be incredible to have a magic command that asks the LLM to summarize the conversation and then starts a new one (%reset) with the summary plus the system prompt as context. That way, all of the conversation-history tokens would be cleared and a “new” conversation could be started with sufficient context but significantly fewer tokens.
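
For what it's worth, something close to that can already be approximated from the Python API. A minimal sketch, assuming the 0.1.x interface (interpreter.chat() returning the message list, interpreter.reset()); the prompts and the message-dict key names are assumptions, not Open Interpreter internals:

import interpreter

interpreter.auto_run = True  # equivalent of running with -y

# ... after a long conversation ...

# 1. Ask the model to compress the conversation so far into a short summary.
messages = interpreter.chat(
    "Summarize everything we have done and learned so far in under 200 words."
)
last = messages[-1] if messages else {}
summary = last.get("message") or last.get("content") or ""  # key name may vary by version

# 2. Clear the history (this is what %reset does in the terminal interface).
interpreter.reset()

# 3. Seed the fresh conversation with the summary instead of the full history.
interpreter.chat(
    "Context from our previous session: " + summary + "\n\nPlease continue from there."
)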

moymoussan commented 11 months ago

You might want to consider testing out the new %tokens magic command to get a sense of how many tokens are being used.

I think there’s an error in that calculation. When I look at my conversation history (the JSON file) I can count the words, characters, etc. and get an idea of how many tokens the conversation uses (and it matches the %tokens output). But that actually isn’t the correct way to calculate the total number of tokens used in the conversation, because with every new message the API also receives all of the previous messages as context.

So the %tokens command isn’t telling you how many tokens you have used in the conversation; it’s telling you how many tokens you’ll send with your next message!

So with every new message, you’ll spend the %tokens amount plus the new message’s input tokens plus the completion tokens.

Am I right?
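
With made-up numbers, the accounting described above works out like this:

# Hypothetical figures only, to illustrate the point above.
accumulated_context = 3000  # what %tokens reports: system prompt + full history
new_message = 200           # input tokens in the next user message
completion = 300            # tokens in the model's reply

next_request_total = accumulated_context + new_message + completion
print(next_request_total)   # 3500 tokens counted against the 10,000 TPM limit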

ericrallen commented 11 months ago

I guess I could have described it better. My initial thought was to show how many tokens will be sent with the next request.

That is probably confusing for folks.

I’ll submit a PR that clears up the language and start working on an aggregate calculation, too.

Thanks for the feedback!