Closed katmai closed 1 year ago
I also encountered the same problem and couldn't continue the project
It's coming from updating the memory summary. That appears to be a global behaviour. You are constrained by a 4096-token context window given the model you are using - likely GPT-3.5 - if you used GPT-4, you would not error out here. I can think of adding chunking for a certain class of commands?
I've already set the token limit to 4000 since I am on GPT-3.5, but it's not working, so idk.
### LLM MODEL SETTINGS
## FAST_TOKEN_LIMIT - Fast token limit for OpenAI (Default: 4000)
## SMART_TOKEN_LIMIT - Smart token limit for OpenAI (Default: 8000)
## When using --gpt3only this needs to be set to 4000.
# FAST_TOKEN_LIMIT=4000
SMART_TOKEN_LIMIT=4000
ghly targeted prospect lists. Bulks. Search or verify contact lists in minutes with bulk tasks. Enrichment." } ]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/app/autogpt/__main__.py", line 5, in <module>
autogpt.cli.main()
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1635, in invoke
rv = super().invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/autogpt/cli.py", line 90, in main
run_auto_gpt(
File "/app/autogpt/main.py", line 171, in run_auto_gpt
agent.start_interaction_loop()
File "/app/autogpt/agent/agent.py", line 112, in start_interaction_loop
assistant_reply = chat_with_ai(
File "/app/autogpt/llm/chat.py", line 165, in chat_with_ai
agent.summary_memory = update_running_summary(
File "/app/autogpt/memory_management/summary_memory.py", line 123, in update_running_summary
current_memory = create_chat_completion(messages, cfg.fast_llm_model)
File "/app/autogpt/llm/llm_utils.py", line 166, in create_chat_completion
response = api_manager.create_chat_completion(
File "/app/autogpt/llm/api_manager.py", line 55, in create_chat_completion
response = openai.ChatCompletion.create(
File "/usr/local/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request
resp, got_stream = self._interpret_response(result, stream)
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response
self._interpret_response_line(
File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 682, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 7009 tokens. Please reduce the length of the messages.
my@my-Mac-mini auto-gpt %
### LLM MODEL SETTINGS
## FAST_TOKEN_LIMIT - Fast token limit for OpenAI (Default: 4000)
## SMART_TOKEN_LIMIT - Smart token limit for OpenAI (Default: 8000)
## When using --gpt3only this needs to be set to 4000.
FAST_TOKEN_LIMIT=3000
SMART_TOKEN_LIMIT=3000
### EMBEDDINGS
## EMBEDDING_MODEL - Model to use for creating embeddings
## EMBEDDING_TOKENIZER - Tokenizer to use for chunking large inputs
## EMBEDDING_TOKEN_LIMIT - Chunk size limit for large inputs
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_TOKENIZER=cl100k_base
EMBEDDING_TOKEN_LIMIT=8191
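For what it's worth, the .env limits only change how much Auto-GPT tries to pack into a request; the 4097-token ceiling for gpt-3.5-turbo is enforced on OpenAI's side no matter what you set. A quick way to check how close a prompt is to that ceiling (a standalone snippet, not part of Auto-GPT):

import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Count tokens the same way the OpenAI model will."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

prompt = "...your assembled prompt here..."
used = count_tokens(prompt)
print(f"{used} prompt tokens; {4097 - used} left for the completion")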
Same, not sure if I was running GPT-3 only though.
I am experiencing the same behavior since I updated to version 0.3.0.
I got this error also in the latest stable branch v0.3.0
Same here on the latest version; can't move forward with building.
Same here
Same question
I am new to this. I think I have the exact same issue; since it's the last request, I will post all of it here just in case I'm missing something. Thanks, everyone.
File "c:\Autogpt\Auto-GPT\autogpt__main.py", line 5, in
Same problem with any branch (Master or Stable 0.3.0/0.2.2).
I can't move the project forward with this... Same problem. Thanks.
I am currently working on a possible fix for this; in theory I think it is caused by the total token count of the request for the gpt-3 model. There is a 'send_token_limit' variable that currently subtracts 1000 from the limit to reserve room for the response. I am testing 1500 to see if it still errors. I am shooting in the dark here, but I will let you all know whether this resolves the issue.
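For reference, the change I'm testing amounts to reserving more headroom for the reply. Roughly (compute_send_token_limit is just my illustration of the math, not an actual Auto-GPT function):

def compute_send_token_limit(token_limit: int, response_reserve: int = 1500) -> int:
    """Tokens left for the outgoing context after reserving room for the model's reply.

    Upstream chat.py reserves 1000; I'm testing 1500 here.
    """
    return token_limit - response_reserve

# With FAST_TOKEN_LIMIT=4000 this leaves 2500 tokens for prompt + history + memory.
print(compute_send_token_limit(4000))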
Hi guys, I have the same issue. The number of tokens can be significantly higher. I've been working for hours on a solution... unfortunately without success so far.
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 25424 tokens. Please reduce the length of the messages.
Same problem here since I upgraded to 0.3.0... why is the agent sending messages longer than 4000 tokens?
It's a hard limit imposed by OpenAI.
Same issue here
Same issue when i update to 0.3.0
+1
I have the same problem when I use LangChain's DB_chain to query a MySQL database.
This model's maximum context length is 4097 tokens, however you requested 4582 tokens (4326 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
+1
+1
The same problem, but I have a slightly different error message, as: openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 10549 tokens (10549 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
I tried to catch the exception. It works on some occasions, but it seems to cause other issues, or the program terminates since the response is None. In general, this is one of the biggest issues in Auto-GPT currently. You basically can't use it, since it breaks down every few seconds depending on the task you have given it.
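A rough sketch of the kind of fallback I was trying (illustrative only, not the Auto-GPT code; chat_with_retry is my own helper name): retry with a trimmed context instead of letting the call return None.

import openai
from openai.error import InvalidRequestError

def chat_with_retry(messages, model="gpt-3.5-turbo", max_tokens=1000, min_messages=3):
    """Call the chat API, dropping old messages on context-length errors instead of crashing."""
    while True:
        try:
            resp = openai.ChatCompletion.create(
                model=model, messages=messages, max_tokens=max_tokens
            )
            return resp["choices"][0]["message"]["content"]
        except InvalidRequestError:
            if len(messages) <= min_messages:
                raise  # nothing left to trim; give up instead of looping forever
            # Drop the oldest non-system message and try again.
            for i, m in enumerate(messages):
                if m["role"] != "system":
                    del messages[i]
                    break
            else:
                raise  # only system messages remain; cannot trim further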
+1
Hi all, I searched a bit and basically what has to be done is this, but of course adapted to Auto-GPT (see also https://blog.devgenius.io/how-to-get-around-openai-gpt-3-token-limits-b11583691b32):
import openai
from nltk.tokenize import word_tokenize

def break_up_file(tokens, chunk_size, overlap_size):
    if len(tokens) <= chunk_size:
        yield tokens
    else:
        chunk = tokens[:chunk_size]
        yield chunk
        yield from break_up_file(
            tokens[chunk_size - overlap_size:], chunk_size, overlap_size
        )

def break_up_file_to_chunks(filename, chunk_size=2000, overlap_size=100):
    with open(filename, "r") as f:
        text = f.read()
    tokens = word_tokenize(text)
    return list(break_up_file(tokens, chunk_size, overlap_size))

def convert_to_detokenized_text(tokenized_text):
    prompt_text = " ".join(tokenized_text)
    prompt_text = prompt_text.replace(" 's", "'s")
    return prompt_text  # note: the blog post returns `detokenized_text`, which is undefined

filename = "/content/drive/MyDrive/Colab Notebooks/minutes/data/Round_22_Online_Kickoff_Meeting.txt"

prompt_response = []
chunks = break_up_file_to_chunks(filename)

for i, chunk in enumerate(chunks):
    prompt_request = (
        "Summarize this meeting transcript: " + convert_to_detokenized_text(chunks[i])
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt_request,
        temperature=0.5,
        max_tokens=500,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    prompt_response.append(response["choices"][0]["text"].strip())
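Adapted to Auto-GPT, the same idea could be applied to the running summary before it is sent to the model. A minimal sketch (summarize_chunk stands in for whatever wraps create_chat_completion; the helper name and the 2500-token chunk size are my own assumptions, not Auto-GPT's API):

import tiktoken

def summarize_in_chunks(text, summarize_chunk, chunk_tokens=2500, model="gpt-3.5-turbo"):
    """Split text into token-sized chunks, summarize each, then summarize the summaries."""
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    chunks = [
        enc.decode(tokens[i : i + chunk_tokens])
        for i in range(0, len(tokens), chunk_tokens)
    ]
    partial_summaries = [summarize_chunk(chunk) for chunk in chunks]
    combined = "\n".join(partial_summaries)
    if len(enc.encode(combined)) > chunk_tokens:
        # The combined summaries are still too long for one request: recurse.
        return summarize_in_chunks(combined, summarize_chunk, chunk_tokens, model)
    return summarize_chunk(combined)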
I found some interesting solutions for the issue... if you look outside Auto-GPT, the issue is also well known:
1) https://medium.com/@shweta-lodha/how-to-deal-with-openai-token-limit-issue-part-1-d0157c9e4d4e
2) https://www.youtube.com/watch?v=_vetq4G0Gsc
3) https://www.youtube.com/watch?v=Oj1GUJnJrWs
4) https://www.youtube.com/watch?v=xkCzP4-YoNA
+1
+1
+1
+1
+1
+1
+1
Should this part of text.py prevent this?

if expected_token_usage <= max_length:
    current_chunk.append(sentence)
else:
    yield " ".join(current_chunk)
    current_chunk = [sentence]
    message_this_sentence_only = [
        create_message(" ".join(current_chunk), question)
    ]
    expected_token_usage = (
        count_message_tokens(messages=message_this_sentence_only, model=model)
        ...
I have been consistently getting this and the JSON error.
I thought changing (i.e., un-commenting) the settings below in the .env file had resolved the token length issue. UPDATE: it did not resolve the error.
running on Docker, gpt3only
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_TOKENIZER=cl100k_base
EMBEDDING_TOKEN_LIMIT=8191
That doesn't work. You will run into issues eventually
I'm playing around with some experimental code that was commented out in chat.py. I will also try setting the subtraction amount to 2000, but that's not ideal. My chat.py code is below:
import time
from random import shuffle

from openai.error import RateLimitError

from autogpt.config import Config
from autogpt.llm.api_manager import ApiManager
from autogpt.llm.base import Message
from autogpt.llm.llm_utils import create_chat_completion
from autogpt.llm.token_counter import count_message_tokens
from autogpt.logs import logger
from autogpt.memory_management.store_memory import (
    save_memory_trimmed_from_context_window,
)
from autogpt.memory_management.summary_memory import (
    get_newly_trimmed_messages,
    update_running_summary,
)

cfg = Config()


def create_chat_message(role, content) -> Message:
    """
    Create a chat message with the given role and content.

    Args:
        role (str): The role of the message sender, e.g., "system", "user", or "assistant".
        content (str): The content of the message.

    Returns:
        dict: A dictionary containing the role and content of the message.
    """
    return {"role": role, "content": content}


def generate_context(prompt, relevant_memory, full_message_history, model):
    current_context = [
        create_chat_message("system", prompt),
        create_chat_message(
            "system", f"The current time and date is {time.strftime('%c')}"
        ),
        create_chat_message(
            "system",
            f"This reminds you of these events from your past:\n{relevant_memory}\n\n",
        ),
    ]

    # Add messages from the full message history until we reach the token limit
    next_message_to_add_index = len(full_message_history) - 1
    insertion_index = len(current_context)
    # Count the currently used tokens
    current_tokens_used = count_message_tokens(current_context, model)
    return (
        next_message_to_add_index,
        current_tokens_used,
        insertion_index,
        current_context,
    )


def chat_with_ai(
    agent, prompt, user_input, full_message_history, permanent_memory, token_limit
):
    """Interact with the OpenAI API, sending the prompt, user input, message history,
    and permanent memory."""
    while True:
        try:
            """
            Interact with the OpenAI API, sending the prompt, user input, message
            history, and permanent memory.

            Args:
                prompt (str): The prompt explaining the rules to the AI.
                user_input (str): The input from the user.
                full_message_history (list): The list of all messages sent between the
                    user and the AI.
                permanent_memory (Obj): The memory object containing the permanent
                    memory.
                token_limit (int): The maximum number of tokens allowed in the API call.

            Returns:
                str: The AI's response.
            """
            model = cfg.fast_llm_model  # TODO: Change model from hardcode to argument
            # Reserve 1000 tokens for the response
            logger.debug(f"Token limit: {token_limit}")
            send_token_limit = token_limit - 1000

            if len(full_message_history) == 0:
                relevant_memory = ""
            else:
                recent_history = full_message_history[-5:]
                shuffle(recent_history)
                relevant_memories = permanent_memory.get_relevant(
                    str(recent_history), 5
                )
                if relevant_memories:
                    shuffle(relevant_memories)
                relevant_memory = str(relevant_memories)
            relevant_memory = ""

            logger.debug(f"Memory Stats: {permanent_memory.get_stats()}")

            (
                next_message_to_add_index,
                current_tokens_used,
                insertion_index,
                current_context,
            ) = generate_context(prompt, relevant_memory, full_message_history, model)

            while current_tokens_used > 2500:
                # remove memories until we are under 2500 tokens
                relevant_memory = relevant_memory[:-1]
                (
                    next_message_to_add_index,
                    current_tokens_used,
                    insertion_index,
                    current_context,
                ) = generate_context(
                    prompt, relevant_memory, full_message_history, model
                )

            current_tokens_used += count_message_tokens(
                [create_chat_message("user", user_input)], model
            )  # Account for user input (appended later)

            current_tokens_used += 500  # Account for memory (appended later) TODO: The final memory may be less than 500 tokens

            # Add Messages until the token limit is reached or there are no more messages to add.
            while next_message_to_add_index >= 0:
                # print (f"CURRENT TOKENS USED: {current_tokens_used}")
                message_to_add = full_message_history[next_message_to_add_index]

                tokens_to_add = count_message_tokens([message_to_add], model)
                if current_tokens_used + tokens_to_add > send_token_limit:
                    save_memory_trimmed_from_context_window(
                        full_message_history,
                        next_message_to_add_index,
                        permanent_memory,
                    )
                    break

                # Add the most recent message to the start of the current context,
                # after the two system prompts.
                current_context.insert(
                    insertion_index, full_message_history[next_message_to_add_index]
                )

                # Count the currently used tokens
                current_tokens_used += tokens_to_add

                # Move to the next most recent message in the full message history
                next_message_to_add_index -= 1

            # Insert Memories
            if len(full_message_history) > 0:
                (
                    newly_trimmed_messages,
                    agent.last_memory_index,
                ) = get_newly_trimmed_messages(
                    full_message_history=full_message_history,
                    current_context=current_context,
                    last_memory_index=agent.last_memory_index,
                )

                agent.summary_memory = update_running_summary(
                    current_memory=agent.summary_memory,
                    new_events=newly_trimmed_messages,
                )
                current_context.insert(insertion_index, agent.summary_memory)

            api_manager = ApiManager()
            # inform the AI about its remaining budget (if it has one)
            if api_manager.get_total_budget() > 0.0:
                remaining_budget = (
                    api_manager.get_total_budget() - api_manager.get_total_cost()
                )
                if remaining_budget < 0:
                    remaining_budget = 0
                system_message = (
                    f"Your remaining API budget is ${remaining_budget:.3f}"
                    + (
                        " BUDGET EXCEEDED! SHUT DOWN!\n\n"
                        if remaining_budget == 0
                        else " Budget very nearly exceeded! Shut down gracefully!\n\n"
                        if remaining_budget < 0.005
                        else " Budget nearly exceeded. Finish up.\n\n"
                        if remaining_budget < 0.01
                        else "\n\n"
                    )
                )
                logger.debug(system_message)
                current_context.append(create_chat_message("system", system_message))

            # Append user input, the length of this is accounted for above
            current_context.extend([create_chat_message("user", user_input)])

            plugin_count = len(cfg.plugins)
            for i, plugin in enumerate(cfg.plugins):
                if not plugin.can_handle_on_planning():
                    continue
                plugin_response = plugin.on_planning(
                    agent.prompt_generator, current_context
                )
                if not plugin_response or plugin_response == "":
                    continue
                tokens_to_add = count_message_tokens(
                    [create_chat_message("system", plugin_response)], model
                )
                if current_tokens_used + tokens_to_add > send_token_limit:
                    logger.debug("Plugin response too long, skipping:", plugin_response)
                    logger.debug("Plugins remaining at stop:", plugin_count - i)
                    break
                current_context.append(create_chat_message("system", plugin_response))

            # Calculate remaining tokens
            tokens_remaining = token_limit - current_tokens_used
            assert tokens_remaining >= 0, (
                "Tokens remaining is negative. "
                "This should never happen, please submit a bug report at "
                "https://www.github.com/Torantulino/Auto-GPT"
            )

            # Debug print the current context
            logger.debug(f"Token limit: {token_limit}")
            logger.debug(f"Send Token Count: {current_tokens_used}")
            logger.debug(f"Tokens remaining for response: {tokens_remaining}")
            logger.debug("------------ CONTEXT SENT TO AI ---------------")
            for message in current_context:
                # Skip printing the prompt
                if message["role"] == "system" and message["content"] == prompt:
                    continue
                logger.debug(f"{message['role'].capitalize()}: {message['content']}")
                logger.debug("")
            logger.debug("----------- END OF CONTEXT ----------------")

            # TODO: use a model defined elsewhere, so that model can contain
            # temperature and other settings we care about
            assistant_reply = create_chat_completion(
                model=model,
                messages=current_context,
                max_tokens=tokens_remaining,
            )

            # Update full message history
            full_message_history.append(create_chat_message("user", user_input))
            full_message_history.append(
                create_chat_message("assistant", assistant_reply)
            )

            return assistant_reply
        except RateLimitError:
            # TODO: When we switch to langchain, this is built in
            logger.warn("Error: ", "API Rate Limit Reached. Waiting 10 seconds...")
            time.sleep(10)
Still having this problem in 0.3.1
This problem crashes the entire flow; maybe we could just catch the error and keep going instead of crashing?
Same problem with long HTML.
+1 same error, nothing has worked as a workaround.
+1 Same error
+1 same error
+1
UPDATE: My experiment ultimately did not work as expected, and the dev team should consider using chunks.
I'm running locally with automatic coding disabled (not in Docker).
Here's my commit reference: commit 3d494f1032f77884f348ba0e89cfe0fd5022f9f4 (HEAD -> stable, tag: v0.3.1, origin/stable)
In my case, the error is raised from the function create_chat_completion on line 55 of Auto-GPT\autogpt\llm\api_manager.py. I believe the message list exceeds the OpenAI API's expected input size. I added some hard-coded message limits to see if that would fix the issue. I will let you know whether this works.
UPDATE: Currently testing the changes.
Here's what I'm experimenting with:
api_manager.py, llm_utils.py
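Roughly, the kind of guard I'm experimenting with looks like this (a sketch only, not the actual patch; trim_messages and the 3500-token budget are my own illustrative choices):

from autogpt.llm.token_counter import count_message_tokens

def trim_messages(messages, model, max_prompt_tokens=3500):
    """Drop the oldest non-system messages until the prompt fits the budget."""
    trimmed = list(messages)
    while count_message_tokens(trimmed, model) > max_prompt_tokens and len(trimmed) > 1:
        for i, m in enumerate(trimmed):
            if m["role"] != "system":
                del trimmed[i]
                break
        else:
            break  # only system messages left; nothing more to drop
    return trimmed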
Thank you for working on this; let us know if your solution works out.
Hi,
I am brand new to Auto-GPT and only set it up yesterday.
I have this issue! Does anyone have a fix yet?
Same here for exceeding 4097 tokens. None of my agents will finish a task. They all blow up with this error at some point and then I see what I can salvage from the files created.
⚠️ Search for existing issues first ⚠️
Which Operating System are you using?
Docker
Which version of Auto-GPT are you using?
Master (branch)
GPT-3 or GPT-4?
GPT-3.5
Steps to reproduce
Listing the auto_gpt_workspace folder errors out. Maybe this is an erroneous bug, not really sure, but why is it calling OpenAI when it's merely listing the files in the folder?
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4819 tokens. Please reduce the length of the messages.
Current behavior
Listing the folder contents errors out and kills the program if there are too many files in there.
Expected behavior
not ... error out :D
Your prompt
Your Logs