aws-samples / bedrock-claude-chat

AWS-native chatbot using Bedrock + Claude (+Mistral)
MIT No Attribution
797 stars 292 forks

[BUG] Published API Token limit #439

Open DTheunis opened 1 month ago

DTheunis commented 1 month ago

Describe the bug

When doing a POST to a published API with a large message, we run into this error: [ERROR] ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Malformed input request: #/texts/0: expected maxLength: 2048, actual: 8575, please reformat your input and try again.

To Reproduce

Send a POST request to a published API with a message longer than 2048 characters.


Additional context

Is there a solution or workaround for this? For example, I could split the message into chunks; however, I would like to send it all as one message so that I get one response to the full message.
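For reference, the chunking workaround mentioned above could look roughly like the sketch below. The helper name `split_into_chunks` is hypothetical, and the 2048 limit is taken from the `expected maxLength: 2048` part of the error message. It only shows how to keep each piece under the limit; it does not address the wish to get a single response to the full message.

```python
def split_into_chunks(text: str, max_chars: int = 2048) -> list[str]:
    """Split text into pieces of at most max_chars, breaking on whitespace."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit has to be hard-cut.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this word would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be posted in turn with the same conversation id, at the cost of receiving one response per chunk.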

statefb commented 1 month ago

Related: #434

The SQS message size limit is 256 KiB. As you mentioned, we need to either split the message into chunks or put it in external storage temporarily.
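A quick way to tell whether a given request is anywhere near that size is to measure its JSON-encoded body. This is a minimal sketch: `fits_in_sqs` is a hypothetical helper, the 256 KiB figure is the one quoted above, and the way the application actually serializes the message before queueing it is an assumption.

```python
import json

SQS_MAX_BYTES = 256 * 1024  # 256 KiB, the limit quoted in this thread


def fits_in_sqs(payload: dict, limit: int = SQS_MAX_BYTES) -> bool:
    """Return True if the JSON-encoded payload is within the size limit."""
    # Assumption: the message is queued as UTF-8 encoded JSON.
    body = json.dumps(payload).encode("utf-8")
    return len(body) <= limit
```

A message of a few thousand characters is far below this limit, so a failure at that size points to a different constraint.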

DTheunis commented 1 month ago

No sorry, it's a different issue. The one you linked is indeed an SQS size issue - the issue I'm having is a character limit to invoke Bedrock.

I can send the exact same message to a chatbot through the application, yet it fails when I send it through the published API. Surely it should be possible to use the same logic for the published API and for the API that posts a chat message?

statefb commented 1 month ago

Sorry, I misunderstood. Yes, they use the same logic, but there is an implementation difference: the published API uses the chat method and the other uses process_chat_input. If a message that triggers the error is posted through the chat interface, it should produce the same error.

DTheunis commented 1 month ago

It does not generate the same error when you post the same message to the API and to the chat interface, see below. When posting to the API, the SQS consumer tries to call InvokeModel but hits the character limit error:

START RequestId: eed351e9-5e10-5791-a0c2-0ef9682d92ae Version: $LATEST
[INFO]  2024-07-16T08:49:18.020Z    eed351e9-5e10-5791-a0c2-0ef9682d92ae    Finding conversation: 01J2XBPNMK2MEXBJ723KRN3SFB
[INFO]  2024-07-16T08:49:18.674Z    eed351e9-5e10-5791-a0c2-0ef9682d92ae    No conversation found with id: 01J2XBPNMK2MEXBJ723KRN3SFB. Creating new conversation.
[INFO]  2024-07-16T08:49:18.674Z    eed351e9-5e10-5791-a0c2-0ef9682d92ae    Bot id is provided. Fetching bot.
LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see
[ERROR] ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Malformed input request: #/texts/0: expected maxLength: 2048, actual: 4429, please reformat your input and try again.
Traceback (most recent call last):
  File "/var/task/app/", line 27, in handler
    chat_result = chat(user_id=user_id, chat_input=chat_input)
  File "/var/task/app/usecases/", line 325, in chat
    search_results = search_related_docs(
  File "/var/task/app/", line 74, in search_related_docs
    query_embedding = calculate_query_embedding(query)
  File "/var/task/app/", line 206, in calculate_query_embedding
    response = client.invoke_model(
  File "/var/lang/lib/python3.11/site-packages/botocore/", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/lang/lib/python3.11/site-packages/botocore/", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
END RequestId: eed351e9-5e10-5791-a0c2-0ef9682d92ae
REPORT RequestId: eed351e9-5e10-5791-a0c2-0ef9682d92ae  
Duration: 3152.59 ms    Billed Duration: 3153 ms    Memory Size: 1024 MB    Max Memory Used: 200 MB

When posting to the chat interface, it works; see my screenshot (claude_chat).

I've included the text that I used to test this as a file, dummy.txt. (I let Claude generate a long text.)

This is the code I use to post to the API:

import json
from typing import Optional

import requests


def send_to_api(chunk: str, conversation_id: Optional[str] = None) -> dict:
    # Endpoint and key are placeholders; fill in your own published bot API values.
    base_url = ""  # the URL is missing from the original post
    api_key = "XXXX"
    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }

    payload = {
        "message": {
            "content": [
                {
                    "contentType": "text",
                    "body": chunk,
                }
            ],
            "model": "claude-v3.5-sonnet",
        },
        "continue_generate": False,
    }

    # Continue an existing conversation when an id is supplied.
    if conversation_id:
        payload["conversation_id"] = conversation_id

    response = requests.post(f"{base_url}/conversation", headers=headers, json=payload)

    if response.status_code == 200:
        return response.json()
    raise Exception(f"API request failed with status code {response.status_code}: {response.text}")


# Read the transcription from the file
with open('transcription.txt', 'r', encoding='utf-8') as file:
    transcription = file.read()

# Send the whole transcription as a single message (no chunking)
conversation_id = None

result = send_to_api(transcription, conversation_id)
conversation_id = result.get("conversationId")
print("API Response:")
print(json.dumps(result, indent=2))

Please keep in mind that if you test this yourself, the API call will return a success, but if you go to your CloudWatch SQS logs, you will see that there was an error.

Could the problem also be in the way I post to the API? I would love to hear of a way to make this work, since we would like to use published APIs with longer texts.

statefb commented 1 month ago

Thank you for the details. The log suggests that the bot is using Knowledge (a.k.a. RAG), but your chat screen does not appear to use Knowledge. Could you describe the steps to reproduce as concretely and precisely as possible?

DTheunis commented 1 month ago

I did paste my code and procedure in my previous post, which is how I call the API from a published bot API. Run that code with a file 'transcription.txt' (3000+ characters content) and you will get this error.

Also for clarification, both ways I showed are using a bot with no knowledge - only a custom instruction.

(Edit in your own published bot API endpoint and API key.)

So again:

2048+ characters text message -> Chat interface -> Bot with no knowledge (Custom instructions only) -> Response

2048+ characters text message -> Published API Post -> Bot with no knowledge (Custom instructions only) -> Error

DTheunis commented 1 month ago

Actually got an update:

Using a bot with knowledge generates the same error as calling a POST to the API of a bot without knowledge.

Operation failed: An error occurred (ValidationException) when calling the InvokeModel operation: Malformed input request: #/texts/0: expected maxLength: 2048, actual: 294210, please reformat your input and try again.

So when posting a long text to a bot without knowledge, it works, but posting it to a bot with knowledge - it does not work.

Is the cause of this known? Is there a workaround, or is a fix on the roadmap?
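Since the traceback above points at calculate_query_embedding and the error names `#/texts/0` with `maxLength: 2048`, one possible stopgap is to shorten the query before it is embedded. The sketch below is an assumption about where a guard could go, not the project's actual fix: `truncate_query` is a hypothetical helper, and truncating means the retrieval step sees only part of the message, which may hurt search quality.

```python
def truncate_query(query: str, max_chars: int = 2048, keep: str = "start") -> str:
    """Shorten a query so it fits the embedding input limit.

    keep="start" keeps the first max_chars characters,
    keep="end" keeps the last max_chars characters.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if keep not in ("start", "end"):
        raise ValueError("keep must be 'start' or 'end'")
    if len(query) <= max_chars:
        return query
    return query[:max_chars] if keep == "start" else query[-max_chars:]
```

Only the text passed to the embedding call would be shortened this way; the full message could still be sent to the chat model unchanged.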