boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

Invoke Bedrock Agent with boto3: Unable to generate multiple chunks for streaming response #4159

Closed · devd7c closed this issue 2 weeks ago

devd7c commented 2 weeks ago

Describe the bug

When using the invoke_agent method, boto3 does not return the streaming response in multiple chunks. The response intermittently includes None chunks, and a JSONDecodeError is raised when decoding the chunk contents. References: the previous issue on this topic, the Boto3 documentation for invoke_agent, and the Amazon Bedrock streaming message samples.

Expected Behavior

The invoke_agent method should return a streaming response in multiple chunks, without intermittent None chunks. Each chunk should contain valid JSON that can be decoded and processed.

In the Amazon Bedrock Playground, toggling the streaming response on or off seems very easy. But when you need to implement the same functionality while invoking an agent through the SDK, it is not easy to set up. (Screenshot: image_2024-06-07_174905335)

Current Behavior

The response includes intermittent None chunks, and a JSONDecodeError is raised when attempting to decode the chunk contents:

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-type': 'application/json',
                                      'date': 'Fri, 07 Jun 2024 20:59:27 GMT',
                                      'transfer-encoding': 'chunked',
                                      'x-amz-bedrock-agent-session-id': 'd2c6c96f-2510-11ef-a351-4c24ce79c99c',
                                      'x-amzn-bedrock-agent-content-type': 'application/json',
                                      'x-amzn-requestid': 'daf30b3d-86f8-4bc7-a8e8-84d6e9d9e50b'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'daf30b3d-86f8-4bc7-a8e8-84d6e9d9e50b',
                      'RetryAttempts': 0},
 'completion': <botocore.eventstream.EventStream object at 0x0000012B0F01A930>,
 'contentType': 'application/json',
 'sessionId': 'd2c6c96f-2510-11ef-a351-4c24ce79c99c'}
chunk:  None
chunk:  None
chunk:  None
chunk:  None
chunk:  None
chunk:  {'bytes': b"Hello!, I'm AI assistant. How can I assist you today?"}
bytes:  Hello! I'm AI assistant. How can I assist you today?
Traceback (most recent call last):
  File "C:\Users\marce\AppData\Local\Temp\ipykernel_86248\3204360913.py", line 29, in call_haiku
    message = json.loads(decoded_bytes)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marce\AppData\Local\Programs\Python\Python312\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marce\AppData\Local\Programs\Python\Python312\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marce\AppData\Local\Programs\Python\Python312\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
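Note that the single non-None chunk in the log above carries plain UTF-8 text rather than a JSON document, which is exactly what json.loads rejects. A minimal, self-contained illustration using the decoded text from the log:

import json

# The decoded chunk bytes from the log above: plain text, not JSON.
decoded_bytes = "Hello! I'm AI assistant. How can I assist you today?"

try:
    json.loads(decoded_bytes)
except json.JSONDecodeError as e:
    # Prints: Expecting value: line 1 column 1 (char 0)
    print(e)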

Reproduction Steps

Steps to reproduce the behavior:

  1. Set up a boto3 session with a specific profile (see the boto3 session documentation).
  2. Invoke the agent using bedrock_agent_runtime_client.invoke_agent.
  3. Attempt to process the streaming response in chunks.

Code Snippet


import boto3
import uuid
import json
import pprint
import traceback
import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

name_of_profile = "your-profile-name"  # placeholder: your AWS CLI profile
session = boto3.Session(profile_name=name_of_profile)
bedrock_agent_runtime_client = session.client('bedrock-agent-runtime')

session_id = str(uuid.uuid1())
enable_trace = True
end_session = False

def call_haiku():
    """Generator: invoke the agent and yield streamed text as it arrives."""
    while True:
        try:
            agent_response = bedrock_agent_runtime_client.invoke_agent(
                inputText="Hello!",
                agentId="####",        # redacted agent ID
                agentAliasId="####",   # redacted agent alias ID
                sessionId=session_id,
                enableTrace=enable_trace,
                endSession=end_session
            )
            logger.info("Agent raw response:")
            pprint.pprint(agent_response)
            if 'completion' not in agent_response:
                raise ValueError("Missing 'completion' in agent response")
            # Iterate over the EventStream; not every event carries a 'chunk'.
            for event in agent_response['completion']:
                chunk = event.get('chunk')
                print('chunk: ', chunk)
                if chunk:
                    decoded_bytes = chunk.get("bytes").decode()
                    print('bytes: ', decoded_bytes)
                    if decoded_bytes.strip():
                        # Assumes each chunk is a JSON message; this is where
                        # the JSONDecodeError is raised on plain-text chunks.
                        message = json.loads(decoded_bytes)
                        if message['type'] == "content_block_delta":
                            yield message['delta']['text'] or ""
                        elif message['type'] == "message_stop":
                            return  # end of stream (a generator's return value is ignored by for loops)
        except Exception:
            print(traceback.format_exc())
            return  # end the generator on error; the traceback was printed above

# call_haiku() returns a generator; iterate it to stream the text.
for text in call_haiku():
    print(text, end='', flush=True)
    time.sleep(0.1)

Possible Solution

_No response_

Additional Information/Context

Name: boto3
Version: 1.34.122
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: C:\Users\marce\AppData\Local\Programs\Python\Python312\Lib\site-packages
Requires: botocore, jmespath, s3transfer
Required-by: anthropic-bedrock, bedrock-anthropic, langchain-aws

SDK version used

1.34.122

Environment details (OS name and version, etc.)

Windows 11 x64

tim-finnigan commented 2 weeks ago

Thanks for reaching out. As mentioned in the previous issue you referenced, I confirmed with the Bedrock team that this is currently the expected behavior. The InvokeAgent API only returns one chunk. I suggested that they update their documentation to better clarify that behavior. They are also tracking a feature request in their backlog to stream multiple chunks. Please refer to the news blog and CHANGELOG for future updates related to that.
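Given that the API currently returns the whole answer in a single chunk, a minimal sketch of consuming the response under that constraint (the profile name and agent IDs below are placeholders): skip events that carry no 'chunk' key, such as trace events emitted when enableTrace=True, and decode the chunk bytes as plain UTF-8 text rather than JSON.

import uuid
import boto3

# Placeholders: substitute your own profile, agent ID, and alias ID.
session = boto3.Session(profile_name="your-profile-name")
client = session.client("bedrock-agent-runtime")

response = client.invoke_agent(
    inputText="Hello!",
    agentId="####",
    agentAliasId="####",
    sessionId=str(uuid.uuid4()),
    enableTrace=True,
)

answer = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk is None:
        continue  # non-chunk events (e.g. traces) have no 'chunk' key
    answer += chunk["bytes"].decode("utf-8")  # plain text, not JSON

print(answer)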

github-actions[bot] commented 2 weeks ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.