boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.
Apache License 2.0

EventStreamError is a 400 and never retries, but some EventStreamErrors are retriable #3126

Open SamStephens opened 4 months ago

SamStephens commented 4 months ago

Describe the bug

The code at https://github.com/boto/botocore/blob/bf2473756ac0dac340916eaac606b9b767d15e99/botocore/eventstream.py#L354-L362 means that any EventStreamError is treated as a 400, so it is never retried.

Expected Behavior

Some EventStreamErrors should be retried, for instance throttlingException.

Current Behavior

No EventStreamErrors are retried.
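
Until retries are handled by the library, a caller-side workaround might look like the sketch below. It is not botocore's behaviour: `consume_with_retry` and `RETRYABLE_CODES` are hypothetical names, and the set of codes is only an assumption. The exception type is passed in, so `botocore.exceptions.EventStreamError` (a `ClientError` subclass that carries `response["Error"]["Code"]`) can be supplied by the caller.

```python
import time

# Hypothetical set of error codes worth retrying; not taken from botocore.
RETRYABLE_CODES = {"throttlingException", "ThrottlingException"}


def consume_with_retry(start_stream, error_type, max_attempts=3,
                       base_delay=0.5, sleep=time.sleep):
    """Call start_stream() and drain the stream it returns.

    The whole call is restarted when a retryable error is raised, so any
    events read before the failure are discarded.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return list(start_stream())
        except error_type as err:
            # ClientError-style exceptions expose the code here.
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            if code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Because the stream has already started when the error arrives, this only suits callers that can afford to re-issue the whole request.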

Reproduction Steps

N/A

Possible Solution

What would be ideal is if every exception returnable in an event stream mapped directly back to an exception the client already supports. In that case, we could look up the exception from the client and get the correct status code. Or possibly even instantiate the exception as an instance of the exception from the client wrapped in an EventStreamError. For example, a throttling error from the bedrock runtime might look like:

botocore.exceptions.EventStreamError(base_exception=bedrock_client.exceptions.ThrottlingException())

Even if some clients have event streams with exceptions that are not in the list of exceptions the client supports, we could still use this idea with a fallback behaviour.
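
To make the lookup-with-fallback idea concrete, here is a rough sketch. `resolve_modeled_exception` is a hypothetical helper, not an existing botocore function, and the case handling (the stream reporting `throttlingException` while the client models `ThrottlingException`) is an assumption on my part.

```python
def resolve_modeled_exception(client_exceptions, error_code, fallback):
    """Return the client's exception class for error_code, else fallback."""
    if not error_code:
        return fallback
    # Assumption: try the code as reported, then with an upper-case first letter.
    candidates = (error_code, error_code[0].upper() + error_code[1:])
    for name in candidates:
        # getattr with a default swallows the AttributeError raised for
        # names the client does not model.
        exc_class = getattr(client_exceptions, name, None)
        if isinstance(exc_class, type) and issubclass(exc_class, BaseException):
            return exc_class
    return fallback
```

With a real client this would be called as something like `resolve_modeled_exception(bedrock_client.exceptions, "throttlingException", botocore.exceptions.ClientError)`, with the fallback covering stream exceptions the client does not list.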

If this idea isn't suitable, a simpler but less flexible idea is to have a static list of exception names that should be treated as 500s.
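
For the static-list variant, a minimal sketch could be as small as the following. The names in `SERVER_FAULT_CODES` are illustrative guesses rather than a vetted list, and `status_code_for_stream_error` is a hypothetical helper, not part of botocore.

```python
# Illustrative only: which codes belong here would need to be decided per service.
SERVER_FAULT_CODES = frozenset({
    "throttlingexception",
    "internalstreamfailure",
    "internalserverexception",
    "serviceunavailableexception",
})


def status_code_for_stream_error(error_code, default=400):
    """Map an event stream error code to a synthetic HTTP status code."""
    # Compare case-insensitively, since services differ in capitalisation.
    normalized = (error_code or "").lower()
    return 500 if normalized in SERVER_FAULT_CODES else default
```

The drawback is the one already mentioned: the list has to be maintained by hand as services add new stream exceptions.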

Additional Information/Context

EDIT: https://github.com/boto/boto3/issues/4031 was resolved in favour of this issue, so documentation needs to be addressed as part of this.

The documentation for methods that return event streams, at least the ones I've looked at, includes errors as part of the shape of the structure that can be returned. For example, from https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime/client/invoke_endpoint_with_response_stream.html:

    'Body': EventStream({
        'PayloadPart': {
            'Bytes': b'bytes'
        },
        'ModelStreamError': {
            'Message': 'string',
            'ErrorCode': 'string'
        },
        'InternalStreamFailure': {
            'Message': 'string'
        }
    }),

However, boto3 is smart enough to raise these errors as an EventStreamError rather than returning them as events, via https://github.com/boto/botocore/blob/bf2473756ac0dac340916eaac606b9b767d15e99/botocore/eventstream.py#L354-L362 and https://github.com/boto/botocore/blob/bf2473756ac0dac340916eaac606b9b767d15e99/botocore/eventstream.py#L613-L619.
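
In practice this means a consumer has to wrap the iteration in a try/except instead of checking each event for an error key. A minimal sketch, assuming the `PayloadPart` / `Bytes` shape shown above; `read_payload` is a hypothetical helper, and the exception type is passed in so the caller can supply `botocore.exceptions.EventStreamError`:

```python
def read_payload(body, error_type):
    """Join PayloadPart bytes from an event stream.

    Returns (data, error_info); error_info is None when the stream
    finished without raising.
    """
    chunks = []
    try:
        for event in body:
            part = event.get("PayloadPart")
            if part:
                chunks.append(part["Bytes"])
    except error_type as err:
        # The error surfaces as an exception raised mid-iteration,
        # not as an event in the stream.
        error = getattr(err, "response", {}).get("Error", {})
        info = {"Code": error.get("Code"), "Message": error.get("Message")}
        return b"".join(chunks), info
    return b"".join(chunks), None
```

Note that bytes received before the failure are still returned, so the caller can decide whether a partial payload is usable.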

These are a couple of examples, not an exhaustive list:

SDK version used

Applies to latest

Environment details (OS name and version, etc.)

Ubuntu (Windows Subsystem for Linux)

tim-finnigan commented 1 month ago

Thanks for reporting this issue. We can continue tracking it for further review from the team. In the meantime, here is some guidance to help with Bedrock throttling exceptions: https://repost.aws/questions/QU11DRlMZfRDy0ngHxpO1VCw/throttlingexceptions-while-using-on-demand-bedrock-runtime-for-invoking-claude-v2-1#ANkwWynhQgRHi003I_nC3NJQ

To expand on that post a bit more:

SamStephens commented 1 month ago

As https://github.com/boto/boto3/issues/4031 was resolved in favour of this issue, I've updated the issue description to ensure that the documentation problem is addressed as part of resolving this issue.