boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

Claude 3.5 Sonnet is limited to 4096 tokens - should be 8192 #4279

Open cfernhout opened 2 days ago

cfernhout commented 2 days ago

Describe the bug

When invoking converse with maxTokens > 4096, the request is rejected with a ValidationException. But Claude 3.5 Sonnet has a token limit of 8192. I tested the same request with the Anthropic SDK instead of boto3, and that works.

Expected Behavior

converse should accept maxTokens values up to 8192.

Current Behavior

client = boto3.client(
    'bedrock-runtime',
    region_name=AWS_REGION,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)

response = client.converse(
    modelId=f"arn:aws:bedrock:{AWS_REGION}:***:inference-profile/eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello, world"}]}],
    inferenceConfig={
        'maxTokens': 8192,
    },
)
---------------------------------------------------------------------------
ValidationException                       Traceback (most recent call last)
Cell In[19], line 8
      1 client = boto3.client(
      2     'bedrock-runtime',
      3     region_name=AWS_REGION,
      4     aws_access_key_id=AWS_ACCESS_KEY_ID,
      5     aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
      6 )
----> 8 response = client.converse(
      9     modelId="arn:aws:bedrock:eu-west-1:760878568205:inference-profile/eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
     10     messages=[{"role": "user", "content": [{"text": "Hello, world"}]}],
     11     inferenceConfig={
     12         'maxTokens': 8192,
     13     },
     14 )

File /usr/local/lib/python3.12/site-packages/botocore/client.py:569, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    565     raise TypeError(
    566         f"{py_operation_name}() only accepts keyword arguments."
    567     )
    568 # The "self" in this scope is referring to the BaseClient.
--> 569 return self._make_api_call(operation_name, kwargs)

File /usr/local/lib/python3.12/site-packages/botocore/client.py:1023, in BaseClient._make_api_call(self, operation_name, api_params)
   1019     error_code = error_info.get("QueryErrorCode") or error_info.get(
   1020         "Code"
   1021     )
   1022     error_class = self.exceptions.from_code(error_code)
-> 1023     raise error_class(parsed_response, operation_name)
   1024 else:
   1025     return parsed_response

ValidationException: An error occurred (ValidationException) when calling the Converse operation: The maximum tokens you requested exceeds the model limit of 4096. Try again with a maximum tokens value that is lower than 4096.
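
The limit the service enforces is stated in the error text itself, so a caller can read it from the message instead of hard-coding 4096. The following is a minimal sketch, not part of boto3: parse_model_limit is a hypothetical helper, and the pattern assumes the exact wording quoted above, which the service may change.

```python
import re
from typing import Optional

# Assumption: this pattern mirrors the wording of the ValidationException
# message quoted in this issue. Treat the parse as best-effort only.
_LIMIT_PATTERN = re.compile(r"exceeds the model limit of (\d+)")


def parse_model_limit(error_message: str) -> Optional[int]:
    """Return the token limit named in the error message, or None if absent."""
    match = _LIMIT_PATTERN.search(error_message)
    return int(match.group(1)) if match else None
```

With boto3, the message would typically be taken from str(err) on the caught exception. That part is left out here so the sketch runs without AWS credentials.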

Reproduction Steps

import boto3
from anthropic import AnthropicBedrock
from *** import AWS_ACCESS_KEY_ID, AWS_REGION, AWS_SECRET_ACCESS_KEY

client = AnthropicBedrock(
    aws_access_key=AWS_ACCESS_KEY_ID,
    aws_secret_key=AWS_SECRET_ACCESS_KEY,
    aws_region=AWS_REGION,
)

message = client.messages.create(
    model="eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Hello, world"}]
)
# This works perfectly fine

client = boto3.client(
    'bedrock-runtime',
    region_name=AWS_REGION,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)

response = client.converse(
    modelId=f"arn:aws:bedrock:{AWS_REGION}:***:inference-profile/eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello, world"}]}],
    inferenceConfig={
        'maxTokens': 8192,
    },
)
# ValidationException: An error occurred (ValidationException) when calling the Converse operation: The maximum tokens you requested exceeds the model limit of 4096. Try again with a maximum tokens value that is lower than 4096.

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.35.2

Environment details (OS name and version, etc.)

Linux and macOS.

tim-finnigan commented 2 days ago

Thanks for reporting. I can reproduce the ValidationException. The error is returned by the Converse API itself, so the higher limit needs to be supported upstream by the Bedrock service.

Upon searching internally, I found that the Bedrock team is aware of this and has plans to increase the maxTokens validation number accordingly. You can follow the blog and CHANGELOG for updates.

(For now, if you remove the inferenceConfig from that snippet or specify maxTokens <= 4096, the request should work. I tested with modelId="anthropic.claude-3-5-sonnet-20240620-v1:0" and the request succeeded after doing that.)
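
The workaround above can be sketched as a small wrapper that either omits inferenceConfig or caps maxTokens before calling converse. This is an illustrative sketch only: CONVERSE_MAX_TOKENS_LIMIT, build_inference_config, and converse_with_clamp are hypothetical names, and the value 4096 is taken from the error message in this issue, not from any constant in boto3.

```python
from typing import Any, Dict, List, Optional

# Assumption: 4096 is the limit reported by the ValidationException in this
# issue. It is expected to change once the Bedrock service raises it.
CONVERSE_MAX_TOKENS_LIMIT = 4096


def build_inference_config(
    requested_max_tokens: Optional[int],
    limit: int = CONVERSE_MAX_TOKENS_LIMIT,
) -> Dict[str, int]:
    """Return an inferenceConfig dict with maxTokens capped at `limit`.

    Returns an empty dict when no value was requested, so the caller can
    leave inferenceConfig out of the request entirely.
    """
    if requested_max_tokens is None:
        return {}
    return {"maxTokens": min(requested_max_tokens, limit)}


def converse_with_clamp(
    client: Any,
    model_id: str,
    messages: List[Dict[str, Any]],
    requested_max_tokens: Optional[int] = None,
) -> Any:
    """Call client.converse, omitting or capping maxTokens as needed."""
    kwargs: Dict[str, Any] = {"modelId": model_id, "messages": messages}
    config = build_inference_config(requested_max_tokens)
    if config:
        kwargs["inferenceConfig"] = config
    return client.converse(**kwargs)
```

Here client would be the object returned by boto3.client('bedrock-runtime', ...) as in the snippets above. Note that capping silently shortens the maximum response length, so callers that truly need 8192 output tokens still have to wait for the service-side change.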