anthropics / anthropic-sdk-python


Using the application inference profile in Bedrock results in failed model invocations. #740

Open moritalous opened 3 weeks ago

moritalous commented 3 weeks ago

Amazon Bedrock has added a new feature called "application inference profiles".

Using application inference profiles is like adding an alias to a base model.

import boto3

bedrock = boto3.Session(region_name="us-west-2").client("bedrock")

# Create application inference profile
response = bedrock.create_inference_profile(
    inferenceProfileName="sonnet-inference-profile",
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-west-2:637423213562:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0"
    },
)

inference_profile_arn = response["inferenceProfileArn"]
print(inference_profile_arn)

arn:aws:bedrock:us-west-2:637423213562:application-inference-profile/hq2of259skzs

For Bedrock's Invoke Model, you can specify the application inference profile as the modelId.

import json

bedrock_runtime = boto3.Session(region_name="us-west-2").client("bedrock-runtime")

response = bedrock_runtime.invoke_model(
    modelId=inference_profile_arn,
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "messages": [
                {
                    "role": "user",
                    "content": "Hello!",
                }
            ],
        }
    ),
)

response_body = json.loads(response.get("body").read())
print(response_body["content"][0]["text"])

However, when using the Anthropic SDK, specifying the application inference profile as the model results in an error.

from anthropic import AnthropicBedrock

anthropic = AnthropicBedrock(aws_region="us-west-2")

response = anthropic.messages.create(
    model=inference_profile_arn,
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response)

Message(id=None, content=None, model=None, role=None, stop_reason=None, stop_sequence=None, type=None, usage=None, Output={'__type': 'com.amazon.coral.service#UnknownOperationException'}, Version='1.0')

This is likely because the model parameter is not expected to contain an ARN.
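One possible direction (a sketch only, not verified against the live API): Bedrock Runtime's InvokeModel REST operation uses the path `POST /model/{modelId}/invoke`, so the `:` and `/` characters inside the ARN may split the path unless they are percent-encoded. Encoding the ARN before passing it as `model` could be worth trying:

```python
from urllib.parse import quote

inference_profile_arn = (
    "arn:aws:bedrock:us-west-2:637423213562:"
    "application-inference-profile/hq2of259skzs"
)

# Percent-encode every reserved character (":" -> "%3A", "/" -> "%2F")
# so the ARN stays a single segment in /model/{modelId}/invoke
encoded_arn = quote(inference_profile_arn, safe="")
print(encoded_arn)
```

Whether the service accepts `model=encoded_arn` in `messages.create()` depends on how the SDK itself escapes the path, so treat this as a debugging hint rather than a confirmed workaround.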

Please let me know if you have any further questions regarding this.

RobertCraigie commented 2 weeks ago

Thanks for the report! Do you know what the expected HTTP path is, i.e. which endpoint .invoke_model() is hitting?

moritalous commented 2 weeks ago

I tried outputting the debug log.

I hope this helps.
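One way to capture that log (assuming the standard `botocore.endpoint` logger name) is to raise its level to DEBUG before calling invoke_model; botocore then logs the fully resolved request, including the URL:

```python
import logging

# botocore emits the resolved request (method, URL, headers, body)
# at DEBUG level on the "botocore.endpoint" logger
logging.basicConfig(level=logging.INFO)
logging.getLogger("botocore.endpoint").setLevel(logging.DEBUG)
```

Running the invoke_model snippet above with this enabled should show the exact path the working boto3 call uses, which can then be compared with the request the Anthropic SDK builds.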