anthropics / anthropic-sdk-python


Using the application inference profile in Bedrock results in failed model invocations. #740

Open moritalous opened 3 weeks ago

moritalous commented 3 weeks ago

Amazon Bedrock has added a new feature called "application inference profiles".

Using application inference profiles is like adding an alias to a base model.

import boto3

bedrock = boto3.Session(region_name="us-west-2").client("bedrock")

# Create application inference profile
response = bedrock.create_inference_profile(
    inferenceProfileName="sonnet-inference-profile",
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-west-2:637423213562:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0"
    },
)

inference_profile_arn = response["inferenceProfileArn"]
print(inference_profile_arn)

arn:aws:bedrock:us-west-2:637423213562:application-inference-profile/hq2of259skzs

For Bedrock's Invoke Model, you can specify the application inference profile as the modelId.

import json

bedrock_runtime = boto3.Session(region_name="us-west-2").client("bedrock-runtime")

response = bedrock_runtime.invoke_model(
    modelId=inference_profile_arn,
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "messages": [
                {
                    "role": "user",
                    "content": "Hello!",
                }
            ],
        }
    ),
)

response_body = json.loads(response.get("body").read())
print(response_body["content"][0]["text"])

However, when using the Anthropic SDK, specifying the application inference profile as the model results in an error.

from anthropic import AnthropicBedrock

anthropic = AnthropicBedrock(aws_region="us-west-2")

response = anthropic.messages.create(
    model=inference_profile_arn,
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response)

Message(id=None, content=None, model=None, role=None, stop_reason=None, stop_sequence=None, type=None, usage=None, Output={'__type': 'com.amazon.coral.service#UnknownOperationException'}, Version='1.0')

This is likely because the model parameter is not expected to contain an ARN.
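One possible direction (a sketch only, not verified against the live API): Bedrock Runtime's InvokeModel REST operation uses the path `POST /model/{modelId}/invoke`, so the `:` and `/` characters inside the ARN may split the path unless they are percent-encoded. Encoding the ARN before passing it as `model` could be worth trying:

```python
from urllib.parse import quote

inference_profile_arn = (
    "arn:aws:bedrock:us-west-2:637423213562:"
    "application-inference-profile/hq2of259skzs"
)

# Percent-encode every reserved character (":" -> "%3A", "/" -> "%2F")
# so the ARN stays a single segment in /model/{modelId}/invoke
encoded_arn = quote(inference_profile_arn, safe="")
print(encoded_arn)
```

Whether the service accepts `model=encoded_arn` in `messages.create()` depends on how the SDK itself escapes the path, so treat this as a debugging hint rather than a confirmed workaround.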

Please let me know if you have any further questions regarding this.

RobertCraigie commented 2 weeks ago

Thanks for the report! Do you know what the expected HTTP path is, i.e. which endpoint .invoke_model() is hitting?

moritalous commented 2 weeks ago

I tried outputting the debug log.

I hope this helps.
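One way to capture that log (assuming the standard `botocore.endpoint` logger name) is to raise its level to DEBUG before calling invoke_model; botocore then logs the fully resolved request, including the URL:

```python
import logging

# botocore emits the resolved request (method, URL, headers, body)
# at DEBUG level on the "botocore.endpoint" logger
logging.basicConfig(level=logging.INFO)
logging.getLogger("botocore.endpoint").setLevel(logging.DEBUG)
```

Running the invoke_model snippet above with this enabled should show the exact path the working boto3 call uses, which can then be compared with the request the Anthropic SDK builds.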