aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0
151 stars · 40 forks

ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: "claude-3-sonnet-20240229" is not supported on this API. Please use the Messages API instead. #286

Closed aakash086 closed 5 days ago

aakash086 commented 4 weeks ago

Hi,

I am using the code below to evaluate summaries:

```python
import json
import boto3
import os

# Bedrock client for model inference
bedrock_runtime = boto3.client('bedrock-runtime', region_name='eu-west-3')

model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
accept = "application/json"
contentType = "application/json"

from detectpdc_p_crime import base_prompt, response_pos, response_neg

base_prompt = """ Summarize the below content in half

Answers: """

query_prompt = "five-time world champion michelle kwan withdrew from the #### us figure skating championships on wednesday , but will petition us skating officials for the chance to compete at the #### turin olympics ."

full_prompt = base_prompt + query_prompt

aws_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 200,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": full_prompt}]}
    ]
}

body = json.dumps(aws_body)

# Invoke the model
response = bedrock_runtime.invoke_model(body=body, modelId=model_id)

# Parse the invocation response
response_body = json.loads(response["body"].read())
outputs = response_body.get("content")

print(outputs)
```

```python
from fmeval.data_loaders.data_config import DataConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.eval_algorithms.summarization_accuracy import SummarizationAccuracy

config = DataConfig(
    dataset_name="gigaword_sample",
    dataset_uri="gigaword_sample.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="document",
    target_output_location="summary"
)

bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output='completion',
    content_template='{"prompt": $prompt, "max_tokens_to_sample": 500}'
)

eval_algo = SummarizationAccuracy()
eval_output = eval_algo.evaluate(
    model=bedrock_model_runner,
    dataset_config=config,
    prompt_template="Human: Summarise the following text in one sentence: $model_input\n\nAssistant:\n",
    save=True
)

print(eval_output)
```


Error:

```
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: "claude-3-sonnet-20240229" is not supported on this API. Please use the Messages API instead.
```

The Claude model generates summaries without issue when I call `invoke_model` directly, but I get this error when invoking it through fmeval.

keerthanvasist commented 2 weeks ago

Hello @aakash086. Thanks for raising this issue.

The issue you are facing is that newer Anthropic models such as Claude 3 require a different request format, the Messages API. See here for more details.

To achieve this, you can set an appropriate prompt_template parameter in the evaluate function. In the simplest use case, prompt_template='[{"role": "user", "content": $model_input}]' could be used.
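For context, the two request shapes differ as sketched below (a minimal illustration of the payloads, not fmeval code; `full_prompt` is a placeholder string):

```python
import json

full_prompt = "Summarize the below content in half: ..."  # placeholder

# Legacy Text Completions payload (older Claude models); Claude 3 rejects this shape
completions_body = {
    "prompt": f"\n\nHuman: {full_prompt}\n\nAssistant:",
    "max_tokens_to_sample": 500,
}

# Messages API payload, which Claude 3 models on Bedrock require
messages_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 500,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": full_prompt}]}
    ],
}

print(json.dumps(messages_body))
```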

aakash086 commented 2 weeks ago

Hi @keerthanvasist: Thanks for the reply. I have tried various options using the Messages API and modified the code as below:

```python
import json
import boto3
import os

bedrock_runtime = boto3.client('bedrock-runtime', region_name='eu-west-3')
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
accept = "application/json"
contentType = "application/json"

from detectpdc_p_crime import base_prompt, response_pos, response_neg

base_prompt = """ Human: Summarize the below content in half

Assistant: """

query_prompt = "five-time world champion michelle kwan withdrew from the #### us figure skating championships on wednesday , but will petition us skating officials for the chance to compete at the #### turin olympics ."

full_prompt = base_prompt + query_prompt

aws_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 200,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": full_prompt}]}
    ]
}

body = json.dumps(aws_body)
response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(response["body"].read())
outputs = response_body.get("content")

print(outputs)

from fmeval.data_loaders.data_config import DataConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.eval_algorithms.summarization_accuracy import SummarizationAccuracy
from fmeval.eval import get_eval_algorithm

config = DataConfig(
    dataset_name="gigaword_sample",
    dataset_uri="gigaword_sample.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="document",
    target_output_location="summary"
)

bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output='completion',
    content_template='{"prompt": $prompt, "max_tokens_to_sample": 500}'
)

eval_algo = SummarizationAccuracy()
eval_output = eval_algo.evaluate(
    model=bedrock_model_runner,
    dataset_config=config,
    prompt_template='{"role": "user", "Content": "Summarise the following text in one sentence:"}',
    save=True
)
```

However, I am still getting an error:

```
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: prompt must start with " Human:" turn after an optional system prompt
```

To note, the gigaword_sample.jsonl file has data as:

```
{"document": "five-time world champion michelle kwan withdrew from the #### us figure skating championships on wednesday , but will petition us skating officials for the chance to compete at the #### turin olympics .", "summary": "injury leaves kwan 's olympic hopes in limbo", "idx": "0"}
{"document": "us business leaders lashed out wednesday at legislation that would penalize companies for employing illegal immigrants .", "summary": "us business attacks tough immigration law", "idx": "1"}
```
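As a quick sanity check (my own sketch, with the documents truncated), each JSONL line should parse independently and contain the fields named in the DataConfig:

```python
import json

# Two sample records in the shape of gigaword_sample.jsonl (documents truncated)
lines = [
    '{"document": "five-time world champion michelle kwan withdrew ...", "summary": "injury leaves kwan \'s olympic hopes in limbo", "idx": "0"}',
    '{"document": "us business leaders lashed out wednesday ...", "summary": "us business attacks tough immigration law", "idx": "1"}',
]

for line in lines:
    record = json.loads(line)
    # model_input_location="document", target_output_location="summary"
    assert "document" in record and "summary" in record
    print(record["idx"], record["summary"])
```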

keerthanvasist commented 2 weeks ago

I did take a look at your updated code. Can you try this instead? I have verified that this works.

```python
from fmeval.data_loaders.data_config import DataConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.eval_algorithms.summarization_accuracy import SummarizationAccuracy

config = DataConfig(
    dataset_name="gigaword_sample",
    dataset_uri="/Volumes/workplace/margaret/src/AWSMargaretContainers/gigaword.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="document",
    target_output_location="summary"
)

bedrock_model_runner = BedrockModelRunner(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    output='content[0].text',
    content_template='{"anthropic_version": "bedrock-2023-05-31", "max_tokens": 1024, "messages": [{"role": "user","content": [{"type": "text","text": $prompt}]}]}'
)

eval_algo = SummarizationAccuracy()
eval_output = eval_algo.evaluate(
    model=bedrock_model_runner,
    dataset_config=config,
    prompt_template='{"role": "user", "Content": $model_input}',
    save=True
)
```

The difference is in the two fields content_template and prompt_template. These two fields dictate how the request payload is constructed.
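To illustrate how I understand the composition works (a rough sketch using string.Template, not fmeval's actual implementation): prompt_template first turns each $model_input into a prompt string, and content_template then embeds that string, JSON-encoded, as $prompt in the request body.

```python
import json
from string import Template

prompt_template = '{"role": "user", "Content": $model_input}'
content_template = (
    '{"anthropic_version": "bedrock-2023-05-31", "max_tokens": 1024, '
    '"messages": [{"role": "user","content": [{"type": "text","text": $prompt}]}]}'
)

model_input = "five-time world champion michelle kwan withdrew ..."

# Step 1: render the prompt from the dataset record's model input
prompt = Template(prompt_template).substitute(model_input=json.dumps(model_input))

# Step 2: embed the prompt (JSON-encoded) into the final request body
body = Template(content_template).substitute(prompt=json.dumps(prompt))

payload = json.loads(body)  # a well-formed Messages API request
print(payload["messages"][0]["content"][0]["text"])
```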

keerthanvasist commented 5 days ago

@aakash086 I am closing this issue since we didn't hear back from you. Feel free to re-open the issue if you are still running into the errors.