aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Apache License 2.0

[Llama2 inferentia] : runtime error when invoking endpoint through boto3 #4549

Open krokoko opened 5 months ago

krokoko commented 5 months ago

Link to the notebook

Describe the bug

When I use a Lambda function with boto3 to query the Neuron Llama 2 7B f model deployed on an ml.inf2.xlarge instance, the InvokeEndpoint operation fails with the following message:

  "errorMessage": "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message \"{\n  \"code\": 400,\n  \"type\": \"BadRequestException\",\n  \"message\": \"Parameter model_name is required.\"\n}\n\". See in account XXXXXXX for more information.",
  "errorType": "ModelError",
  "requestId": "2f2a7aa4-9eeb-42f5-9a14-6285894581bb",
  "stackTrace": [
    "  File \"/var/task/\", line 19, in handler\n    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,\n",
    "  File \"/var/runtime/botocore/\", line 530, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n",
    "  File \"/var/runtime/botocore/\", line 960, in _make_api_call\n    raise error_class(parsed_response, operation_name)\n"

The model configuration is as follows:

To reproduce

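import boto3
import json

def handler(event, context):
    # Client for the SageMaker runtime (InvokeEndpoint) API
    runtime = boto3.client('runtime.sagemaker')

    ENDPOINT_NAME = 'testllamaneuron'

    # Chat-style payload: a system prompt, a user prompt, and generation parameters
    dic = {
        "inputs": [
            {"role": "system", "content": "You are chat bot who writes songs"},
            {"role": "user", "content": "Write a rap song about Amazon Web Services"}
        ],
        "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
    }

    # NOTE: the remaining arguments were cut off in the original post;
    # ContentType and Body below are assumed.
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=json.dumps(dic))

    # The response body is a stream; read and decode it before parsing the JSON
    result = json.loads(response['Body'].read().decode())

    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }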

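For debugging, it may help to pull the container's own JSON error out of the ModelError text so the `code`, `type`, and `message` fields can be logged separately. The following is only a minimal sketch: `extract_container_error` is a hypothetical helper (not part of boto3), and it assumes the message embeds a single JSON object, as in the error shown above.

```python
import json
from typing import Optional


def extract_container_error(message: str) -> Optional[dict]:
    """Return the JSON object embedded in a ModelError message, or None.

    Hypothetical helper: assumes the container's response appears as one
    JSON object somewhere inside the message text.
    """
    start = message.find("{")
    end = message.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(message[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Message text taken from the error reported in this issue.
sample = (
    'An error occurred (ModelError) when calling the InvokeEndpoint operation: '
    'Received client error (400) from primary with message "{\n'
    '  "code": 400,\n'
    '  "type": "BadRequestException",\n'
    '  "message": "Parameter model_name is required."\n'
    '}\n". See in account XXXXXXX for more information.'
)

details = extract_container_error(sample)
print(details)
```

In the Lambda handler, the same helper could be applied to `str(err)` after catching `botocore.exceptions.ClientError` around the `invoke_endpoint` call.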
Lambda Function logs:

[ERROR] ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "BadRequestException",
  "message": "Parameter model_name is required."