deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0

Error running multimodel endpoints in sagemaker #1911

Open Najib-Haq opened 6 months ago

Najib-Haq commented 6 months ago

Description

Multi-model endpoint (MME) deployment in SageMaker through DJL Serving is documented as supported; see the related AWS page and associated tutorial. I ran the demo code as-is. The endpoint is created successfully, but invoking it fails with the error below.

Expected Behavior

Successful invocation of endpoint based on target model name in multimodel endpoint scenario.

Error Message

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 print(predictor.predict( {"prompt": "Large model inference is"}, target_model="opt-350m.tar.gz"))
      2 print(predictor.predict({"prompt": "Large model inference is"}, target_model="bloomz-560m.tar.gz"))
      3 print(predictor.predict({"prompt": "Large model inference is"}, target_model="gpt-neo-125m.tar.gz"))

File /opt/conda/lib/python3.10/site-packages/sagemaker/base_predictor.py:212, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes, component_name)
    209 if inference_component_name:
    210     request_args["InferenceComponentName"] = inference_component_name
--> 212 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    213 return self._handle_response(response)

File /opt/conda/lib/python3.10/site-packages/botocore/client.py:565, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    561     raise TypeError(
    562         f"{py_operation_name}() only accepts keyword arguments."
    563     )
    564 # The "self" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)

File /opt/conda/lib/python3.10/site-packages/botocore/client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
   1017     error_code = error_info.get("QueryErrorCode") or error_info.get(
   1018         "Code"
   1019     )
   1020     error_class = self.exceptions.from_code(error_code)
-> 1021     raise error_class(parsed_response, operation_name)
   1022 else:
   1023     return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (404) from primary with message "{
  "code": 404,
  "type": "ModelNotFoundException",
  "message": "Failed to detect engine of the model: /opt/ml/models/5ce62ebae83b18f9141573e424631f2e/model/temp"
}
". See https://eu-central-1.console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/sagemaker/Endpoints/lmi-model-2024-05-15-10-10-58-392 in account 662744937784 for more information.
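For context on the 404 above: DJL Serving reports `Failed to detect engine` when it cannot work out which engine to load for the extracted archive, typically from a `serving.properties` file at the archive root. I have not confirmed this is the cause here, but a minimal `serving.properties` for one of the demo models (using the Python engine and a hypothetical `option.model_id`) would look like:

```properties
# serving.properties — placed at the root of opt-350m.tar.gz
# Declares the engine so DJL Serving does not have to guess it.
engine=Python
option.model_id=facebook/opt-350m
```

If the archive is repacked so that these files end up under an extra subdirectory (or in a `temp/` folder, as the error path suggests), engine detection can fail even though the same model deploys fine on its own endpoint.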

How to Reproduce?

Execute the demo code linked here.

Steps to reproduce


  1. Run the code at https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/Multi-Model-Inference-Demo.ipynb.

What have you tried to solve it?

  1. Tried using models stored in S3 buckets instead; this gave the same error. The same models deploy successfully to their own single-model endpoints through DJL Serving, but fail in the multi-model scenario.
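For what it's worth, the same invocation can be reproduced without the SageMaker Python SDK by calling the `sagemaker-runtime` API directly with `TargetModel`, which is the parameter that selects the `.tar.gz` inside the MME S3 prefix. A minimal sketch (the endpoint name is taken from the CloudWatch link above and is an assumption; `build_invoke_args` is a hypothetical helper, not part of any SDK):

```python
import json

def build_invoke_args(endpoint_name, payload, target_model):
    """Assemble the keyword arguments for sagemaker-runtime invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        # TargetModel picks the archive within the multi-model S3 prefix
        "TargetModel": target_model,
        "Body": json.dumps(payload),
    }

args = build_invoke_args(
    "lmi-model-2024-05-15-10-10-58-392",
    {"prompt": "Large model inference is"},
    "opt-350m.tar.gz",
)

# With AWS credentials configured, the actual call would be:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**args)
# print(response["Body"].read())
```

This rules out the SDK as a variable: if the raw `invoke_endpoint` call returns the same 404, the problem is in how the container resolves the target archive, not in the client.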
sindhuvahinis commented 2 months ago

@Najib-Haq Which version of DLC image did you use?