Description
Multi-model endpoint deployment in SageMaker through DJL Serving is supposed to be supported. Here are the related AWS page and associated tutorial. I ran the demo code as-is. The endpoint gets created, but invoking it fails with errors.
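For context, the endpoint was created along the lines of the tutorial. Below is a minimal sketch using the SageMaker Python SDK's MultiDataModel; the container version, bucket, model name, and instance type are placeholder assumptions, not values taken from the demo:

import sagemaker
from sagemaker import image_uris
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# DJL Serving (LMI) container image; the version is an assumption.
image = image_uris.retrieve(
    framework="djl-deepspeed",
    region=session.boto_region_name,
    version="0.23.0",
)

# All model .tar.gz archives live under this S3 prefix (placeholder bucket).
mme = MultiDataModel(
    name="lmi-mme-demo",
    model_data_prefix="s3://my-bucket/mme-models/",
    image_uri=image,
    role=role,
    sagemaker_session=session,
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)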
Expected Behavior
Successful invocation of the endpoint based on the target model name in the multi-model endpoint scenario.
Error Message
---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
Cell In[14], line 1
----> 1 print(predictor.predict( {"prompt": "Large model inference is"}, target_model="opt-350m.tar.gz"))
2 print(predictor.predict({"prompt": "Large model inference is"}, target_model="bloomz-560m.tar.gz"))
3 print(predictor.predict({"prompt": "Large model inference is"}, target_model="gpt-neo-125m.tar.gz"))
File /opt/conda/lib/python3.10/site-packages/sagemaker/base_predictor.py:212, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes, component_name)
209 if inference_component_name:
210 request_args["InferenceComponentName"] = inference_component_name
--> 212 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
213 return self._handle_response(response)
File /opt/conda/lib/python3.10/site-packages/botocore/client.py:565, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
561 raise TypeError(
562 f"{py_operation_name}() only accepts keyword arguments."
563 )
564 # The "self" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)
File /opt/conda/lib/python3.10/site-packages/botocore/client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
1017 error_code = error_info.get("QueryErrorCode") or error_info.get(
1018 "Code"
1019 )
1020 error_class = self.exceptions.from_code(error_code)
-> 1021 raise error_class(parsed_response, operation_name)
1022 else:
1023 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (404) from primary with message "{
"code": 404,
"type": "ModelNotFoundException",
"message": "Failed to detect engine of the model: /opt/ml/models/5ce62ebae83b18f9141573e424631f2e/model/temp"
}
". See https://eu-central-1.console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/sagemaker/Endpoints/lmi-model-2024-05-15-10-10-58-392 in account 662744937784 for more information.
How to Reproduce?
Execute the given demo code here, as-is.
Steps to reproduce
After deploying the endpoint per the demo, run:
print(predictor.predict({"prompt": "Large model inference is"}, target_model="opt-350m.tar.gz"))
print(predictor.predict({"prompt": "Large model inference is"}, target_model="bloomz-560m.tar.gz"))
print(predictor.predict({"prompt": "Large model inference is"}, target_model="gpt-neo-125m.tar.gz"))
What have you tried to solve it?
Tried using models stored in S3 buckets instead; this gave the same error. These models can be deployed successfully in their own individual endpoints through DJL Serving, but not in the multi-model scenario.
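Roughly, the packaging for the S3 attempt looked like the sketch below. The directory layout, bucket, and names are placeholder assumptions; the intent is that serving.properties ends up at the archive root, where DJL Serving looks for it:

import tarfile
import boto3

def package_and_upload(model_dir: str, archive: str, bucket: str, prefix: str) -> str:
    """Tar a local DJL model directory (serving.properties, model files, ...)
    and upload it under the MME model-data prefix. All names are placeholders."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." keeps serving.properties at the root of the archive.
        tar.add(model_dir, arcname=".")
    key = f"{prefix}/{archive}"
    boto3.client("s3").upload_file(archive, bucket, key)
    return f"s3://{bucket}/{key}"

package_and_upload("opt-350m", "opt-350m.tar.gz", "my-bucket", "mme-models")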