aws-samples / host-yolov8-on-sagemaker-endpoint

MIT No Attribution

stdout MODEL_LOG - FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/code/best.pt' #20

Open RahulJana opened 3 months ago

RahulJana commented 3 months ago

I am trying to deploy a YOLOv5 model on SageMaker following this notebook. The endpoint deploys successfully, but when I test it with `predictor.predict(payload)`, it shows this error:


```
---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
Cell In[196], line 1
----> 1 result = predictor.predict(payload)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/sagemaker/base_predictor.py:212, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes, component_name)
    209 if inference_component_name:
    210     request_args["InferenceComponentName"] = inference_component_name
--> 212 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    213 return self._handle_response(response)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/client.py:565, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    561 raise TypeError(
    562     f"{py_operation_name}() only accepts keyword arguments."
    563 )
    564 # The "self" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
   1017     error_code = error_info.get("QueryErrorCode") or error_info.get(
   1018         "Code"
   1019     )
   1020     error_class = self.exceptions.from_code(error_code)
-> 1021     raise error_class(parsed_response, operation_name)
   1022 else:
   1023     return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch log stream
```
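
For context, the endpoint is deployed and invoked roughly as below (a minimal sketch following the SageMaker PyTorch SDK; the S3 URI, IAM role, and framework versions are placeholders, not the notebook's actual values):

```python
from sagemaker.pytorch import PyTorchModel

# Placeholders -- substitute the actual artifact location and execution role.
model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",             # hypothetical S3 URI
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role ARN
    entry_point="inference.py",
    framework_version="1.12",
    py_version="py38",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

result = predictor.predict(payload)  # raises the ModelError shown above
```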


When I looked into the error logs, they show:

```
stdout MODEL_LOG - FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/code/best.pt'
```

The file structure when creating the tar file is:

```
model.tar.gz
└── code/
    ├── inference.py
    ├── requirements.txt
    └── best.pt
```
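
For reference, I build the archive along these lines (a sketch using Python's tarfile; the paths assume the layout above):

```python
import tarfile

# Pack inference code, dependencies, and weights under code/
# so they land at /opt/ml/model/code/ when SageMaker extracts the archive.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("code/inference.py", arcname="code/inference.py")
    tar.add("code/requirements.txt", arcname="code/requirements.txt")
    tar.add("code/best.pt", arcname="code/best.pt")
```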

I have also tried placing best.pt outside the code folder, following this article: https://aws.amazon.com/blogs/machine-learning/hosting-yolov8-pytorch-model-on-amazon-sagemaker-endpoints/

```
model.tar.gz
├── code/
│   ├── inference.py
│   └── requirements.txt
└── best.pt
```

I still faced the same issue.
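
Since the error path points at the weight lookup inside inference.py, the path used in model_fn has to match whichever layout is packed into the tarball. A minimal sketch of a layout-tolerant lookup (the model_fn signature comes from the SageMaker PyTorch inference toolkit; the torch.hub load call is an assumption, and the repo's notebook may load the weights differently):

```python
import os
import torch

def model_fn(model_dir):
    # SageMaker extracts model.tar.gz into model_dir (/opt/ml/model).
    # Depending on the layout, best.pt is either under code/ or at the root.
    weights = os.path.join(model_dir, "code", "best.pt")
    if not os.path.exists(weights):
        weights = os.path.join(model_dir, "best.pt")
    # Assumption: YOLOv5 custom weights loaded via torch.hub, as documented
    # in the ultralytics/yolov5 README; substitute your actual load call.
    return torch.hub.load("ultralytics/yolov5", "custom", path=weights)
```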