KraftZzz closed this issue 1 year ago.
Hi, @KraftZzz. Thank you for raising this issue.
I have a few questions to clarify:
1/ You cannot use SSH / SSM, but do you get any prediction results?
2/ If you don't see any logs in CloudWatch, you probably have misconfigured permissions (no access to CloudWatch). Can you try one of the SageMaker examples, e.g., Deploy a pretrained PyTorch BERT model from Hugging Face, and confirm that it works in your environment and that you can see its logs? (See the snippet below for a quick way to check.)
If you have the same issue even when you don't use SageMaker SSH Helper, you might need to reach out to AWS Support.
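As a quick sanity check for the permissions, you can list the endpoint's log streams directly. This is a minimal sketch: it assumes the default log group naming /aws/sagemaker/Endpoints/<endpoint-name>, and the endpoint name is a placeholder:
import boto3

logs = boto3.client("logs")
# SageMaker endpoints write container logs to /aws/sagemaker/Endpoints/<endpoint-name>
response = logs.describe_log_streams(
    logGroupName="/aws/sagemaker/Endpoints/<your-endpoint-name>"  # placeholder
)
print([stream["logStreamName"] for stream in response["logStreams"]])
If this raises ResourceNotFoundException, the container never wrote any logs, or the execution role lacks CloudWatch permissions.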
OK, got it. Could you try adding a requirements.txt to the code/ directory instead of using the dependencies parameter?
In requirements.txt, add SageMaker SSH Helper:
sagemaker-ssh-helper
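For reference, the Hugging Face inference container picks up the code/ directory from inside model.tar.gz and installs code/requirements.txt at startup, so the archive would look roughly like this (the weight file names are placeholders and depend on your model):
model.tar.gz
├── pytorch_model.bin        # model weights (placeholder names)
├── config.json
└── code/
    ├── inference.py
    └── requirements.txt     # contains: sagemaker-ssh-helper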
I've tried your example, and it works for me with this approach.
As a side comment, all the examples, including your code, are single-model endpoints; the "multi-model-server" name is somewhat confusing. If you really want to deploy a multi-model endpoint, you will need to use MultiDataModel and SSHMultiModelWrapper. See the FAQ for more details.
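A rough sketch of what that would look like (untested here; the S3 prefix and model name are placeholders):
from sagemaker.multidatamodel import MultiDataModel
from sagemaker_ssh_helper.wrapper import SSHMultiModelWrapper

# S3 prefix that will hold the individual model archives (placeholder)
model_data_prefix = "s3://<your-bucket>/multi-model-artifacts/"

mdm = MultiDataModel(
    name="huggingface-multi-model",   # placeholder name
    model_data_prefix=model_data_prefix,
    model=huggingface_model,          # reuse the container/model definition
)

ssh_wrapper = SSHMultiModelWrapper.create(mdm, connection_wait_time_seconds=0)

predictor = mdm.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)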
You add this code to inference.py, right?
import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))

import sagemaker_ssh_helper
sagemaker_ssh_helper.setup_and_start_ssh()
I mention MMS because I see the following information in the endpoint log:
2023-04-26T04:35:25,060 [INFO ] main com.amazonaws.ml.mms.ModelServer - MMS Home: /opt/conda/lib/python3.8/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 1
Number of CPUs: 4
Max heap size: 3500 M
Python executable: /opt/conda/bin/python3.8
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: null
Metrics dir: null
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2023-04-26T04:35:25,118 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
2023-04-26T04:35:25,179 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_huggingface_inference_toolkit.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 72
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.8.10
2023-04-26T04:35:25,181 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.
2023-04-26T04:35:25,187 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-04-26T04:35:25,199 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2023-04-26T04:35:25,256 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.
There is no sagemaker-ssh-helper output in the endpoint's CloudWatch log, so I run:
instance_ids = ssh_wrapper.get_instance_ids()
print(f'To connect over SSM run: aws ssm start-session --target {instance_ids[0]} --region {sess.boto_region_name}')
but there is no output.
Could you please share your steps?
My steps are the following:
1/ Added to inference.py the following lines:
+import os
+import sys
+sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))
+
+import sagemaker_ssh_helper
+sagemaker_ssh_helper.setup_and_start_ssh()
+
+
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
2/ Modified sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb and executed the following cell:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker_ssh_helper.wrapper import SSHModelWrapper # <--NEW--
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
model_data=s3_location, # path to your model and script
role=role, # iam role with permissions to create an Endpoint
transformers_version="4.26", # transformers version used
pytorch_version="1.13", # pytorch version used
py_version='py39', # python version used
)
ssh_wrapper = SSHModelWrapper.create(huggingface_model, connection_wait_time_seconds=0) # <--NEW--
# deploy the endpoint
predictor = huggingface_model.deploy(
initial_instance_count=1,
instance_type="ml.g4dn.xlarge"
)
After the endpoint had been deployed, I was able to fetch the instance IDs:
ssh_wrapper.get_instance_ids()
INFO:sagemaker-ssh-helper:Querying SSM instance IDs for endpoint huggingface-pytorch-inference-2023-04-24-17-00-23-155
INFO:sagemaker-ssh-helper:Got preliminary SSM instance IDs: ['mi-01234567890abcd00']
INFO:sagemaker-ssh-helper:Got final SSM instance IDs: ['mi-01234567890abcd00']
['mi-01234567890abcd00']
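With an instance ID like this, you can then open a session on the endpoint's instance over SSM (region is a placeholder):
aws ssm start-session --target mi-01234567890abcd00 --region <your-region>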
Oh, to my confusion, I managed to get the mi-xxxx instance ID in one of my experiments yesterday, but I didn't modify any code...
Thanks for sharing.
You're welcome! Let me know if you managed to make your code work, so we can close this issue.
model_data = 's3://kraft-source-bucket/huggingface_model/model.tar.gz'
from sagemaker.huggingface import HuggingFaceModel
from sagemaker_ssh_helper.wrapper import SSHModelWrapper
import sagemaker

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    dependencies=[SSHModelWrapper.dependency_dir()],
    model_data=model_data,
    role=role
)
ssh_wrapper = SSHModelWrapper.create(huggingface_model, connection_wait_time_seconds=0)
huggingface_model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge", wait=False)
model.tar.gz
cat inference.py

import argparse
import io
import json
import logging
import os
import sys
import subprocess

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
import torch.utils.data.distributed
from PIL import Image
from torchvision import datasets, transforms
from torchvision.transforms import ToTensor

from model import Net

logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.info(os.system("nvidia-smi"))

sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))

import sagemaker_ssh_helper
sagemaker_ssh_helper.setup_and_start_ssh()


def model_fn(model_dir):
    print(model_dir)
    logger.info(model_dir)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.DataParallel(Net())
    with open(os.path.join(model_dir, "model.pth"), "rb") as f:
        model.load_state_dict(torch.load(f))
    return model.to(device)


def load_from_bytearray(request_body):
    image_as_bytes = io.BytesIO(request_body)
    image = Image.open(image_as_bytes)
    image_tensor = ToTensor()(image).unsqueeze(0)
    return image_tensor


def input_fn(request_body, request_content_type):
    # if content_type is 'image/jpg' or 'application/x-npy',
    # deserialize the request body into an image tensor
    image_tensor = load_from_bytearray(request_body)
    return image_tensor


# Perform prediction on the deserialized object, with the loaded model
def predict_fn(input_object, model):
    output = model.forward(input_object)
    pred = output.max(1, keepdim=True)[1]
    return pred.detach().cpu().numpy().tolist()  # return a JSON-serializable result


# Serialize the prediction result into the desired response content type
def output_fn(predictions, response_content_type):
    return json.dumps(predictions)
I run this code:
instance_ids = ssh_wrapper.get_instance_ids()  # <--NEW--
print(f'To connect over SSM run: aws ssm start-session --target {instance_ids[0]}')
There is no output, and the CloudWatch log has no related info about sagemaker-ssh-helper.