KraftZzz closed this issue 1 year ago.
Hi, @KraftZzz. Thank you for raising this issue.
I have a few questions to clarify:
1/ You cannot use SSH / SSM, but do you get any prediction results?
2/ If you don't see any logs in CloudWatch, you probably have misconfigured permissions (no access to CloudWatch). Can you try one of the SageMaker examples, e.g., Deploy a pretrained PyTorch BERT model from Hugging Face, and confirm that it works in your environment and that you can see its logs? (See the snippet below for a quick way to check.)
If you have the same issue even when you don't use SageMaker SSH Helper, you might need to reach out to AWS Support.
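As a quick sanity check for the permissions, you can list the endpoint's log streams directly. This is a minimal sketch: it assumes the default log group naming /aws/sagemaker/Endpoints/<endpoint-name>, and the endpoint name is a placeholder:
import boto3

logs = boto3.client("logs")
# SageMaker endpoints write container logs to /aws/sagemaker/Endpoints/<endpoint-name>
response = logs.describe_log_streams(
    logGroupName="/aws/sagemaker/Endpoints/<your-endpoint-name>"  # placeholder
)
print([stream["logStreamName"] for stream in response["logStreams"]])
If this raises ResourceNotFoundException, the container never wrote any logs, or the execution role lacks CloudWatch permissions.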
OK, got it. Could you try adding a requirements.txt to the code/ directory instead of using the dependencies parameter?
In requirements.txt, add SageMaker SSH Helper:
sagemaker-ssh-helper
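For reference, the Hugging Face inference container picks up the code/ directory from inside model.tar.gz and installs code/requirements.txt at startup, so the archive would look roughly like this (the weight file names are placeholders and depend on your model):
model.tar.gz
├── pytorch_model.bin        # model weights (placeholder names)
├── config.json
└── code/
    ├── inference.py
    └── requirements.txt     # contains: sagemaker-ssh-helper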
I've tried your example, and it works for me with this approach.
As a side comment, all the examples, including your code, are single-model endpoints; the "multi-model-server" name is somewhat confusing. If you really want to deploy a multi-model endpoint, you will need to use MultiDataModel and SSHMultiModelWrapper. See the FAQ for more details.
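A rough sketch of what that would look like (untested here; the S3 prefix and model name are placeholders):
from sagemaker.multidatamodel import MultiDataModel
from sagemaker_ssh_helper.wrapper import SSHMultiModelWrapper

# S3 prefix that will hold the individual model archives (placeholder)
model_data_prefix = "s3://<your-bucket>/multi-model-artifacts/"

mdm = MultiDataModel(
    name="huggingface-multi-model",   # placeholder name
    model_data_prefix=model_data_prefix,
    model=huggingface_model,          # reuse the container/model definition
)

ssh_wrapper = SSHMultiModelWrapper.create(mdm, connection_wait_time_seconds=0)

predictor = mdm.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)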
You add this code to inference.py, right?
import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))

import sagemaker_ssh_helper
sagemaker_ssh_helper.setup_and_start_ssh()
I mention MMS because I see the following information in the endpoint log:
2023-04-26T04:35:25,060 [INFO ] main com.amazonaws.ml.mms.ModelServer - MMS Home: /opt/conda/lib/python3.8/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 1
Number of CPUs: 4
Max heap size: 3500 M
Python executable: /opt/conda/bin/python3.8
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: null
Metrics dir: null
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2023-04-26T04:35:25,118 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
2023-04-26T04:35:25,179 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_huggingface_inference_toolkit.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 72
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.8.10
2023-04-26T04:35:25,181 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.
2023-04-26T04:35:25,187 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-04-26T04:35:25,199 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2023-04-26T04:35:25,256 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.
There is no sagemaker-ssh-helper output in the endpoint's CloudWatch log, so I run:
instance_ids = ssh_wrapper.get_instance_ids()
print(f'To connect over SSM run: aws ssm start-session --target {instance_ids[0]} --region {sess.boto_region_name}')
but there is no output.
Could you please share your steps?
My steps are the following:
1/ Added to inference.py the following lines:
+import os
+import sys
+sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))
+
+import sagemaker_ssh_helper
+sagemaker_ssh_helper.setup_and_start_ssh()
+
+
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
2/ Modified sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb and executed the following cell:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker_ssh_helper.wrapper import SSHModelWrapper # <--NEW--
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
model_data=s3_location, # path to your model and script
role=role, # iam role with permissions to create an Endpoint
transformers_version="4.26", # transformers version used
pytorch_version="1.13", # pytorch version used
py_version='py39', # python version used
)
ssh_wrapper = SSHModelWrapper.create(huggingface_model, connection_wait_time_seconds=0) # <--NEW--
# deploy the endpoint
predictor = huggingface_model.deploy(
initial_instance_count=1,
instance_type="ml.g4dn.xlarge"
)
After the endpoint had been deployed, I was able to fetch the instance IDs:
ssh_wrapper.get_instance_ids()
INFO:sagemaker-ssh-helper:Querying SSM instance IDs for endpoint huggingface-pytorch-inference-2023-04-24-17-00-23-155
INFO:sagemaker-ssh-helper:Got preliminary SSM instance IDs: ['mi-01234567890abcd00']
INFO:sagemaker-ssh-helper:Got final SSM instance IDs: ['mi-01234567890abcd00']
['mi-01234567890abcd00']
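With an instance ID like this, you can then open a session on the endpoint's instance over SSM (region is a placeholder):
aws ssm start-session --target mi-01234567890abcd00 --region <your-region>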
Oh, to my confusion, I managed to get the mi-xxxx instance ID in one of my experiments yesterday, but I didn't modify any code...
Thanks for sharing.
You're welcome! Let me know if you managed to make your code work, so we can close this issue.
model_data = 's3://kraft-source-bucket/huggingface_model/model.tar.gz'
from sagemaker.huggingface import HuggingFaceModel
from sagemaker_ssh_helper.wrapper import SSHModelWrapper
import sagemaker

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    dependencies=[SSHModelWrapper.dependency_dir()],
    model_data=model_data,
    role=role
)
ssh_wrapper = SSHModelWrapper.create(huggingface_model, connection_wait_time_seconds=0)
huggingface_model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge", wait=False)
model.tar.gz
cat inference.py

import argparse
import io
import json
import logging
import os
import sys
import subprocess

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
import torch.utils.data.distributed
from PIL import Image
from torchvision import datasets, transforms
from torchvision.transforms import ToTensor

from model import Net

logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.info(os.system("nvidia-smi"))

sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))

import sagemaker_ssh_helper
sagemaker_ssh_helper.setup_and_start_ssh()


def model_fn(model_dir):
    print(model_dir)
    logger.info(model_dir)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.DataParallel(Net())
    with open(os.path.join(model_dir, "model.pth"), "rb") as f:
        model.load_state_dict(torch.load(f))
    return model.to(device)


def load_from_bytearray(request_body):
    image_as_bytes = io.BytesIO(request_body)
    image = Image.open(image_as_bytes)
    image_tensor = ToTensor()(image).unsqueeze(0)
    return image_tensor


def input_fn(request_body, request_content_type):
    # if content_type is 'image/jpg' or 'application/x-npy',
    # deserialize the request body into an image tensor
    image_tensor = load_from_bytearray(request_body)
    return image_tensor


# Perform prediction on the deserialized object, with the loaded model
def predict_fn(input_object, model):
    output = model.forward(input_object)
    pred = output.max(1, keepdim=True)[1]
    return pred.detach().cpu().numpy().tolist()  # return a JSON-serializable result


# Serialize the prediction result into the desired response content type
def output_fn(predictions, response_content_type):
    return json.dumps(predictions)
I run this code:
instance_ids = ssh_wrapper.get_instance_ids()  # <--NEW--
print(f'To connect over SSM run: aws ssm start-session --target {instance_ids[0]}')
There is no output, and the CloudWatch log has no related info about sagemaker-ssh-helper.