Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker Pytorch Containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0
MMS (MultiModel) mode is not supported on GPU instances at inference #129
I created an image based on 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker, but I cannot deploy it in MultiModel (MMS) mode on a GPU instance:

ClientError: An error occurred (ValidationException) when calling the CreateEndpointConfig operation: MultiModel mode is not supported for instance type ml.g4dn.xlarge.

According to https://github.com/aws/sagemaker-python-sdk/issues/1323, GPU instances are not supported for multi-model endpoints.

So why does the prebuilt PyTorch GPU inference image use MMS as its model server when SageMaker inference endpoints do not support MultiModel mode on GPU instances?
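For reference, a minimal sketch of the request parameters that reproduce the error. The model and endpoint-config names are hypothetical, and the actual boto3 calls are left commented out because they require AWS credentials; the point is that `Mode="MultiModel"` on the container combined with a GPU instance type in the endpoint config is what CreateEndpointConfig rejects:

```python
# Hypothetical names; only the Image URI, Mode, and InstanceType matter here.
create_model_params = {
    "ModelName": "my-multi-model",
    "ExecutionRoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",
    "Containers": [
        {
            "Image": (
                "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
                "pytorch-inference:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker"
            ),
            # MultiModel mode tells SageMaker the container (MMS) can host
            # several models behind one endpoint.
            "Mode": "MultiModel",
            "ModelDataUrl": "s3://my-bucket/models/",
        }
    ],
}

create_endpoint_config_params = {
    "EndpointConfigName": "my-multi-model-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-multi-model",
            # GPU instance type: this is the combination that fails with
            # "MultiModel mode is not supported for instance type ml.g4dn.xlarge".
            "InstanceType": "ml.g4dn.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
}

# import boto3
# sm = boto3.client("sagemaker", region_name="us-west-2")
# sm.create_model(**create_model_params)
# sm.create_endpoint_config(**create_endpoint_config_params)  # raises ClientError
```

Switching `InstanceType` to a CPU type such as ml.m5.xlarge, or dropping `Mode="MultiModel"` for a single-model GPU endpoint, avoids the ValidationException.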