aws / sagemaker-pytorch-inference-toolkit

Toolkit for inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker PyTorch Containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

MMS multi-model mode in inference is not supported on GPU instances #129

Closed holopekochan closed 1 year ago

holopekochan commented 1 year ago

I built an image from 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker, but I cannot deploy it in multi-model (MMS) mode on a GPU instance. The call fails with: ClientError: An error occurred (ValidationException) when calling the CreateEndpointConfig operation: MultiModel mode is not supported for instance type ml.g4dn.xlarge. According to this issue, GPU instances are not supported: https://github.com/aws/sagemaker-python-sdk/issues/1323

So why does the PyTorch GPU prebuilt image use MMS as the model server, when the inference endpoint does not support multi-model mode on GPU instances?

import boto3

sagemaker_client = boto3.client('sagemaker')

# 'MultiModel' is a model created earlier with Mode='MultiModel'
response = sagemaker_client.create_endpoint_config(
    EndpointConfigName='MultiModelConfig',
    ProductionVariants=[
        {
            'InstanceType': 'ml.g4dn.xlarge',
            'InitialInstanceCount': 1,
            'InitialVariantWeight': 1,
            'ModelName': 'MultiModel',
            'VariantName': 'AllTraffic'
        }
    ]
)
print(response)
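Since the ValidationException is raised only when the endpoint config pairs a multi-model-mode model with a GPU instance type, one way to fail fast is to check the instance type locally before calling the API. The helper below is purely illustrative (not part of the SageMaker API), and the prefix list is an assumption about which instance families are GPU-backed:

```python
# Hypothetical pre-flight check: MMS-backed MultiModel mode is rejected at
# CreateEndpointConfig time for GPU instance types, so validate first.

# Assumption: GPU-backed SageMaker instance families start with these prefixes.
GPU_INSTANCE_PREFIXES = ("ml.p", "ml.g")

def supports_mms_multi_model(instance_type: str) -> bool:
    """Return False for GPU instance types, which the MMS-based
    MultiModel mode does not accept (per the ValidationException above)."""
    return not instance_type.startswith(GPU_INSTANCE_PREFIXES)

# A CPU instance type passes; the GPU type from this issue does not.
print(supports_mms_multi_model("ml.m5.xlarge"))    # True
print(supports_mms_multi_model("ml.g4dn.xlarge"))  # False
```

With a check like this you can either fall back to a CPU instance type for the multi-model endpoint or deploy the GPU model as a single-model endpoint instead.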