aws / sagemaker-pytorch-inference-toolkit

Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

Serving a model using a custom container, instance runs out of disk #112

Open HamidShojanazeri opened 2 years ago

HamidShojanazeri commented 2 years ago

Describe the bug
Using a custom container to serve a PyTorch model, configured as below, the endpoint throws "No space left on device".

import boto3

sm = boto3.client("sagemaker")  # SageMaker API client

# image, model_artifact, model_name, role, and endpoint_config_name
# are assumed to be defined earlier
container = {"Image": image, "ModelDataUrl": model_artifact}

create_model_response = sm.create_model(
    ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=container
)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": "ml.g4dn.8xlarge",
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)
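For completeness, a minimal sketch of the create_endpoint call that would typically follow this configuration; endpoint_name is a placeholder, not from the original report:

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,  # placeholder endpoint name
    EndpointConfigName=endpoint_config_name,
)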

The Docker image is 17 GB and the TorchServe .mar file is 8 GB. Is there any way to increase the storage for the instances serving the model? Going through the docs for endpoint configuration, there seems to be no setting for instance-level specifics such as storage.

-- CloudWatch log

(screenshot of the CloudWatch log attached to the original issue)

Expected behavior

Knobs to set the storage size for the serving instances.
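As an illustration of the kind of knob being requested: current versions of the CreateEndpointConfig API accept a VolumeSizeInGB field on each production variant, though it is only honored for instance types that support attaching an EBS volume (NVMe-backed types such as ml.g4dn come with fixed local storage). A minimal sketch, with ml.p3.2xlarge as a hypothetical EBS-backed GPU instance type and 100 GB as an arbitrary size:

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": "ml.p3.2xlarge",  # hypothetical EBS-backed GPU type
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
            # size of the EBS volume attached to each instance; ignored
            # for instance types with fixed local NVMe storage
            "VolumeSizeInGB": 100,
        }
    ],
)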

HamidShojanazeri commented 2 years ago

cc @nskool

HamidShojanazeri commented 2 years ago

I believe exposing a few knobs for some of these settings, including storage for the host instances, would be helpful. Thanks @lxning for the offline discussion; it would be great if this could be added as a feature to the SageMaker SDK.

lxning commented 2 years ago

According to the SM hosting team, the SM SDK currently does not support storage size configuration. The only available solution is to change the instance type. Please refer to host-instance-storage-volumes-table
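A minimal sketch of that workaround, creating a new endpoint config on a different instance type and rolling an existing endpoint onto it; the instance type choice and the endpoint_name/new_config_name variables are placeholders:

new_config_name = endpoint_config_name + "-v2"

sm.create_endpoint_config(
    EndpointConfigName=new_config_name,
    ProductionVariants=[
        {
            # placeholder: pick a type with a larger storage volume
            # per the host-instance-storage-volumes table
            "InstanceType": "ml.g4dn.12xlarge",
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

# update_endpoint re-deploys the endpoint in place onto the new config
sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_config_name)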

HamidShojanazeri commented 2 years ago

@lxning this is a limiting factor, as it is easy to hit the 30 GB limit, mostly on GPU instances. Some NVIDIA Docker images, as in this case, can reach 21 GB, and heavier workloads that chain multiple models can end up with a model_artifact large enough to go beyond the limit.