model.deploy to allow for auto scale configuration

aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker

https://sagemaker.readthedocs.io/

Apache License 2.0


model.deploy to allow for auto scale configuration #1880

Open allcentury opened 4 years ago

allcentury commented 4 years ago

Describe the feature you'd like

Today we deploy a model like so:

model = SKLearn(
    entry_point=script_path,
    framework_version="0.20.0",
    py_version="py3",
    instance_type="ml.m5.2xlarge",
    role=role,
    sagemaker_session=sagemaker_session,
    dependencies=[...],
)

predictor = model.deploy(
    endpoint_name="some_name", 
    initial_instance_count=1, 
    instance_type="ml.m5.large",
    predictor_cls=SKLearnPredictorJson,
)

How would this feature be used? Please describe.

When calling model.deploy, it would be ideal to have a way to set an autoscaling policy (similar to how we can set initial_instance_count).

Describe alternatives you've considered

I'm still researching whether I can use the SKLearn class while also using boto3 to attach a policy.

ajaykarpur commented 4 years ago

Hi @allcentury, thanks for the recommendation. We'll add this to our backlog. In the meantime, you can use boto3 to attach the scaling policy:

allcentury commented 3 years ago

For those new to this like I was, here is what I had to do:

import boto3

# Application Auto Scaling is a separate service from SageMaker itself
client = boto3.client('application-autoscaling')

client.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId="endpoint/" + endpoint_name + "/variant/AllTraffic",
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=4,
    MaxCapacity=50,
    RoleARN=role,
    SuspendedState={
        'DynamicScalingInSuspended': False,
        'DynamicScalingOutSuspended': False,
        'ScheduledScalingSuspended': False
    }
)

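# check the target is available (filtered to this endpoint's variant;
# MaxResults is documented as 1-50, so it is left at its default here)
client.describe_scalable_targets(
    ServiceNamespace='sagemaker',
    ResourceIds=["endpoint/" + endpoint_name + "/variant/AllTraffic"],
)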

client.put_scaling_policy(
    PolicyName='autoscale-policy',
    ServiceNamespace='sagemaker',
    ResourceId="endpoint/" + endpoint_name + "/variant/AllTraffic",
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 150.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
        },
        'ScaleOutCooldown': 300,
        'ScaleInCooldown': 300,
    }
)
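
For anyone who wants to reuse this after each deploy, here is a minimal sketch that wraps the two calls above in one function. The names `resource_id_for` and `attach_autoscaling` are hypothetical helpers, not part of the SageMaker SDK or boto3. The sketch assumes the endpoint has a single variant named `AllTraffic` (as in the snippet above), and it leaves out `RoleARN` and `SuspendedState` on the assumption that the service-linked role and the default (nothing suspended) are what you want. The client is passed in so the function can be exercised with a stub.

```python
def resource_id_for(endpoint_name, variant_name="AllTraffic"):
    """Build the Application Auto Scaling resource ID for an endpoint variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def attach_autoscaling(client, endpoint_name, min_capacity, max_capacity,
                       target_invocations, variant_name="AllTraffic",
                       policy_name="autoscale-policy", cooldown_seconds=300):
    """Register the variant as a scalable target, then attach a
    target-tracking policy on invocations per instance.

    `client` is expected to be boto3.client("application-autoscaling").
    Returns the response of put_scaling_policy.
    """
    # Fields shared by both API calls
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id_for(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Step 1: the variant must be registered before a policy can be attached
    client.register_scalable_target(
        MinCapacity=min_capacity, MaxCapacity=max_capacity, **common
    )
    # Step 2: attach the target-tracking policy
    return client.put_scaling_policy(
        PolicyName=policy_name,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleOutCooldown": cooldown_seconds,
            "ScaleInCooldown": cooldown_seconds,
        },
        **common,
    )
```

With the values from the snippet above, the call right after model.deploy would look something like `attach_autoscaling(boto3.client("application-autoscaling"), "some_name", 4, 50, 150)`.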