aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

sagemaker: Support serverless variants for endpoints #23148

Open petermeansrock opened 1 year ago

petermeansrock commented 1 year ago

Describe the feature

As described in the SageMaker Endpoint L2 construct RFC:

Serverless Inference: By default, upon endpoint deployment, SageMaker will provision EC2 instances (managed by SageMaker) for hosting purposes. To shield customers from the complexity of forecasting fleet sizes, the ServerlessConfig attribute was added to the ProductionVariant CloudFormation structure of an endpoint config resource. This configuration removes the need for customers to specify instance-specific settings (e.g., instance count, instance type), abstracting the runtime compute from customers, much in the same way Lambda does for its customers.
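To make the difference concrete, here is a rough sketch of the two variant shapes as plain TypeScript objects. The property names mirror the CloudFormation `ProductionVariant` structure as I understand it (`ServerlessConfig` with `MaxConcurrency` and `MemorySizeInMB`). The interfaces are a simplified, hypothetical mirror, not the generated L1 types, and the model name, variant names, and values are made up.

```typescript
// Simplified, hypothetical mirror of the CloudFormation ProductionVariant shape.
// These are NOT the generated CDK L1 types.
interface ServerlessConfig {
  MaxConcurrency: number; // maximum concurrent invocations for the variant
  MemorySizeInMB: number; // memory allocated to the serverless variant
}

interface ProductionVariant {
  ModelName: string;
  VariantName: string;
  // Instance-based hosting settings
  InstanceType?: string;
  InitialInstanceCount?: number;
  InitialVariantWeight?: number;
  // Serverless hosting settings
  ServerlessConfig?: ServerlessConfig;
}

// Instance-based variant: the customer picks the instance type and count.
const instanceVariant: ProductionVariant = {
  ModelName: 'my-model', // made-up name
  VariantName: 'instance-variant',
  InstanceType: 'ml.m5.large',
  InitialInstanceCount: 1,
  InitialVariantWeight: 1,
};

// Serverless variant: no instance-specific settings at all.
const serverlessVariant: ProductionVariant = {
  ModelName: 'my-model', // made-up name
  VariantName: 'serverless-variant',
  ServerlessConfig: { MaxConcurrency: 5, MemorySizeInMB: 2048 },
};

// Helper used only for illustration.
function isServerless(variant: ProductionVariant): boolean {
  return variant.ServerlessConfig !== undefined;
}
```

The serverless variant carries only concurrency and memory settings, which is what an L2 `ServerlessProductionVariantProps` would presumably need to expose.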

Please 👍 this issue to help with the prioritization of this feature.

Use Case

"Amazon SageMaker Serverless Inference is ideal for applications with intermittent or unpredictable traffic."

Proposed Solution

As described in the SageMaker Endpoint L2 construct RFC:

In preparation for the addition of this feature into the CDK, all concrete production variant related classes and attributes have been prefixed with the string [Ii]nstance to designate that they are only associated with instance-based hosting. When later adding serverless support to the SageMaker module, [Ss]erverless-prefixed analogs can be created with attributes appropriate for the use-case with appropriate plumbing to the L1 constructs. Note, there are a number of features which do not yet work with serverless variants, so it may be necessary to incorporate a number of new synthesis-time checks or compile-time contracts to guard against mixing incompatible features. For example, as discussed with the bar raiser, alongside the proposed EndpointConfigProps attribute instanceProductionVariants?: InstanceProductionVariantProps[], a new mutually exclusive attribute serverlessProductionVariant?: ServerlessProductionVariantProps (as only a single variant is supported with serverless inference) could be added with a synthesis-time check confirming that the customer hasn't configured both instance-based and serverless production variants.
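A minimal sketch of what that mutual-exclusion check could look like, written as standalone TypeScript so it runs without `aws-cdk-lib`. Only the attribute names `instanceProductionVariants` and `serverlessProductionVariant` come from the RFC text above. The fields inside each props interface, the function name `validateProductionVariants`, and the error message are assumptions for illustration.

```typescript
// Hypothetical props shapes; the fields are illustrative, not the RFC's final design.
interface InstanceProductionVariantProps {
  variantName: string;
  instanceType: string;
  initialInstanceCount?: number;
}

interface ServerlessProductionVariantProps {
  variantName: string;
  maxConcurrency: number;
  memorySizeInMB: number;
}

interface EndpointConfigProps {
  // Instance-based variants (several may be configured).
  instanceProductionVariants?: InstanceProductionVariantProps[];
  // A single serverless variant, mutually exclusive with the attribute above.
  serverlessProductionVariant?: ServerlessProductionVariantProps;
}

// Returns a list of error messages; an empty list means the props are acceptable.
// Hypothetical helper, not an existing CDK function.
function validateProductionVariants(props: EndpointConfigProps): string[] {
  const errors: string[] = [];
  const hasInstance = (props.instanceProductionVariants ?? []).length > 0;
  const hasServerless = props.serverlessProductionVariant !== undefined;
  if (hasInstance && hasServerless) {
    errors.push(
      'Configure either instanceProductionVariants or serverlessProductionVariant, not both.',
    );
  }
  return errors;
}

// Example: mixing both kinds of variant is reported as an error.
const mixedErrors = validateProductionVariants({
  instanceProductionVariants: [
    { variantName: 'a', instanceType: 'ml.m5.large' },
  ],
  serverlessProductionVariant: {
    variantName: 'b',
    maxConcurrency: 5,
    memorySizeInMB: 2048,
  },
});
```

Returning a `string[]` rather than throwing is a deliberate choice here: it matches the shape that a construct validation callback registered through `node.addValidation` is expected to return, so a helper like this could plausibly be wired into synthesis-time validation.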

Other Information

No response

CDK version used

2.54.0-alpha.0

Environment details (OS name and version, etc.)

macOS Ventura

github-actions[bot] commented 7 months ago

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.