Describe the feature
As described in the SageMaker Endpoint L2 construct RFC:
Serverless Inference: By default, upon endpoint deployment, SageMaker will provision EC2 instances (managed by SageMaker) for hosting purposes. To shield customers from the complexity of forecasting fleet sizes, the ServerlessConfig attribute was added to the ProductionVariant CloudFormation structure of an endpoint config resource. This configuration removes the need for customers to specify instance-specific settings (e.g., instance count, instance type), abstracting the runtime compute from customers, much in the same way Lambda does for its customers.
Please 👍 this issue to help with the prioritization of this feature.
Use Case
"Amazon SageMaker Serverless Inference is ideal for applications with intermittent or unpredictable traffic." (link)
Proposed Solution
As described in the SageMaker Endpoint L2 construct RFC:
In preparation for adding this feature to the CDK, all concrete production-variant classes and attributes have been prefixed with the string [Ii]nstance to indicate that they apply only to instance-based hosting. When serverless support is later added to the SageMaker module, [Ss]erverless-prefixed analogs can be created with attributes appropriate to the use case and plumbed through to the L1 constructs. Note that a number of features do not yet work with serverless variants, so new synthesis-time checks or compile-time contracts may be needed to guard against mixing incompatible features. For example, as discussed with the bar raiser, alongside the proposed EndpointConfigProps attribute instanceProductionVariants?: InstanceProductionVariantProps[], a new mutually exclusive attribute serverlessProductionVariant?: ServerlessProductionVariantProps could be added (singular, as serverless inference supports only one variant), together with a synthesis-time check confirming that the customer hasn't configured both instance-based and serverless production variants.
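For illustration, the mutual-exclusivity check described above might look roughly like the following TypeScript sketch. Only the attribute names instanceProductionVariants and serverlessProductionVariant come from the RFC text. The fields inside each props interface and the validateProductionVariants helper are hypothetical stand-ins, and nothing here is the actual CDK implementation.

```typescript
// Hypothetical, simplified shapes. The real props would carry more attributes.
interface InstanceProductionVariantProps {
  readonly variantName: string;
  readonly initialInstanceCount?: number;
}

// Field names are assumptions loosely mirroring the CloudFormation
// ServerlessConfig structure (MaxConcurrency, MemorySizeInMB).
interface ServerlessProductionVariantProps {
  readonly variantName: string;
  readonly maxConcurrency: number;
  readonly memorySizeInMB: number;
}

interface EndpointConfigProps {
  readonly instanceProductionVariants?: InstanceProductionVariantProps[];
  readonly serverlessProductionVariant?: ServerlessProductionVariantProps;
}

// Hypothetical helper: returns a list of error messages, empty when the
// configuration does not mix instance-based and serverless variants.
function validateProductionVariants(props: EndpointConfigProps): string[] {
  const errors: string[] = [];
  const instanceVariants = props.instanceProductionVariants ?? [];
  const hasInstanceVariants = instanceVariants.length > 0;
  const hasServerlessVariant = props.serverlessProductionVariant !== undefined;

  if (hasInstanceVariants && hasServerlessVariant) {
    errors.push(
      'Specify either instanceProductionVariants or serverlessProductionVariant, not both.',
    );
  }
  return errors;
}
```

Returning a list of messages rather than throwing is one way to make this usable as a synthesis-time check, for example by calling it from a validation registered with the construct's node.addValidation, assuming the L2 ends up following that pattern.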
Other Information
No response
Acknowledgements
[X] I may be able to implement this feature request
CDK version used
2.54.0-alpha.0
Environment details (OS name and version, etc.)
macOS Ventura
This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.