aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.64k stars 3.91k forks source link

sagemaker: Support direct invocation of multi-container endpoints #23155

Open petermeansrock opened 1 year ago

petermeansrock commented 1 year ago

Describe the feature

As described in the SageMaker Endpoint L2 construct RFC:

Direct Invocation of Multi-Container Endpoints: By default (and as described in the proposed README), when a customer specifies multiple containers for a model, the containers are treated as an inference pipeline (also referred to as a serial pipeline). This means that the containers are treated as an ordered list, wherein the output of one container at runtime is passed as input to the next. Only the output from the last container is surfaced to the client invoking the model. To support a different invocation paradigm, the InferenceExecutionConfig structure was added to the model CloudFormation resource which allows customers to either explicitly configure Serial invocation mode (the default, as an inference pipeline) or the new Direct invocation mode. When using direct mode, a client invoking an endpoint must specify a container to target with their request; SageMaker then invokes only that single container.

Please 👍 this issue to help with the prioritization of this feature.

Use Case

"This enables you to run up to 15 different ML containers on a single endpoint and invoke them independently, thereby saving up to 90% in costs. These ML containers can be running completely different ML frameworks and algorithms for model serving." (link)

Proposed Solution

As described in the SageMaker Endpoint L2 construct RFC:

As SageMaker exposes a new dimension for CloudWatch metrics specific to each directly-invokable container, other than exposing a new inference execution mode attribute on the Model construct, this feature would likely also warrant the addition of a findContainer(containerHostName: string) method to IEndpointProductionVariant which will return a new interface on which additional metric* APIs are present for generating CloudWatch metrics against the dimension consisting of endpoint, variant, and container combined.

Other Information

No response

Acknowledgements

CDK version used

2.54.0-alpha.0

Environment details (OS name and version, etc.)

macOS Ventura

peterwoodworth commented 1 year ago

Thanks for documenting all the steps to fleshing out this module in individual issues @petermeansrock! We'll need to depend on contributor support to clear all these out, and this will be a big help in gaining that support