Open mwm5945 opened 10 months ago
/cc @surajkota @aws-controllers-k8s/sagemaker-maintainer
Hi @mwm5945, thanks for opening the issue. We are aware that the SageMaker controller does not support this functionality; part of that is by design, given the nature of a K8s controller. The `UpdateEndpointWeightsAndCapacities` API supports updating the weight and instance count (or the concurrency parameters in the case of a serverless endpoint) for an arbitrary variant. Let's take the case of a real-time endpoint.

The instance count of a variant can change either through the `UpdateEndpointWeightsAndCapacities` API or by setting up autoscaling on the variant. A K8s controller cannot differentiate between these two events. The controller will try to adjust the instance count to the value specified in the spec, which can lead to unintended behavior if the variant has autoscaling configured. The same behavior can be achieved by using `minCapacity` of Application Auto Scaling (e.g. an autoscaling spec), which can be used as an alternative and provides more configuration options for production use cases.

`DesiredVariantWeight` is an update-only parameter, and the shapes in the describe and update APIs vary significantly, making it hard to maintain. A better approach is to maintain one endpoint config per endpoint. This has two benefits: 1/ you have a safer way to determine whether an endpoint config can be deleted, which saves you from cases where a different endpoint was using the config and the config gets accidentally deleted; 2/ you can use it to adjust the weights of the variants and maintain that configuration in one place.
To summarize: use autoscaling to adjust the desired instance count and the EndpointConfig to adjust variant weights. Let us know if this works for your use case.
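A minimal sketch of the autoscaling alternative, assuming it is driven from Python with boto3 rather than through a Kubernetes resource. It only builds the arguments for Application Auto Scaling's `register_scalable_target` call; the helper name `scalable_target_args` and the endpoint and variant names are hypothetical, and the actual call is left commented out because it needs AWS credentials.

```python
from typing import Any, Dict


def scalable_target_args(endpoint: str, variant: str,
                         min_capacity: int, max_capacity: int) -> Dict[str, Any]:
    """Build kwargs for Application Auto Scaling's register_scalable_target.

    min_capacity plays the role that the desired instance count would
    otherwise play: the variant is never scaled below it.
    """
    if min_capacity < 0 or max_capacity < min_capacity:
        raise ValueError("need 0 <= min_capacity <= max_capacity")
    return {
        "ServiceNamespace": "sagemaker",
        # Resource ID format for a SageMaker endpoint variant.
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


# Hypothetical names, for illustration only.
args = scalable_target_args("my-endpoint", "variant-a",
                            min_capacity=2, max_capacity=4)

# With credentials configured, the call would look like this:
# import boto3
# boto3.client("application-autoscaling").register_scalable_target(**args)
```

Because the scalable target owns the instance count, nothing else needs to write that value, which avoids the conflict between the controller and autoscaling described above.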
Thanks
i see, thanks @surajkota--though one use case is for something like a scarce instance type (e.g. p4d), where having to create new instances just to update the weights may be difficult/impossible :/
somewhat related, i've created a new issue: https://github.com/aws-controllers-k8s/community/issues/1889
Synced offline to understand the priority. Will keep this issue open in case the workaround for updating desired weights creates operational complexity and there are other users who are impacted by this.
Thanks
Issues go stale after 180d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 60d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale
/remove-lifecycle stale
Is your feature request related to a problem? The SageMaker controller doesn't currently support updating weights and capacities (the `UpdateEndpointWeightsAndCapacities` API). Currently, to update these values, users need to create a new endpoint config and then update the endpoint to use the new config.
Describe the solution you'd like Ideally, updating the Endpoint would take care of this; new fields may be required to allow this API to be used when the Endpoint spec is changed. Updating the EndpointConfig may not be the best option, as multiple Endpoints could be using the same config.
Describe alternatives you've considered The only available option at this time is to create new EndpointConfigs, and update the endpoint to use that config. This leads to extraneous EndpointConfigs hanging around if not cleaned up, as well as a less than ideal user experience when it comes to A/B testing of models.
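For reference, a rough sketch of what the requested API call looks like when made directly with boto3, outside the controller. The helper `desired_weights_and_capacities` and the endpoint and variant names are hypothetical; only the request shape (`EndpointName`, `DesiredWeightsAndCapacities` with `VariantName`, `DesiredWeight`, `DesiredInstanceCount`) comes from the SageMaker API. The call itself is commented out because it needs AWS credentials.

```python
from typing import Any, Dict, List, Optional


def desired_weights_and_capacities(
    weights: Dict[str, float],
    instance_counts: Optional[Dict[str, int]] = None,
) -> List[Dict[str, Any]]:
    """Build the DesiredWeightsAndCapacities list for the update request."""
    instance_counts = instance_counts or {}
    entries: List[Dict[str, Any]] = []
    for variant, weight in weights.items():
        if weight < 0:
            raise ValueError(f"weight for {variant} must be non-negative")
        entry: Dict[str, Any] = {"VariantName": variant, "DesiredWeight": weight}
        # Only send an instance count when one was explicitly requested,
        # so a weight-only change does not touch capacity.
        if variant in instance_counts:
            entry["DesiredInstanceCount"] = instance_counts[variant]
        entries.append(entry)
    return entries


# Hypothetical A/B split: shift 10% of traffic to the second variant.
payload = desired_weights_and_capacities({"model-a": 0.9, "model-b": 0.1})

# With credentials configured, the call would look like this:
# import boto3
# boto3.client("sagemaker").update_endpoint_weights_and_capacities(
#     EndpointName="my-endpoint",
#     DesiredWeightsAndCapacities=payload,
# )
```

A weight-only update like this needs no new EndpointConfig, which is the user experience this request is asking the controller to expose.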