Closed: arnaudstiegler closed this issue 2 years ago
SageMaker platform features like Serverless Inference are not directly built into the toolkit. For multi-model it should be the same, except that we make sure that if an inference.py is provided, it is used.
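For anyone landing here, the override mechanism mentioned above looks roughly like the sketch below: a custom inference.py placed in the model archive, using the handler hooks the toolkit checks for (model_fn, input_fn, predict_fn, output_fn). The task and payload format are illustrative assumptions, not something from this thread:

```python
# Hypothetical inference.py shipped under code/ in model.tar.gz.
# The hook names are the ones the toolkit looks for; the pipeline task
# and JSON payload shape are assumptions for illustration only.
import json

from transformers import pipeline


def model_fn(model_dir):
    # Load the model from the unpacked artifact directory.
    return pipeline("text-classification", model=model_dir)


def input_fn(request_body, content_type):
    # Parse the incoming request into model inputs.
    if content_type == "application/json":
        return json.loads(request_body)["inputs"]
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(inputs, model):
    # Run inference with the loaded pipeline.
    return model(inputs)


def output_fn(prediction, accept):
    # Serialize the prediction for the response.
    return json.dumps(prediction)
```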
Thanks for the answer; let me clarify my question: I have a custom container that uses the SageMaker inference toolkit, and it works well for provisioned deployment. But it fails when I try to deploy it for serverless inference because of some errors in the default SageMaker MMS web server (this, for instance). So I was curious to hear whether you made any specific changes to the Hugging Face toolkit to have it work out of the box with serverless.
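For context, the failing case is the serverless variant of a deployment along the lines of the following sketch (SageMaker Python SDK; the image URI, S3 path, and sizing values are placeholders, not my actual setup):

```python
# Minimal sketch of deploying a custom container to a serverless
# endpoint with the SageMaker Python SDK. All identifiers below are
# placeholders.
import sagemaker
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/my-custom-image:latest",
    model_data="s3://my-bucket/model.tar.gz",
    role=sagemaker.get_execution_role(),
)

# Serverless endpoints are configured by memory size and concurrency
# instead of an instance type.
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=5,
    ),
)
```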
According to the documentation, it is currently not possible to use custom registries/containers for Serverless Inference: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html
They do mention that private registries are not supported, but they also specifically say that custom containers are supported (in the Container Support section). And the serverless endpoint starts, so I think it's really just a code issue with the sagemaker-inference default web server.
Oh I actually found an answer there. Thanks for your answers!
Hey! I was looking at how you've built this inference toolkit to try and figure out how to couple the Multi-Model-Server package with serverless inference. I've seen that you wrote your own `start_model_server` and your own service handler, and I'd be super interested to hear whether any of those changes are related to using serverless inference endpoints. Thank you!
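For reference, the container entrypoint in question boils down to something like the sketch below. The module path reflects my reading of the toolkit's layout and should be treated as an assumption rather than the confirmed API:

```python
# Hypothetical serve entrypoint for a custom container built on the
# Hugging Face inference toolkit. start_model_server() boots MMS with
# the toolkit's own handler service, which is where the serverless
# incompatibility discussed above would surface.
from sagemaker_huggingface_inference_toolkit import mms_model_server

if __name__ == "__main__":
    mms_model_server.start_model_server()
```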