aws / sagemaker-huggingface-inference-toolkit


Serverless inference using the Sagemaker toolkit #63

Closed arnaudstiegler closed 2 years ago

arnaudstiegler commented 2 years ago

Hey! I was looking at how you've built this inference toolkit to figure out how to couple the Multi-Model-Server package with serverless inference. I've seen that you wrote your own start_model_server and your own service handler, and I'd be super interested to hear whether any of those changes are related to using serverless inference endpoints. Thank you!

philschmid commented 2 years ago

SageMaker platform features like Serverless Inference are not directly built into the toolkit. For multi-model it should be the same, except that we make sure that if an inference.py is provided, it is used.
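
For reference, a minimal sketch of such a user-provided inference.py, assuming the model_fn/transform_fn handler signatures documented for this toolkit; the pipeline task and payload shape are just examples:

```python
import json

from transformers import pipeline


def model_fn(model_dir):
    # Load whatever was packaged into the model archive; the task is illustrative.
    return pipeline("text-classification", model=model_dir)


def transform_fn(model, input_data, content_type, accept):
    # input_data arrives as raw bytes; a JSON payload like {"inputs": "..."} is assumed here.
    data = json.loads(input_data)
    prediction = model(data["inputs"])
    return json.dumps(prediction)
```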

arnaudstiegler commented 2 years ago

Thanks for the answer; let me clarify my question: I have a custom container that uses the SageMaker inference toolkit, and it works well for provisioned deployment. But it fails when I try to deploy it for serverless inference because of some errors in the default SageMaker MMS web server (this, for instance). So I was curious whether you made any specific changes to the HuggingFace toolkit to have it work out of the box with serverless.
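
For context, this is roughly the deployment I'm attempting with the sagemaker Python SDK; the image URI, role, and S3 path below are placeholders, not values from this thread:

```python
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

# Custom container built on top of the SageMaker inference toolkit.
model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/my-custom-image:latest",
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::<account>:role/MySageMakerRole",
)

# Request a serverless endpoint instead of an instance-based one.
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=5,
    ),
)
```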

philschmid commented 2 years ago

According to the documentation, it is currently not possible to use custom registries/containers for Serverless Inference: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html

arnaudstiegler commented 2 years ago

They do mention that private registries are not supported, but they also specifically say that custom containers are supported (in the Container Support section). And the serverless endpoint starts, so I think it's really just a code issue with the sagemaker-inference default web server.

arnaudstiegler commented 2 years ago

Oh, I actually found an answer there. Thanks for your answers!