awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference
Apache License 2.0

Preloading models on Sagemaker multi-model endpoint doesn't work #1001

Open sassarini-marco opened 1 year ago

sassarini-marco commented 1 year ago

Hi,

I'm trying to load some models at SageMaker endpoint server startup so that they are already available when prediction requests arrive, skipping the loading phase on the first request.

I've configured MMS with the following parameters, according to the MMS documentation:
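The exact values I used are in the configuration screenshot further down; just to illustrate what I mean, a config.properties that preloads archives at startup looks roughly like this (the path, model selection, and worker count here are placeholders, not my actual values):

```properties
# Illustrative values only -- not the ones from my endpoint.
# model_store points at the directory holding the decompressed model,
# load_models tells MMS which archives to register at startup.
model_store=/opt/ml/model
load_models=ALL
default_workers_per_model=1
```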

The model is a decompressed tar.gz archive generated by the SageMaker training process, plus a MAR-INF directory containing a MANIFEST.json file with the model_name information.
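The manifest is the only thing I add on top of the training output; a minimal sketch of what I mean is below (the handler, runtime, and version fields are illustrative, the modelName is the information I actually set):

```json
{
  "runtime": "python",
  "specificationVersion": "1.0",
  "model": {
    "modelName": "my-model",
    "handler": "model_handler:handle"
  }
}
```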

From the CloudWatch logs I can see that the model is loaded correctly on a worker thread, which immediately stops after a scale-down call.

Below are a few screenshots of the logs. The configuration:

[screenshot: MMS configuration]

The load followed by the scale-down:

[screenshot: model load followed by immediate worker scale-down]
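In case it helps with debugging, this is the kind of check I would run against the MMS management API to see whether the model stays registered and how many workers it keeps. A minimal sketch, assuming the default management port 8081 is reachable inside the container and using a placeholder model name:

```python
# Sketch only: placeholder model name and default management port.
import json
import urllib.request

BASE = "http://localhost:8081"
MODEL = "my-model"  # placeholder for the actual model_name

# List the models MMS currently has registered.
with urllib.request.urlopen(f"{BASE}/models") as resp:
    print(json.dumps(json.load(resp), indent=2))

# Describe the model to see its worker status.
with urllib.request.urlopen(f"{BASE}/models/{MODEL}") as resp:
    print(json.dumps(json.load(resp), indent=2))

# Ask MMS to keep at least one worker alive for this model.
req = urllib.request.Request(
    f"{BASE}/models/{MODEL}?min_worker=1&synchronous=true", method="PUT"
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```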

I don't see any errors in the logs: what's going on? Is this a bug?

Best regards.