SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

Long start time of Seldon container engine #4740

Open saeid93 opened 1 year ago

saeid93 commented 1 year ago

Describe the bug

As per the community Slack discussion with @edshee, a Seldon deployment with lightweight containers takes up to 25s to become ready for serving. Based on our investigation in the same Slack thread and @adriangonz's suggestion, we found that the problem is a (too) long initial delay before the readiness/liveness probes start, intended to give both the model containers and the Seldon engine container time to start. For model containers this is easily solved by overriding the probe values on the containers in the SeldonDeployment. However, the same values are not exposed for seldon-container-engine, so any new deployment or update to a SeldonDeployment resource takes 20+ seconds even with lightweight model containers: the readiness/liveness probe values of seldon-container-engine are hardcoded, and no matter how fast the model containers start, they are always bottlenecked by the engine. Specifically, the bottleneck is the InitialDelaySeconds setting on the seldon-container-engine container, which defaults to 20 seconds. Reducing that would let seldon-container-engine become ready in a shorter time, just like the model containers.
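For reference, here is a minimal sketch (the image name and ports are assumptions, not taken from the gist) of how probe values can be overridden for the model container via componentSpecs in a SeldonDeployment. No equivalent override exists for the injected seldon-container-engine container:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: lightweight-model
spec:
  predictors:
  - name: default
    replicas: 1
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: example-registry/lightweight-model:0.1  # hypothetical image
          readinessProbe:
            tcpSocket:
              port: 9000            # assumed REST port of the model server
            initialDelaySeconds: 1  # model container reports ready quickly
            periodSeconds: 2
          livenessProbe:
            tcpSocket:
              port: 9000
            initialDelaySeconds: 1
            periodSeconds: 2
    graph:
      name: classifier
      type: MODEL
```

With overrides like this the model container reports ready almost immediately, but the pod as a whole still waits on the engine's hardcoded 20s initial delay.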

To reproduce

I have provided a gist with all scenarios. The Docker images are on public registries, so the YAML files can be deployed anywhere. The models and build scripts are also included.

Expected behaviour

As @cliveseldon suggested, the liveness and readiness values should be more controllable, or we should investigate why they can't be made shorter or zero. There may be a logic bug here: the engine should only become ready when the graph is ready, so it's unclear why the initial delay is so high when readiness should be reactive to the graph components coming up. Liveness of the engine should also be reported immediately, before it becomes ready. We just need to ensure that models which take minutes to start are also handled correctly.
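For illustration, the probes the operator injects on seldon-container-engine have roughly the following shape (the endpoint paths and port are assumptions; only the 20s initial delay comes from this issue). Making readiness reactive to the graph would mean dropping, or at least exposing, that initial delay:

```yaml
# Illustrative sketch only: paths/ports are assumptions; the 20s initial
# delay is the hardcoded default described in this issue.
readinessProbe:
  httpGet:
    path: /ready              # assumed executor readiness endpoint
    port: 8000                # assumed executor HTTP port
  initialDelaySeconds: 20     # hardcoded today; the bottleneck described above
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /live               # assumed executor liveness endpoint
    port: 8000
  initialDelaySeconds: 20
  periodSeconds: 5
```

If readiness were driven by the graph components coming up, initialDelaySeconds could be close to zero, with a configurable value left for users whose models take minutes to start.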

Environment

jondeandres commented 1 year ago

Is there any update on this? This seems quite relevant when loading LLMs, since the start time is really high.

adriangonz commented 1 year ago

Hey @jondeandres,

The above only affects the start time of the sidecar executor container, so it should be a "constant" delay (and shouldn't scale with the size of the model).