Open saeid93 opened 1 year ago
Is there any update on this? This seems quite relevant when loading LLMs, since the start time is really high.
Hey @jondeandres,
The above only affects the start time of the sidecar executor container, so it should be a "constant" delay (and shouldn't scale with the size of the model).
Describe the bug
As per the community Slack discussion with @edshee, a Seldon deployment with lightweight containers takes up to 25s to be ready for serving. Based on our investigation in the same Slack thread and @adriangonz's suggestion, we found that the problem is an overly long delay before the readiness/liveness probes start, which is meant to give both the model containers and the Seldon containers time to start. For model containers, this is easy to solve by modifying the probe values on the Seldon model containers. However, the same values are not exposed for the seldon-container-engine, so any new deployment of, or update to, the SeldonDeployment resource takes 20+ seconds even for lightweight model containers. Because the readiness/liveness probe values of seldon-container-engine are hardcoded, no matter how fast the model containers start, they will always be bottlenecked by the seldon-container-engine. The bottleneck is the InitialDelaySeconds setting on the seldon-container-engine container, which is set to 20 seconds by default. Reducing it would let seldon-container-engine start up in a shorter time, just like the model containers.
To reproduce
I have provided a gist covering all scenarios. The Docker images are on public registries, so the YAML files can be deployed anywhere. The models and build scripts are also included.
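Alongside the gist, the effect can be reasoned about offline with a simplified model of Kubernetes readiness-probe timing: probes start initialDelaySeconds after the container starts and repeat every periodSeconds, and a pod is Ready only when every container in it is Ready. This is an idealized sketch (it ignores kubelet jitter and probe latency). Only the 20s initial delay comes from this issue; the 1s/2s startup times, the 5s period and the "tuned" values are made-up numbers for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Probe:
    """Subset of Kubernetes readiness-probe timing fields, in seconds."""
    initial_delay_seconds: int
    period_seconds: int
    success_threshold: int = 1


def first_ready_time(app_ready_at: float, probe: Probe) -> float:
    """Earliest time after container start at which the probe reports Ready,
    assuming the endpoint succeeds from `app_ready_at` onwards and every
    probe completes instantly."""
    # Probes fire at initial_delay, initial_delay + period, ...;
    # find the first one at or after the moment the app is ready.
    if app_ready_at <= probe.initial_delay_seconds:
        k = 0
    else:
        k = math.ceil((app_ready_at - probe.initial_delay_seconds) / probe.period_seconds)
    first_success = probe.initial_delay_seconds + k * probe.period_seconds
    # success_threshold consecutive successes are needed before Ready.
    return first_success + (probe.success_threshold - 1) * probe.period_seconds


def pod_ready_time(containers: dict) -> float:
    """A pod is Ready only once its slowest container is Ready."""
    return max(first_ready_time(t, p) for t, p in containers.values())


# Hypothetical numbers: the model server answers after ~1s, the engine after ~2s.
model = (1.0, Probe(initial_delay_seconds=0, period_seconds=1))
engine_default = (2.0, Probe(initial_delay_seconds=20, period_seconds=5))
engine_tuned = (2.0, Probe(initial_delay_seconds=0, period_seconds=1))

print(pod_ready_time({"model": model, "engine": engine_default}))  # 20
print(pod_ready_time({"model": model, "engine": engine_tuned}))    # 2
```

In this model the 20s initial delay dominates whenever every container is up well before it, which matches the "constant delay" described in the thread; for a model that takes minutes to load, its own startup time dominates instead.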
Expected behaviour
As @cliveseldon suggested, the liveness and readiness values should either be made configurable, or we should investigate why they can't be made shorter or zero. There may be a logic bug here: the engine should only become ready once the graph is ready, so it is unclear why the initial delay is so high when readiness should react to the graph components coming up. The engine's liveness should also succeed immediately, before it becomes ready. We just need to make sure that models which take minutes to load are still handled correctly.
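The expected behaviour above can be sketched as two separate checks: liveness reports only that the engine process is up, while readiness is derived from the graph and passes as soon as (and only when) every component is ready, with no fixed initial delay. This is a minimal sketch, not Seldon's implementation: the function names, the component names and the ready times are hypothetical, and a fake clock stands in for the HTTP calls a real engine would make to each component's readiness endpoint.

```python
from typing import Callable, Iterable


def engine_live() -> bool:
    # Liveness only says the engine process itself is running; it does not
    # wait for the graph, so a slow-loading model never gets the engine restarted.
    return True


def engine_ready(components: Iterable[str], is_ready: Callable[[str], bool]) -> bool:
    # Readiness is reactive: it passes exactly when every graph component passes.
    return all(is_ready(name) for name in components)


# Hypothetical graph: seconds after pod start at which each component's own
# readiness check would start succeeding.
READY_AT = {"small-model": 1.5, "large-llm": 180.0}


def component_ready_at(t: float) -> Callable[[str], bool]:
    """Fake readiness check evaluated at time t (stands in for HTTP calls)."""
    return lambda name: t >= READY_AT[name]


for t in (1.0, 2.0, 200.0):
    print(t, engine_live(), engine_ready(READY_AT, component_ready_at(t)))
# 1.0 True False
# 2.0 True False
# 200.0 True True
```

Decoupling the two checks is what lets readiness be immediate for lightweight graphs while still tolerating models that take minutes: the slow component only delays readiness, it never fails liveness.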
Environment