SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

seldon core v2: improved autoscaling #5096

Open kevinnowland opened 1 year ago

kevinnowland commented 1 year ago

slack conversation

What is the behavior of Seldon Core v2 in the following scenario?

Related feature requests:

  1. Have models (not servers) scale up and down based on a single metric. Using inactivity as the scale-down trigger in particular seems inappropriate when a model receives a constant baseline number of requests per second. I might be misunderstanding how the scaling works, and this may already be possible. I am also assuming requests are distributed evenly across model replicas; perhaps the strategy is instead to saturate one replica before routing to a second, in which case even in my scenario one replica would go inactive.
  2. The ability to scale servers up and down based on the number of models they need to host. This may be impossible as long as the mapping between models and servers is derived by matching server capabilities to model requirements.
  3. The ability to deploy models to specific servers, i.e., to reference the named Kubernetes object. Today we approximate this by giving servers very specific names that we mirror in their capabilities (see the sketch after this list). This is perhaps not in line with the Seldon Core v2 philosophy, or it may already be possible and I'm simply unaware of how to do it.

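For reference, the capability-matching workaround from item 3 looks roughly like this. This is a hedged sketch, assuming `extraCapabilities` on the Server and `requirements` on the Model are the matching mechanism; the resource names, storage URI, and the `pin-model-x` capability string are made up for illustration:

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-model-x      # hypothetical: server name mirrors the model it should host
spec:
  serverConfig: mlserver
  extraCapabilities:
  - pin-model-x               # only models requiring this capability are scheduled here
  replicas: 2
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: model-x
spec:
  storageUri: "gs://my-bucket/models/model-x"   # illustrative
  requirements:
  - pin-model-x               # must match a capability advertised by a server
  replicas: 2
```
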
Thanks for your help!

ukclivecox commented 1 year ago

Maybe one solution is to allow model autoscaling to be switched off, plus the ability to lock a model to all server replicas, so that if the server autoscales, the model is added to every new replica. Scale-down scenarios would be handled the same way. Essentially this delegates autoscaling to the server's HPA/KEDA and is more akin to Seldon Core v1, except that multi-model servers can also be scaled this way. Newly joining models would need to be added to all replicas. @sakoush
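
A minimal sketch of what that delegation might look like, assuming the Server resource exposes a scale subresource that a standard HPA can target; the names and thresholds here are illustrative, not from the thread:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mlserver-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: mlops.seldon.io/v1alpha1
    kind: Server              # scale the server; models locked to it would follow its replica count
    name: mlserver
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```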

Rajakavitha1 commented 2 months ago

This is now addressed in https://github.com/SeldonIO/seldon-core/pull/5935.