We need to let HPA scale jobs.
Can we convert trainers to Deployments or ReplicaSets instead?
We can scale jobs (jobs in the general sense, not the Kubernetes Job) by:
- using HPA, or
- using a custom server/controller that calls the scaling API.
Both approaches require scaling metrics.
HPA supports per-Pod CPU and memory metrics out of the box. Custom metrics can be collected with some effort.
CPU and memory usage are probably not good indicators for scaling trainers: each trainer's CPU and memory usage is nearly constant, because training is a periodic, iterative computation.
We need to scale inference jobs based on the load on each Pod that runs the inference server; Queries Per Second (QPS) is a good indicator.
Auto-scaling for training jobs should be based on overall cluster resource usage (e.g., the number of GPUs that can be elastically scaled and the resource pressure from other job types such as inference jobs).
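As a rough illustration (the function names, target values, and the one-GPU-per-trainer assumption are placeholders), the desired replica counts for the two cases could be computed like this; the inference rule is the same ratio formula HPA applies to CPU, just driven by QPS:

```go
// Sketch of the two scaling rules discussed above; not PaddleCloud code.
package scaling

import "math"

// DesiredInferenceReplicas follows the usual HPA-style ratio rule:
// desired = ceil(currentReplicas * averageQPSPerPod / targetQPSPerPod).
func DesiredInferenceReplicas(currentReplicas int, averageQPSPerPod, targetQPSPerPod float64) int {
	if currentReplicas == 0 || targetQPSPerPod <= 0 {
		return currentReplicas
	}
	return int(math.Ceil(float64(currentReplicas) * averageQPSPerPod / targetQPSPerPod))
}

// DesiredTrainers grows a training job while idle GPUs remain, staying inside
// the configured [minTrainers, maxTrainers] bounds. It assumes one GPU per
// trainer instance.
func DesiredTrainers(current, minTrainers, maxTrainers, freeGPUs int) int {
	desired := current + freeGPUs
	if desired > maxTrainers {
		desired = maxTrainers
	}
	if desired < minTrainers {
		desired = minTrainers
	}
	return desired
}
```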
Currently HPA does not support the Kubernetes Job that the trainer uses, and the scaling API has changed since Kubernetes 1.6. Perhaps the best thing to do right now is to scale manually with a custom server.
After some research, one good way of abstracting the training/inferencing job may be to create a custom resource (Job and Deployment are resources) and a custom controller for that resource.
The custom resource specifies the training/inferencing configuration (e.g., the minimum/maximum number of GPU trainers and the minimum/maximum number of pservers). The minimum/maximum values are used for auto-scaling.
The custom controller coordinates all training/inferencing jobs. It knows how many GPU nodes are available, so it can dynamically scale all training/inferencing jobs accordingly.
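To make this concrete, here is a rough sketch, as Go types, of the configuration such a custom resource might carry; the resource name and field names are hypothetical:

```go
// Hypothetical spec for a "TrainingJob" custom resource; only the fields
// relevant to auto-scaling are shown.
package trainingjob

type TrainingJobSpec struct {
	// Bounds the custom controller uses when scaling trainers.
	MinTrainers int32 `json:"minTrainers"`
	MaxTrainers int32 `json:"maxTrainers"`
	// GPUs requested by each trainer instance.
	GPUsPerTrainer int32 `json:"gpusPerTrainer"`

	// Bounds for parameter server instances.
	MinPservers int32 `json:"minPservers"`
	MaxPservers int32 `json:"maxPservers"`
}
```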
References:
- https://resources.coreos.com/youtube-coreos-fest-2017/writing-a-custom-controller-extending-the-functionality-of-your-cluster
- https://coreos.com/blog/introducing-operators.html
- https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md
- https://coreos.com/blog/custom-resource-kubernetes-v17
Design Doc: Horizontal Autoscaling: https://github.com/PaddlePaddle/cloud/pull/380
Autoscaling Trainer job on PaddleCloud
Background
A Paddle training job contains several trainer instances (a Kubernetes Job), several parameter server instances (a Kubernetes ReplicaSet), and a master process (fault-tolerant mode only, a Kubernetes ReplicaSet). We hope PaddleCloud can autoscale the number of trainer instances. This issue discusses how to implement this feature.
HPA on Kubernetes
With Horizontal Pod Autoscaling (HPA), Kubernetes can scale the number of Pods automatically. Users invoke HPA like this:
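For example (the workload name and thresholds are placeholders):

```
kubectl autoscale deployment sample-app --min=2 --max=10 --cpu-percent=80
```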
Which:
- min is the low limit for the number of pods.
- max is the high limit for the number of pods.
- --cpu-percent is the target average CPU utilization.

Fetch metrics
From the Kubernetes docs, HPA can fetch metrics in two ways:
- Heapster access: Heapster (https://github.com/kubernetes/heapster) enables Container Cluster Monitoring and Performance Analysis, and uses InfluxDB as the backend storage.
- REST client access.
Problem
For now, HPA only supports ReplicaSet and Deployment, but the trainer is a Job in Kubernetes.
Possible solutions
1. Fix HPA to support the Job resource: the Job in Kubernetes supports scale, so maybe we can extend HPA to support Job, and I think this is the better way.
2. Custom server to scale Job instances: we can also develop another simple service that checks the metrics every sync period and calls the scale API to scale the trainer instances (a rough sketch follows below).
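A minimal sketch of the second option, assuming an in-cluster service and an older client-go (circa Kubernetes 1.7) whose Get/Update calls take no context argument; the namespace, job name, scaling decision, and sync period are placeholders:

```go
// Periodically check metrics and scale the trainer Job by updating its
// parallelism; a sketch only, not PaddleCloud code.
package main

import (
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	const namespace, jobName = "paddlecloud", "trainer" // placeholder names
	for {
		job, err := client.BatchV1().Jobs(namespace).Get(jobName, metav1.GetOptions{})
		if err != nil {
			log.Println(err)
			time.Sleep(30 * time.Second)
			continue
		}

		// Placeholder decision: a real service would read cluster metrics
		// (free GPUs, QPS, ...) here and compute the desired trainer count.
		desired := int32(4)
		if job.Spec.Parallelism == nil || *job.Spec.Parallelism != desired {
			job.Spec.Parallelism = &desired
			if _, err := client.BatchV1().Jobs(namespace).Update(job); err != nil {
				log.Println(err)
			}
		}

		time.Sleep(30 * time.Second) // sync period
	}
}
```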