Open TsvetanKonstantinov opened 4 years ago
Below are some notes on how to support the k8s Horizontal Pod Autoscaler by adding the /scale subresource to the FlinkCluster CRD:
StatusReplicasPath should point to a new status/taskManager/replicas property of the CRD that contains the current (actual) number of replicas of the TM deployment. Currently, this field is not present in the CRD status section.
LabelSelectorPath should point to the pods of the TM deployment. This way, the autoscaler will watch the CPU utilization of the TM pods and update the spec/taskManager/replicas field of the CRD accordingly. This will trigger an update of the cluster/job with the new TM replicas count (and possibly the new job parallelism that is equal to the number of replicas).
Sounds like a great feature to add. IIUC, it leverages the update cluster feature we added recently when the autoscaler decides TM should be scaled.
But I have a concern about the compatibility with Flink's native k8s integration. Eventually this operator will migrate to that model. Does HPA still make sense with that?
Sounds like a great feature to add. IIUC, it leverages the update cluster feature we added recently when the autoscaler decides TM should be scaled.
Yes, it leverages the update cluster feature.
But I have a concern about the compatibility with Flink's native k8s integration. Eventually, this operator will migrate to that model. Does HPA still make sense with that?
The concern is valid.
Do you mean this (Flink reactive mode)? When implemented, Flink will be able to react to newly started TaskManagers automatically. Theoretically, once Flink delivers this feature, you should be able to add an HPA to the TaskManager deployment.
It would still make more sense to add the HPA to the FlinkCluster custom resource instead. The only change of the logic would be that the operator should not trigger an update if "reactive" mode is enabled but simply start/stop TM replicas and let Flink do the work.
Having the scale sub-resource on the CRD will open the door for additional features. E.g. it is not clear how "reactive mode" will behave for Session Clusters. The operator can support auto-scaling only some jobs that are running on a session cluster based on their resource utilization.
Sounds good. Go ahead adding this feature, I will review your PR once it is ready.
Is there any update for this improvement?
TaskManager Autoscaling is an important feature for production Flink workloads. Currently, I can't create a Horizontal Pod Autoscaler for the FlinkCluster custom resource. K8s provides support for autoscaling custom resources via the /scale subresource.