Autoscaling enhancement

GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.

Apache License 2.0


Autoscaling enhancement #389

Open Padarn opened 3 years ago

Padarn commented 3 years ago

Hi all,

I'm interested in autoscaling support for the operator. Ververica Platform supports this now; the mechanism, as I gather from their post, is:

1. Monitor CPU
2. If scale-up is required, stop the job with a savepoint
3. Scale the cluster
4. Restore from the savepoint

With perhaps 2 and 3 swapped to ensure smaller downtime.
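The sequence above can be sketched in Go roughly as follows. This is only an illustration of the ordering: the `FlinkCluster` interface, its method names, the `fakeCluster` type, and the savepoint path are all hypothetical and are not part of the operator's API. The `scaleFirst` flag models the swapped order, where the cluster is scaled before the job is stopped.

```go
package main

import (
	"errors"
	"fmt"
)

// FlinkCluster is a hypothetical interface used only for this sketch;
// it is not the operator's real API.
type FlinkCluster interface {
	StopWithSavepoint() (savepointPath string, err error)
	ScaleTaskManagers(replicas int) error
	RestoreFromSavepoint(savepointPath string) error
}

// rescale runs the stop/scale/restore sequence. With scaleFirst set, the
// cluster is scaled before the job is stopped, so the job is down only for
// the savepoint and restore steps.
func rescale(c FlinkCluster, replicas int, scaleFirst bool) error {
	if scaleFirst {
		if err := c.ScaleTaskManagers(replicas); err != nil {
			return fmt.Errorf("scale cluster: %w", err)
		}
	}
	path, err := c.StopWithSavepoint()
	if err != nil {
		return fmt.Errorf("stop with savepoint: %w", err)
	}
	if !scaleFirst {
		if err := c.ScaleTaskManagers(replicas); err != nil {
			return fmt.Errorf("scale cluster: %w", err)
		}
	}
	if err := c.RestoreFromSavepoint(path); err != nil {
		return fmt.Errorf("restore from savepoint: %w", err)
	}
	return nil
}

// fakeCluster records the order of calls so the sequence can be inspected.
type fakeCluster struct{ steps []string }

func (f *fakeCluster) StopWithSavepoint() (string, error) {
	f.steps = append(f.steps, "savepoint")
	return "/savepoints/sp-1", nil // made-up path for the sketch
}

func (f *fakeCluster) ScaleTaskManagers(n int) error {
	f.steps = append(f.steps, fmt.Sprintf("scale:%d", n))
	return nil
}

func (f *fakeCluster) RestoreFromSavepoint(p string) error {
	if p == "" {
		return errors.New("empty savepoint path")
	}
	f.steps = append(f.steps, "restore")
	return nil
}

func main() {
	c := &fakeCluster{}
	if err := rescale(c, 4, true); err != nil {
		fmt.Println("rescale failed:", err)
		return
	}
	fmt.Println(c.steps)
}
```

Returning early on the first error means a failed savepoint never leads to a restore attempt, which is the main thing the ordering has to guarantee.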

I'd be interested in adding this if it is something others would use. Any thoughts on the idea? It seems pretty straightforward to attach an HPA and use its 'target replicas' to modify the desired number of taskmanagers.

shashken commented 3 years ago

Hey @Padarn, regarding points 2 and 3, I'm preparing a small PR that lets you tell the operator to take a savepoint before a cluster restart on upgrade (this is good regardless of the automatic part). Regarding the autoscaling capability, this might be a nice idea, but it could also be a separate component that communicates with the operator. Flink scaling is a little complicated, and an approach that scales up a cluster based on CPU metrics alone can have no impact, or even a negative impact, on some clusters.

Padarn commented 3 years ago

Hey @shashken, thanks for your response. You make a good point, it can certainly be a separate component, this would be much cleaner.

Let me know when you have a PR ready, would be keen to review to get more exposure to the operator layout.

shashken commented 3 years ago

@Padarn Done - https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/392 ShouldTakeSavepointOnUpgrade is the flag I added. That was actually the smallest change in the PR; I also fixed another savepoint issue to improve the stability of the savepoint feature.

functicons commented 3 years ago

@Padarn, thanks for the proposal. I like the idea of adding auto scaling as a feature of the operator, it should be just declarative for end users, a separate component would be more complex for them to use. It would be nice if you can contribute a PR, thanks!

Padarn commented 3 years ago

thanks @shashken

@functicons would be happy to work on a PR. Perhaps as there are some differing opinions on this, I will first create a POC version and ask for a review. If everyone is aligned I can clean it up to be merged.

I'll have a go at this over the weekend.

Padarn commented 3 years ago

Hi guys. I looked into this a bit and I see two options:

1. We expose a scale subresource, which allows an HPA to be used alongside the existing operator. It is possible that the HPA could become part of the deployed components too, but I haven't quite figured out how to do that yet. I opened a WIP PR for this: https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/394
2. We implement custom autoscaling logic that watches the metrics-server (which is what the HPA does) and scales based on that.

The second option seems easy enough, but it does mean reimplementing a lot of functionality that already exists in the HPA. So if it were possible to use the HPA I think that might be better.
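To give a sense of what the second option would have to reimplement, here is a minimal Go sketch of the replica calculation documented for the Kubernetes HPA: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default in Kubernetes). The function names and the clamp helper are my own; the real HPA also handles missing metrics, pod readiness, and stabilization windows, none of which are shown here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the documented HPA formula. It returns the current
// count unchanged when the metric is within the tolerance band around the
// target, or when the inputs are not usable.
func desiredReplicas(current int, currentUtil, targetUtil, tolerance float64) int {
	if current <= 0 || targetUtil <= 0 {
		return current
	}
	ratio := currentUtil / targetUtil
	if math.Abs(ratio-1.0) <= tolerance {
		return current
	}
	return int(math.Ceil(float64(current) * ratio))
}

// clampReplicas keeps the result inside the configured min/max bounds.
func clampReplicas(n, minReplicas, maxReplicas int) int {
	if n < minReplicas {
		return minReplicas
	}
	if n > maxReplicas {
		return maxReplicas
	}
	return n
}

func main() {
	// 4 taskmanagers at 90% CPU with a 60% target.
	n := desiredReplicas(4, 90, 60, 0.1)
	fmt.Println(clampReplicas(n, 1, 10))
}
```

Even this core formula is the easy part; the surrounding behaviour is where most of the HPA's complexity lives, which is the argument for reusing it.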

Open to any thoughts.

functicons commented 3 years ago

Haven't reviewed your PR, but I prefer option 1, if we can reuse HPA, it will be easier to maintain.

Padarn commented 3 years ago

Yeah, I tend to agree. I will try adding an HPA to the resources run by the operator. I need to look at some other examples of operators to see how they handle optional resources.

Padarn commented 3 years ago

Hi @functicons, I've updated the PR to create an HPA along with the operator itself. The PR is not fully tested yet, but it serves as an example of how this would look given our discussion above.

To give some detail on how the process would work:

1. The CRD has an optional field for specifying an HPA spec (very similar to a normal HPA, but we don't expose the scaleTargetRef, as this is the TaskManager stateful set of our operator).
2. The operator will (if specified) create this HPA.
3. The HPA is given a reference to the FlinkCluster, from which it uses the new /scale subresource to set the cluster spec (note that this means scaling the CRD spec, so the same reconcile loop as for FlinkCluster updates is followed).

Notes:

Based on autoscaling/v1, not the newer autoscaling/v2beta1. Can update this if the overall approach is agreed on.
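As a rough illustration of the process described above, the Go sketch below shows how an operator might fill in the scaleTargetRef itself while taking the remaining fields from the user. The structs are local stand-ins that mirror a subset of the autoscaling/v1 field names; they are not the real Kubernetes types. `UserHPASpec` and `buildHPASpec` are hypothetical names, and the `flinkoperator.k8s.io/v1beta1` API version is an assumption. The sketch follows the description in which the HPA targets the FlinkCluster resource through its /scale subresource.

```go
package main

import "fmt"

// CrossVersionObjectReference mirrors the autoscaling/v1 field names.
type CrossVersionObjectReference struct {
	APIVersion string
	Kind       string
	Name       string
}

// HPASpec mirrors a subset of autoscaling/v1 HorizontalPodAutoscalerSpec.
type HPASpec struct {
	ScaleTargetRef                 CrossVersionObjectReference
	MinReplicas                    *int32
	MaxReplicas                    int32
	TargetCPUUtilizationPercentage *int32
}

// UserHPASpec is the hypothetical optional CRD field: an HPA spec without
// scaleTargetRef, which the operator sets itself.
type UserHPASpec struct {
	MinReplicas                    *int32
	MaxReplicas                    int32
	TargetCPUUtilizationPercentage *int32
}

// buildHPASpec copies the user-supplied fields and points the HPA at the
// named FlinkCluster, so scaling goes through the CRD's /scale subresource.
func buildHPASpec(clusterName string, user UserHPASpec) HPASpec {
	return HPASpec{
		ScaleTargetRef: CrossVersionObjectReference{
			APIVersion: "flinkoperator.k8s.io/v1beta1", // assumed group/version
			Kind:       "FlinkCluster",
			Name:       clusterName,
		},
		MinReplicas:                    user.MinReplicas,
		MaxReplicas:                    user.MaxReplicas,
		TargetCPUUtilizationPercentage: user.TargetCPUUtilizationPercentage,
	}
}

// int32Ptr is a small helper for the optional pointer fields.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	spec := buildHPASpec("my-flink-cluster", UserHPASpec{
		MinReplicas:                    int32Ptr(2),
		MaxReplicas:                    8,
		TargetCPUUtilizationPercentage: int32Ptr(70),
	})
	fmt.Printf("%s/%s min=%d max=%d cpu=%d%%\n",
		spec.ScaleTargetRef.Kind, spec.ScaleTargetRef.Name,
		*spec.MinReplicas, spec.MaxReplicas, *spec.TargetCPUUtilizationPercentage)
}
```

Keeping scaleTargetRef out of the user-facing field means a spec cannot accidentally point the HPA at some other workload.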