GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0
658 stars 266 forks source link

exposing a scale resource and way to scale jobs #425

Open Padarn opened 3 years ago

Padarn commented 3 years ago

Replaces https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/394 and addresses https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/389

This PR is to break down the complexity of adding autoscaling: It only attempts to add a scaling mechanism that will work for standalone Jobs. This is because the operator doesn't otherwise know what jobs are running on the cluster, or how to scale them in/out so it doesn't make much sense to try and auto scale them.

The PR adds:

Note some limitations:

To Do

Padarn commented 3 years ago

@functicons I will work on the docs for this approach, but I think these are limitations we have to live with for now. Although it would be nice if it were possible to set max parallelism in the properties file.. I'll ask the flink mailing list about this.

functicons commented 3 years ago

/gcbrun

Padarn commented 3 years ago

Hey @functicons I've added some unit tests (and fixes while I wrote them), and some documentation. Let me know what you think.

Padarn commented 3 years ago

Any comments on the documentation so far? Too little, too much?

vvondruska commented 3 years ago

Hi. Thanks a lot for this pull request. We really like the scaling functionality and we are currently testing at with a plan to use it for our work. While testing the scaling feature we encountered a case where the operator crashed because of a nil pointer dereference - specifically here: controllers/flinkcluster_updater.go:392. It occurs when the operator tries to get a selector for task manager(s) of a terminating job. At that point the observed stateful set is nil and therefore its properties are no longer available. Is this crash considered a legitimate result of providing an incorrect or incomplete configuration or should it be fixed? Possible fixes include providing an empty label or parsing it from the recorded component. We chose to fix it by parsing the selector from the recorded component in case it's available. I can push the fix if you'd like.

Padarn commented 3 years ago

Hey @vvondruska thanks a lot for the feedback! Happy to have your fix, you could either push it or create a small snippet and I'll incorporate and push it. Either way works.

vvondruska commented 3 years ago

Thank you for a quick response @Padarn. I can push the fix. I was wondering if it would be OK with you if I pushed the code directly to the repository with the scaling functionality (Padarn:expose-scale). I would need a temporary write privilege to do that. Or would you prefer a different way to apply the fix?

Padarn commented 3 years ago

Makes sense to me. I sent you an invite to the repo so you can push.

google-cla[bot] commented 3 years ago

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

google-cla[bot] commented 3 years ago

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

google-cla[bot] commented 3 years ago

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

vvondruska commented 3 years ago

@googlebot I consent.

Padarn commented 3 years ago

@googlebot I consent.

mvbakker commented 3 years ago

@Padarn @functicons is there anything we (@vvondruska and I) can do to help to get this merged in?

Padarn commented 3 years ago

I'm happy to @mvbakker but cannot approve anything. Also happy to make any further changes if there is anything blocking.

Padarn commented 3 years ago

Hi @functicons any change we can get this merged?