GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0
658 stars 266 forks source link

How to set serviceAccount to pods by operator #329

Open yanghui16355 opened 3 years ago

yanghui16355 commented 3 years ago

I am running my flink application in AWS EKS and need to set the serviceAccount(with AWS IAM role attached) to my flink pods. Based on current CRD, I don't see there is an option to pass the serviceAccount name to my flink pod. Can you point to me which option I can use to set serviceAccount or add this option if does't support yet.

Thanks,

Hui

functicons commented 3 years ago

Currently, the CRD doesn't support specifying serviceAccount directly, it uses the default service account of the Flink cluster CR's namespace. A workaround is that you create/choose a custom namespace for your Flink application and grant necessary permissions to the default service account of the namespace:

apiVersion: flinkoperator.k8s.io/v1beta1
kind: FlinkCluster
metadata:
  name: flinkjobcluster-sample
  namespace: my-namespace
...

But I think it is a good FR to support custom serviceAccount for Flink JM/TM Pods. If you want to contribute the PR, I will review it.

yanghui16355 commented 3 years ago

@functicons can you guide me how to add the support for serviceAccount or you can help to add it which would be quicker? thanks

functicons commented 3 years ago

It is pretty straightforward. Essentially, you need to add the new field in the FlinkClusterSpec; convert the field into the field in Pod spec in the converter.

Here is an example PR https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/321.

yanghui16355 commented 3 years ago

@functicons thanks for your reply and I made a PR for the change. However, I am not able to push my commit by the following error: ERROR: Permission to GoogleCloudPlatform/flink-on-k8s-operator.git denied to yanghui16355. fatal: Could not read from remote repository.

Please make sure you have the correct access rights and the repository exists.

I use the ssh and can pull the remote master to my local, but not able to push my change from local to remote branch. Do I need to do anything more to allow the push to your repo?

Thanks,

Hui

functicons commented 3 years ago

You cannot directly push to https://github.com/functicons/flink-on-k8s-operator. Instead, you need to first fork the repo to your own project, e.g., this is mine https://github.com/functicons/flink-on-k8s-operator; then clone the fork to your local machine; your commit should first be pushed to your fork; finally, create a PR from your fork. See this doc for more details https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request.

yanghui16355 commented 3 years ago

@functicons I have submitted my PR here https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/334, please take a review, thanks!

yanghui16355 commented 3 years ago

@functicons I have addressed all your comments, please review again https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/334, thanks!

yanghui16355 commented 3 years ago

@functicons one more question, how to upgrade the flink-operator with the new version?

yanghui16355 commented 3 years ago

@functicons I got an issue after my PR been merged as following: After I run "make deploy" command, I found the file "/config/crd/bases/flinkoperator.k8s.io_flinkclusters.yaml" is changed and the "serviceAccountName" I added in my PR would been removed. Therefore, the deployed operator still doesn't have the serviceAccountName defined in its CRD. Do you know if I am missing anything to generate the right operator CRD definition? Can you help to try to deploy on your local to see the issue? thanks

ckdarby commented 3 years ago

Therefore, the deployed operator still doesn't have the serviceAccountName defined in its CRD. Do you know if I am missing anything to generate the right operator CRD definition? Can you help to try to deploy on your local to see the issue? thanks

@functicons We're facing the same issue as @yanghui16355

ckdarby commented 3 years ago

@functicons We're unable to see where the specific bug is, but it looks to be that the service account is actually being set to empty which causes the default service account to be used which ironically happens to be called default in Kubernetes.

The current tests pass in a service account name called "default" which doesn't actually test the true functionality because if it isn't set correctly, Kubernetes will apply default as the service account.

yanghui16355 commented 3 years ago

@ckdarby I tried build the docker image on my local and use built image to install the operator from local works. What I did is remove the step "manifest" in Makefile and then it won't delete the serviceAccountName in the CRD. However, this is something need to fix but I am not sure how to do. cc @functicons

functicons commented 3 years ago

We just released a new image which includes the PR #334:

$ gcloud container images list-tags gcr.io/flink-operator/flink-operator

DIGEST        TAGS              TIMESTAMP
2cc84a772dd3  latest,v1beta1-8  2020-12-03T17:25:39
functicons commented 3 years ago

What I did is remove the step "manifest" in Makefile and then it won't delete the serviceAccountName in the CRD.

Does this problem still exist? I don't quite understand what you mean by that. @yanghui16355

yanghui16355 commented 3 years ago

@functicons when I build the image on my local by "make operator-image" or deploy the operator by "make deploy", it will change the config/crd/bases/flinkoperator.k8s.io_flinkclusters.yaml and remove the serviceAccountName I added in my PR.

I found it is due to the "manifest" phase in Makefile to update it. Ideally, "manifest" phase should generate latest CRD since it is generated from converter which has all the latest change. However, I am not sure why it doesn't work that I have to remove this phase to make it not change the CRD.