actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0
4.73k stars 1.12k forks

Support to pod security policies and node selectors #263

Open vittoriocanilli opened 3 years ago

vittoriocanilli commented 3 years ago

Hello,

I am implementing this solution on my cluster, where the following pod security policy is applied: https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/policy/restricted-psp.yaml

As described in the Kubernetes documentation, the policy is authorised with this cluster role:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:restricted
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - restricted

and this cluster role binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default:restricted
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:restricted
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io

Besides the pod security policy, I have more than one node group, and I would like the pods in the actions-runner-system namespace to run in the non-production-nodegroup node group.

In order for your controller-manager deployment to create its pod successfully, I had to add the following lines at the end of its definition:

...
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      nodeSelector:
        eks.amazonaws.com/nodegroup: non-production-nodegroup

Then I created a runner deployment, which needed some tweaks for the pod security policy and the node selector:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: my-organisation/my-repository
      containers:
        - name: summerwind-runner
          image: summerwind/actions-runner
          env:
            - name: RUNNER_NAME
              value: summerwind-runner
            - name: RUNNER_REPO
              value: my-organisation/my-repository
            - name: RUNNER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: controller-manager
                  key: github_token
          volumeMounts:
          - name: runner-externals
            mountPath: /runner/externals
          securityContext:
            privileged: false
            runAsUser: 1000
            fsGroup: 1000
      volumes:
      - name: runner-externals
        emptyDir: {}
      nodeSelector:
        eks.amazonaws.com/nodegroup: non-production-nodegroup

As you can see, I had to explicitly add the env variables and the volumes to get the runner and its pod running in the cluster.

Unfortunately, this is not enough to get the self-hosted runner registered in GitHub. The runner pod's logs contain the following:

sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
.path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Starting Runner listener with startup type: service
Started listener process
Started running service
An error occurred: Not configured
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.

The last 5 lines repeat indefinitely.

As far as I can tell, it tries to run the pod with root privileges, but the pod security context doesn't allow it.

Is there a way to create self-hosted runners from a Kubernetes cluster with pod security policies? The node selector does not seem to cause trouble, but I had to adapt your solution to use it.

It would be great to have support for both the security policies and the node selectors in the future Helm chart. Thanks in advance.

mumoshu commented 3 years ago

@vittoriocanilli Hey!

I believe this is something I'd like to have, too. But I also believe that the main blocker here is dind.

I suggest you try out https://docs.docker.com/engine/security/rootless/#prerequisites for our dind, so that the dind sidecar runs without privileges.

Regarding the nosuid error in the runner container, I guess it can be easily fixed by removing sudo in our entrypoint.sh:

https://github.com/summerwind/actions-runner-controller/blob/ace95d72aba5156b6f402f54743466b3dd47d7a7/runner/entrypoint.sh#L53

while adding securityContext.fsGroup so that the non-root runner user within the runner container is able to run chown and mv in the entrypoint without privileges.
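A minimal sketch of what that pod-level securityContext might look like (the uid/gid value 1000 is an assumption, matching the runner user used elsewhere in this thread):

```yaml
# Hypothetical sketch of the suggested pod-level securityContext.
# runAsUser/fsGroup 1000 are assumed to match the runner user in the image.
securityContext:
  runAsUser: 1000
  fsGroup: 1000   # lets the runner user own mounted volumes without sudo/chown as root
```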

vittoriocanilli commented 3 years ago

@mumoshu Hey and thanks for your reply!

I am now using your Helm chart, so I can install the controller-manager deployment in compliance with our pod security policies simply by setting the proper values.
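For reference, the kind of values involved might look roughly like this; the key names here are assumptions and should be checked against the chart's values.yaml for your version:

```yaml
# Hypothetical values fragment for the actions-runner-controller chart;
# exact key names depend on the chart version.
nodeSelector:
  eks.amazonaws.com/nodegroup: non-production-nodegroup  # placeholder node group
podSecurityContext:
  runAsUser: 1000
  fsGroup: 1000
```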

As you suggested, I created a copy of your summerwind/actions-runner image in my container registry, with sudo removed from line 53 of the entrypoint.sh script, but unfortunately this is the only thing I see in the logs of my adapted runner deployment:

[dumb-init] /entrypoint.sh: Permission denied

Setting these permissions on the file did not help:

-rwxr-xr-x   1 runner docker 1805 Jan 26 14:02 entrypoint.sh

About the DinD sidecar container: I guess I don't need it. Please correct me if I am wrong, but it is only needed to run docker commands in the GitHub self-hosted runner, and I actually just need the self-hosted runners to execute some kubectl commands. I suppose I am not using the DinD sidecar with the runner container defined in my previous comment. Is it possible to disable it via Helm or in the runner deployment YAML file?

After installing your Helm chart I managed to get the controller-manager deployment running; I wonder now if the Helm chart can also help me set up runner deployments (to have actual self-hosted runners in GitHub), as I have noticed values for replica scaling and GitHub authentication. It would be great to manage everything with a single Helm chart installation, without a separate definition of the runner deployment(s).

Speaking of GitHub authentication, I have noticed that the entrypoint.sh script requires the env variable for the personal access token to be defined, but it does not check for the env variables for authentication via GitHub App; how is that authentication method supported?

Thanks in advance!

erikkn commented 3 years ago

@mumoshu how exactly would rootless solve the privileged container issue? True, it adds additional security since we wouldn't have to run the container as root, but seccomp, AppArmor, and mount masks would still have to be disabled (i.e. --privileged), right?

anandg112 commented 3 years ago

Multiple vulnerabilities were found when scanning the Helm chart for the runner with Checkov (https://github.com/bridgecrewio/checkov). Some of them are:

Check: CKV_K8S_21: "The default namespace should not be used"
Check: CKV_K8S_31: "Ensure that the seccomp profile is set to docker/default or runtime/default"
Check: CKV_K8S_40: "Containers should run as a high UID to avoid host conflict"
Check: CKV_K8S_23: "Minimize the admission of root containers"
Check: CKV_K8S_37: "Minimize the admission of containers with capabilities assigned"
Check: CKV_K8S_20: "Containers should not run with allowPrivilegeEscalation"
Check: CKV_K8S_22: "Use read-only filesystem for containers where possible"

Will some of these be fixed so the chart can be used in a prod setting? I'm willing to contribute fixes for some of these, mainly around the seccomp profile etc.

mumoshu commented 3 years ago

@vittoriocanilli Sorry for the delayed response!

About the DIND side-container I guess that I don't need it. Please correct me if I am wrong, but it is needed only to run docker commands in the Github self-hosted runner. I actually just need the self-hosted runners to execute some kubectl commands. I suppose that I am not using the DIND side-container runner container defined in my previous comment. Is it possible to disable it via Helm or with the runner deployment YAML file?

You can just set dockerEnabled: false, which results in the controller not adding the dockerd sidecar to the runner pod.
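For reference, a minimal RunnerDeployment sketch with the sidecar disabled (the repository name is a placeholder):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  replicas: 1
  template:
    spec:
      repository: my-organisation/my-repository  # placeholder
      dockerEnabled: false  # no dockerd sidecar, so no privileged container
```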

mumoshu commented 3 years ago

I wonder now if the Helm chart can help me to set up runner deployments (to have actual self-hosted runners in Github), as I have noticed values for scaling of replicas and Github authentication. That would be great to have only a Helm chart installation to manage without a separate definition of the runner deployment(s).

Interesting idea! I usually use incubator/raw chart to manage arbitrary K8s resources with Helm. Would it work for you too? Do you still need a dedicated feature for managing runner deployments (and probably horizontal runner autoscalers) within the actions-runner-controller chart, or a dedicated chart for managing runner deployments?
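As a rough illustration of the incubator/raw idea, a values file wrapping a RunnerDeployment might look like this (untested sketch; the `resources` key is assumed from the raw chart's conventions):

```yaml
# Hypothetical values.yaml for the incubator/raw chart (untested sketch)
resources:
  - apiVersion: actions.summerwind.dev/v1alpha1
    kind: RunnerDeployment
    metadata:
      name: example-runnerdeploy
    spec:
      replicas: 1
      template:
        spec:
          repository: my-organisation/my-repository  # placeholder
```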

mumoshu commented 3 years ago

how exactly would rootless solve the Privileged container issue?

@erikkn I have no real experience here 😉 But I'd try kaniko first when I need to build container images without privileged and root containers.

erikkn commented 3 years ago

Kaniko only solves the issue of building images without needing root, right. Many people need the DinD stuff simply to run their workflow; unfortunately, Actions just lacks proper Kubernetes support.

mumoshu commented 3 years ago

@anandg112 Thanks for sharing! Yeah, contributions around seccomp would be more than welcome.

The reported items mostly made sense to me, except..

Check: CKV_K8S_21: "The default namespace should not be used"

I thought everything was defined under the namespace specified via helm's -n flag, by adding namespace: {{ .Release.Namespace }}. Did the tool show you the offending line number and/or file name in the chart?

mumoshu commented 3 years ago

Check: CKV_K8S_22: "Use read-only filesystem for containers where possible"

Would this be a matter of adding the below to actions-runner-controller and runner pods spec?

        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /tmp
          name: tmp
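For completeness, the volumeMounts in the snippet above assume a matching emptyDir volume declared at the pod level, e.g.:

```yaml
# Matching pod-level volume for the tmp mount above
volumes:
- name: tmp
  emptyDir: {}
```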

mumoshu commented 3 years ago

Many people need the DinD stuff simply to run their workflow; unfortunate Actions is just lacking proper Kubernetes support.

@erikkn Yeah, probably! Just curious, but what would ideal "proper Kubernetes support" look like in GitHub Actions, do you think? I was thinking it could be very simple: just use kubectl run if your container doesn't necessarily need to run within the runner pod, or just use rootless docker.

erikkn commented 3 years ago

I don't know and at the same time I also don't want to give my opinion on this since it is a sensitive topic haha

RE using 'kubectl run': do you mean that as a replacement for the DinD stuff? If so, would you mind elaborating on this fix/approach?

I am currently running DinD rootless in a test cluster; I will do my best to upload the Dockerfile to this repo this week or next.

mumoshu commented 3 years ago

RE using 'kubectl run', you mean that as a replacement of the DinD stuff? If so, would you mind to elaborate on this fix/approach?

Depends on your use-case, I think. I occasionally need to run a containerized job that doesn't depend on local disk content, so I can use either docker-run or kubectl-run to trigger the job. If you are running e.g. a containerized golang build, then you'd need to send the build context from the local (runner container) disk to dockerd, so you definitely need docker.
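To illustrate the kubectl-run idea: a job that doesn't depend on the runner's local disk could also be expressed as a plain Kubernetes Job submitted from the runner (sketch; image and command are placeholders):

```yaml
# Sketch: run a containerized job on the cluster instead of via a dind sidecar.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job  # placeholder name
spec:
  template:
    spec:
      containers:
      - name: job
        image: alpine:3          # placeholder image
        command: ["echo", "hello from the cluster"]
      restartPolicy: Never       # run once, don't restart on completion
```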

erikkn commented 3 years ago

I am not entirely sure I understand the context of using 'kubectl run'. If my job uses a container (container: image: xyz), then 'kubectl run' is not going to help, right, because Actions will want to run that job itself inside a container.

Edit:

That's actually what you are saying 😅 :

If you are to run e.g. a containerized golang build then you'd definitely need to send the build context from the local(runner container) disk to dockerd then you definitely need docker.

mumoshu commented 3 years ago

If my job uses a container (container: image: xyz) then 'kubectl run' is not gonna help you here, right, because Actions is going to want to run this job inside a container.

@erikkn Thanks! Yeah, I believe we're in the same boat and you're definitely correct.

If I may add one more thing: my ultimate solution to security hardening for our setup is to avoid even using container and any Docker-based action (AFAIK any Docker-based action results in a docker build on run)... until someday GitHub Actions adds native K8s support for Docker-based actions and container 😄

ikogan commented 3 years ago

We have a similar need for the securityContext to be exposed. Our PSP allows allowPrivilegeEscalation: true but sets defaultAllowPrivilegeEscalation: false, so our pods have to explicitly set allowPrivilegeEscalation: true in their security context. Unfortunately, since the securityContext is not exposed in the runner CRD, there's no way to set this.

Alternatively, the runner could just set both privileged: true and allowPrivilegeEscalation: true.
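Concretely, satisfying such a PSP would only require the runner CRD to pass through a container-level securityContext like this (sketch):

```yaml
# Sketch: container-level securityContext a PSP with
# defaultAllowPrivilegeEscalation: false forces pods to set explicitly.
securityContext:
  allowPrivilegeEscalation: true
```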

rahul-FLS commented 2 years ago

Check: CKV_K8S_22: "Use read-only filesystem for containers where possible"

Would this be a matter of adding the below to actions-runner-controller and runner pods spec?

        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /tmp
          name: tmp

Hello @mumoshu, my managed cluster's PSP enforces readOnlyRootFilesystem: true, which leads the runner pod to throw the error below. It starts fine if I test it locally with readOnlyRootFilesystem: false in the PSP. Any thoughts?

(screenshot of the error)

mumoshu commented 2 years ago

@rahul-FLS Hey. Yeah, it would just work if you didn't enforce that in your PSP.

mumoshu commented 2 years ago

@ikogan @vittoriocanilli We now have RunnerSet, which is basically an extension of StatefulSet that allows you to fully customize the pod template, including securityContext, nodeSelector, and allowPrivilegeEscalation.

Not sure if it is possible to tweak privileged directly, though. I thought we already had a dedicated issue for that.
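An untested sketch of a RunnerSet along those lines; the exact CRD schema may differ, and the securityContext/nodeSelector placement is assumed from standard pod templates:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example-runnerset
spec:
  replicas: 1
  repository: my-organisation/my-repository  # placeholder
  template:
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: non-production-nodegroup  # placeholder
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: runner
        securityContext:
          allowPrivilegeEscalation: false
```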

rahul-FLS commented 2 years ago

@mumoshu I was able to resolve it by mounting a /tmp folder into the runner container. I found it writing the content below there, which I believe it was previously trying to write to the root filesystem, which is protected by the PSP and hence failed. Any idea why it tries to write to root?

(screenshot of the written files)

mumoshu commented 2 years ago

@rahul-FLS Everything except /tmp within the pod resides on the root device by default in K8s. I guess you'd need to add some emptyDir volumes wherever the runner writes 🤔 At least you'd need to mount an emptyDir volume at /runner and /home/runner, but I'm not 100% sure. Try running df -h within the pod and you'll see which device is mounted where. If you need anything other than emptyDir, you can try RunnerSet with volumeClaimTemplates.
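That suggestion translates roughly to a pod-spec fragment like this (sketch; mount paths taken from the comment above):

```yaml
# Sketch: emptyDir volumes at the paths the runner writes to.
spec:
  volumes:
  - name: runner
    emptyDir: {}
  - name: tmp
    emptyDir: {}
  containers:
  - name: runner
    volumeMounts:
    - name: runner
      mountPath: /runner
    - name: tmp
      mountPath: /tmp
```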

rahul-FLS commented 2 years ago

@mumoshu an emptyDir volume was already mounted at /runner, but I was still getting the error. I mounted an additional emptyDir at /tmp and it worked.

Next is to use Kaniko for container builds, because I can't use DinD as it needs root access. I am thinking of baking it into your runner image, because the GitHub-provided kaniko action would need docker to fetch the kaniko binaries. Do you suggest this approach, or should I add a kaniko initContainer alongside your runner container and mount the executable to a shared volume?

mumoshu commented 2 years ago

@rahul-FLS I'd appreciate it if you could share what worked for you 😄 I have never tried to go docker-less. One note: you can disable privileged mode by setting dockerEnabled: false. That implies there's no dind, and hence no docker.

rahul-FLS commented 2 years ago

@mumoshu here is the modified RunnerDeployment.yaml (look at the highlighted bits). I already have dockerEnabled: false, so no docker :) So I need Kaniko hooked up in the runner image.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
  labels:
    app: test
spec:
  replicas: 0
  template:
    spec:
      # highlighted: added emptyDir volume
      volumes:
      - emptyDir: {}
        name: tmp
      group: fls-poc
      serviceAccountName: test
      dockerEnabled: false
      securityContext:
        runAsUser: 1000
      ephemeral: true
      organization: orgName
      env:
      - name: RUNNER_FEATURE_FLAG_EPHEMERAL
        value: "true"
      - name: RUNNER_NAME
        value: $RANDOM
      # highlighted: added mount
      volumeMounts:
      - mountPath: /tmp
        name: tmp

mumoshu commented 2 years ago

@rahul-FLS Thanks for sharing! It's now much clearer what you're doing. Not sure if it works well with runAsUser permission-wise; I'd suggest you try init containers to install various binaries like kaniko first. That way, you won't be forced to rebuild the container image so often just to add a few binaries. If I were you, I'd try cp'ing and chmod'ing binaries from an init container to a volume shared between the init container and the runner container. But it may not work if the kaniko image lacks a standard shell.
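The generic init-container copy pattern described here, sketched with a hypothetical tool image that does include a shell (per the caveat above, kaniko's own image may not):

```yaml
# Sketch of the init-container copy pattern; image names and binary paths are hypothetical.
initContainers:
- name: install-tools
  image: example.org/tool-image:latest  # hypothetical image that contains a shell
  command: ["sh", "-c", "cp /usr/local/bin/some-tool /shared/ && chmod 0755 /shared/some-tool"]
  volumeMounts:
  - name: shared-tools
    mountPath: /shared
containers:
- name: runner
  volumeMounts:
  - name: shared-tools
    mountPath: /opt/tools    # binaries appear here inside the runner container
volumes:
- name: shared-tools
  emptyDir: {}
```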

rahul-FLS commented 2 years ago

Thanks @mumoshu, it looks like kaniko doesn't support copying its binaries over to another container: https://github.com/GoogleContainerTools/kaniko#known-issues

rahul-FLS commented 2 years ago

It would be good if the controller could also support build tools other than docker which don't need privileged access, as privileged mode is discouraged by security folks 😊

mumoshu commented 2 years ago

@rahul-FLS I can't agree more from the user's perspective! Implementation-wise, it wouldn't be too hard to provide a set of docs, examples, and another Dockerfile for the non-privileged use-case without sysbox. Regarding sysbox: I remember discussing with other folks a potential support for sysbox (another container runtime that is able to run dind within a non-privileged runner pod). That would be another option for your use-case.

The primary blocker is that providing free user support for an open-source project with hundreds of possible settings is too hard for a team of two maintainers. If you're willing to contribute anything to the project or sponsor as a company, that would greatly help ARC broaden its scope earlier!

mumoshu commented 2 years ago

@rahul-FLS Also, you'd probably better open a dedicated issue for your use-case. I think what you're trying to do goes beyond what @vittoriocanilli described.

ajitkumarnayak1976 commented 2 years ago

Hi, while running the runner deployment, pods are erroring out with the message below:

policy 'disallow-privileged-containers' (Validation) rule 'privileged-containers' failed. validation error: Privileged mode is disallowed. The fields spec.containers[].securityContext.privileged and spec.initContainers[].securityContext.privileged must be unset or set to false. Rule privileged-containers failed at path /spec/containers/1/securityContext/privileged/

Any idea how to fix this issue?

mumoshu commented 2 years ago

@ajitkumarnayak1976 Hey. Set privileged: false or remove the offending policy from your cluster. You seem to be using Kyverno, so you'd need to consult how Kyverno works. There seems to be nothing specific to ARC in your issue.

harsh-p-soni commented 1 year ago

@rahul-FLS / @mumoshu Is there any possible workaround you can share for building images with kaniko in the runner?