clastix / cluster-api-control-plane-provider-kamaji

The Kamaji Control Plane provider implementation of the Cluster Management API
Apache License 2.0

fix(tilt): force user in tilt Dockerfile #89

Closed bengentil closed 5 months ago

bengentil commented 5 months ago

When running CACPPK with the cluster-api Tiltfile, I get the following error:

[event: pod kamaji-system/capi-kamaji-controller-manager-54976585c9-mqrt9] Error: container has runAsNonRoot and image will run as root (pod: "capi-kamaji-controller-manager-54976585c9-mqrt9_kamaji-system(3b84c5c0-3e3b-453e-9dd2-4418c0bfaaf7)", container: manager)

This is because, with the cluster-api Tiltfile, Tilt generates its own Dockerfile instead of using the one in cluster-api-control-plane-provider-kamaji.

This MR fixes the issue by adding a USER directive to the generated Dockerfile.

jds9090 commented 5 months ago

@bengentil Hello! How can I find the log? I have no issue with the tilt-provider.json. [Screenshot 2024-03-27 at 7:12:48 PM]

jds9090 commented 5 months ago

I think CAPI's Tiltfile does not use each provider's Dockerfile.

Below is the Tilt log for CAPO:

STEP 1/3 — Building Dockerfile: [gcr.io/k8s-staging-capi-openstack/capi-openstack-controller]
Building Dockerfile for platform linux/amd64:

  # Tilt image
  FROM golang:1.20.10 as tilt-helper
  # Install delve. Note this should be kept in step with the Go release minor version.
  RUN go install github.com/go-delve/delve/cmd/dlv@v1.20
  # Support live reloading with Tilt
  RUN wget --output-document /restart.sh --quiet https://raw.githubusercontent.com/tilt-dev/rerun-process-wrapper/master/restart.sh  &&     wget --output-document /start.sh --quiet https://raw.githubusercontent.com/tilt-dev/rerun-process-wrapper/master/start.sh &&     chmod +x /start.sh && chmod +x /restart.sh && chmod +x /go/bin/dlv &&     touch /process.txt && chmod 0777 /process.txt `# pre-create PID file to allow even non-root users to run the image`

  FROM golang:1.20.10 as tilt
  WORKDIR /
  COPY --from=tilt-helper /process.txt .
  COPY --from=tilt-helper /start.sh .
  COPY --from=tilt-helper /restart.sh .
  COPY --from=tilt-helper /go/bin/dlv .
  COPY $binary_name .

     Building image
     [background] read source files 61.93MB [done: 746ms]
     [tilt-helper 1/3] FROM docker.io/library/golang:1.20.10@sha256:8a3e8d1d3a513c0155451c522d381e901837610296f5a077b19f3d350b3a1585
     [tilt 5/7] COPY --from=tilt-helper /go/bin/dlv . [cached]
     [tilt 4/7] COPY --from=tilt-helper /restart.sh . [cached]
     [tilt 3/7] COPY --from=tilt-helper /start.sh . [cached]
     [tilt 2/7] COPY --from=tilt-helper /process.txt . [cached]
     [tilt-helper 3/3] RUN wget --output-document /restart.sh --quiet https://raw.githubusercontent.com/tilt-dev/rerun-process-wrapper/master/restart.sh  &&     wget --output-document /start.sh --quiet https://raw.githubusercontent.com/tilt-dev/rerun-process-wrapper/master/start.sh &&     chmod +x /start.sh && chmod +x /restart.sh && chmod +x /go/bin/dlv &&     touch /process.txt && chmod 0777 /process.txt `# pre-create PID file to allow even non-root users to run the image` [cached]
     [tilt-helper 2/3] RUN go install github.com/go-delve/delve/cmd/dlv@v1.20 [cached]
     [tilt 6/7] COPY  . [cached]
     exporting to image

STEP 2/3 — Pushing localhost:5000/gcr.io_k8s-staging-capi-openstack_capi-openstack-controller:tilt-bd30b0a665bb99a7
     Pushing with Docker client
     Authenticating to image repo: localhost:5000
     Sending image data
     cf024edacade: Layer already exists 
     caa21108d063: Layer already exists 
     f4d12624a593: Layer already exists 
     9e80367d5db9: Layer already exists 
     a66766514c56: Layer already exists 
     a3b8c5d667f9: Layer already exists 
     1220da66aebd: Layer already exists 
     dfe25755ef07: Layer already exists 
     29e49b59edda: Layer already exists 
     266def75d28e: Layer already exists 
     1777ac7d307b: Layer already exists 

bengentil commented 5 months ago

Hi @jds9090

This log is a Kubernetes event. It can be seen in the cacppk_controller resource in the Tilt UI/logs, with kubectl describe pod -n kamaji-system <pod> on the pod stuck in CreateContainerConfigError, or with kubectl get events -n kamaji-system.
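Besides kubectl describe pod and kubectl get events, the same reason also shows up in the pod status. The Go sketch below is an illustration, not something from this thread: it decodes the JSON printed by kubectl get pod -n kamaji-system <pod> -o json and lists containers waiting with reason CreateContainerConfigError. The struct covers only the fields it reads, and the sample pod JSON is hand-written.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pod is a minimal subset of the Pod JSON returned by
// `kubectl get pod -n kamaji-system <pod> -o json`; other fields are ignored.
type pod struct {
	Status struct {
		ContainerStatuses []struct {
			Name  string `json:"name"`
			State struct {
				Waiting *struct {
					Reason  string `json:"reason"`
					Message string `json:"message"`
				} `json:"waiting"`
			} `json:"state"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

// configErrors returns "container: message" for every container that is
// waiting with reason CreateContainerConfigError.
func configErrors(raw []byte) ([]string, error) {
	var p pod
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	var out []string
	for _, cs := range p.Status.ContainerStatuses {
		if w := cs.State.Waiting; w != nil && w.Reason == "CreateContainerConfigError" {
			out = append(out, cs.Name+": "+w.Message)
		}
	}
	return out, nil
}

func main() {
	// Hand-written sample shaped like the failing pod in this thread.
	sample := []byte(`{"status":{"containerStatuses":[{"name":"manager",
"state":{"waiting":{"reason":"CreateContainerConfigError",
"message":"container has runAsNonRoot and image will run as root"}}}]}}`)
	errs, err := configErrors(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range errs {
		fmt.Println(e)
	}
}
```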

> I think CAPI's Tiltfile does not use each provider's Dockerfile.

Yes, that's why I'm adding the command to additional_docker_build_commands, which is merged into the generated Dockerfile.
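The MR diff itself is not reproduced in this thread, so the following Go sketch only illustrates the mechanism described above: extra provider commands are appended after the stage Tilt generates, so a trailing USER line changes the user the final image runs as. The stage text is copied from the CAPO build log earlier in the thread; the function names and the 65532:65532 UID/GID (the conventional distroless "nonroot" user) are assumptions, not taken from the MR or from CAPI's Tiltfile.

```go
package main

import (
	"fmt"
	"strings"
)

// tiltStage mirrors the final "tilt" stage shown in the CAPO build log in this
// thread. It is an illustration, not the source of CAPI's Tiltfile.
const tiltStage = `FROM golang:1.20.10 as tilt
WORKDIR /
COPY --from=tilt-helper /process.txt .
COPY --from=tilt-helper /start.sh .
COPY --from=tilt-helper /restart.sh .
COPY --from=tilt-helper /go/bin/dlv .
COPY $binary_name .`

// generateDockerfile (hypothetical) appends a provider's extra build commands
// after the generated stage, so they apply to the final image even though the
// provider's own Dockerfile is never used.
func generateDockerfile(additionalBuildCommands string) string {
	parts := []string{tiltStage}
	if cmd := strings.TrimSpace(additionalBuildCommands); cmd != "" {
		parts = append(parts, cmd)
	}
	return strings.Join(parts, "\n") + "\n"
}

// lastUser returns the argument of the last USER instruction, or "" if there
// is none. With no USER here and none in the golang base image, the container
// runs as root (UID 0).
func lastUser(dockerfile string) string {
	user := ""
	for _, line := range strings.Split(dockerfile, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && strings.EqualFold(fields[0], "USER") {
			user = fields[1]
		}
	}
	return user
}

func main() {
	fmt.Printf("default:   USER=%q\n", lastUser(generateDockerfile("")))
	fmt.Printf("with USER: USER=%q\n", lastUser(generateDockerfile("USER 65532:65532")))
}
```

Appending after the generated stage matters because a USER line placed in the tilt-helper stage would not carry over to the final image.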

Judging by the screenshot above, you don't reproduce the issue (2/2 in CACPPK). I think this may be due to a more permissive kubelet configuration somehow: https://github.com/kubernetes/kubernetes/blob/d098af353c63ed6cfc5d69a0749e7c444195d41c/pkg/kubelet/kuberuntime/security_context_others.go#L47
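For reference, the decision made by the linked kubelet code can be modelled roughly as follows. This Go sketch is simplified and uses hypothetical names; it assumes, as the kubelet does, that an image declaring no USER runs as UID 0. Only the "image will run as root" message is quoted from the error above; the other messages are paraphrased. It shows why the Tilt image, built FROM golang:1.20.10 without a USER line, is rejected once runAsNonRoot is true, while a numeric non-root USER passes.

```go
package main

import (
	"errors"
	"fmt"
)

// errImageRoot matches the message seen in this thread.
var errImageRoot = errors.New("container has runAsNonRoot and image will run as root")

// verifyNonRoot is a simplified model, not the kubelet's actual function.
//   runAsNonRoot: effective securityContext.runAsNonRoot (nil = unset)
//   runAsUser:    securityContext.runAsUser from the pod spec, if set
//   imageUID:     numeric user from the image config, if any
//   imageUser:    non-numeric user name from the image config, if any
func verifyNonRoot(runAsNonRoot *bool, runAsUser, imageUID *int64, imageUser string) error {
	if runAsNonRoot == nil || !*runAsNonRoot {
		return nil // nothing to enforce
	}
	if runAsUser != nil {
		if *runAsUser == 0 {
			return errors.New("runAsUser is 0 but runAsNonRoot is set") // paraphrased
		}
		return nil // an explicit non-zero runAsUser overrides the image user
	}
	if imageUID == nil && imageUser != "" {
		return fmt.Errorf("image user %q is non-numeric; cannot verify it is non-root", imageUser) // paraphrased
	}
	// An image with no USER at all is treated as UID 0.
	if imageUID == nil || *imageUID == 0 {
		return errImageRoot
	}
	return nil
}

func main() {
	enforce := true
	uid := int64(65532)
	fmt.Println(verifyNonRoot(&enforce, nil, nil, ""))  // Tilt image without USER
	fmt.Println(verifyNonRoot(&enforce, nil, &uid, "")) // image with USER 65532
}
```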

jds9090 commented 5 months ago

Could you provide the full logs and more information about the test environment, such as your tilt-settings.json in CAPI?

bengentil commented 5 months ago

I'm not sure what additional logs I can provide; this is basically the only error I have, and it makes sense as:

My test environment is a 1.28.3 cluster deployed using cluster-api (kubeadm for bootstrap and control plane, OpenStack for infrastructure) with Ubuntu 22.04.3 images. Here is my tilt-settings.yaml:

default_registry: <internal_registry>
provider_repos:
- ../cluster-api-provider-openstack
- ../../clastix/cluster-api-control-plane-provider-kamaji
enable_providers:
- kubeadm-bootstrap
- openstack
- kamaji
kustomize_substitutions:
  EXP_KUBEADM_BOOTSTRAP_FORMAT_IGNITION: "true"
allowed_contexts:
- mgmt-cluster-admin@mgmt-cluster

jds9090 commented 5 months ago

Could you share the capi-kamaji-controller-manager manifest file?

I think Tilt modifies the deployment for some purposes, such as debugging. You can find the manifest files under cluster-api/.tiltbuild/yaml/.

There is no SecurityContext in my deployment. Below is what my Tilt deployed:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"manager","app.kubernetes.io/created-by":"cluster-api-control-plane-provider-kamaji","app.kubernetes.io/instance":"controller-manager","app.kubernetes.io/managed-by":"tilt","app.kubernetes.io/name":"deployment","app.kubernetes.io/part-of":"cluster-api-control-plane-provider-kamaji","cluster.x-k8s.io/provider":"kamaji","clusterctl.cluster.x-k8s.io":"","control-plane":"controller-manager"},"name":"capi-kamaji-controller-manager","namespace":"kamaji-system"},"spec":{"replicas":1,"selector":{"matchLabels":{"cluster.x-k8s.io/provider":"kamaji","control-plane":"controller-manager"}},"strategy":{},"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/default-container":"manager"},"labels":{"app.kubernetes.io/managed-by":"tilt","cluster.x-k8s.io/provider":"kamaji","control-plane":"controller-manager","tilt.dev/pod-template-hash":"8006aeda062ca9c906f6"}},"spec":{"containers":[{"args":["--leader-elect"],"command":["sh","/start.sh","/manager"],"image":"localhost:5000/clastix_cluster-api-control-plane-provider-kamaji:tilt-c357b06bf9eeaea8","imagePullPolicy":"IfNotPresent","livenessProbe":{"httpGet":{"path":"/healthz","port":8081},"initialDelaySeconds":15,"periodSeconds":20},"name":"manager","ports":[{"containerPort":8080,"name":"metrics","protocol":"TCP"}],"readinessProbe":{"httpGet":{"path":"/readyz","port":8081},"initialDelaySeconds":5,"periodSeconds":10},"resources":{"limits":{"cpu":"500m","memory":"128Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}}],"serviceAccountName":"capi-kamaji-controller-manager","terminationGracePeriodSeconds":10}}}}
  creationTimestamp: "2024-03-19T01:36:51Z"
  generation: 1
  labels:
    app.kubernetes.io/component: manager
    app.kubernetes.io/created-by: cluster-api-control-plane-provider-kamaji
    app.kubernetes.io/instance: controller-manager
    app.kubernetes.io/managed-by: tilt
    app.kubernetes.io/name: deployment
    app.kubernetes.io/part-of: cluster-api-control-plane-provider-kamaji
    cluster.x-k8s.io/provider: kamaji
    clusterctl.cluster.x-k8s.io: ""
    control-plane: controller-manager
  name: capi-kamaji-controller-manager
  namespace: kamaji-system
  resourceVersion: "1352"
  uid: 91222e42-ce08-49fa-800d-c17d55c22c05
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      cluster.x-k8s.io/provider: kamaji
      control-plane: controller-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: manager
      creationTimestamp: null
      labels:
        app.kubernetes.io/managed-by: tilt
        cluster.x-k8s.io/provider: kamaji
        control-plane: controller-manager
        tilt.dev/pod-template-hash: 8006aeda062ca9c906f6
    spec:
      containers:
      - args:
        - --leader-elect
        command:
        - sh
        - /start.sh
        - /manager
        image: localhost:5000/clastix_cluster-api-control-plane-provider-kamaji:tilt-c357b06bf9eeaea8
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 1
        name: manager
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 10m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: capi-kamaji-controller-manager
      serviceAccountName: capi-kamaji-controller-manager
      terminationGracePeriodSeconds: 10
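To check whether a manifest, or its kubectl.kubernetes.io/last-applied-configuration annotation, sets securityContext at the pod or container level, a small Go sketch like the following can help. It is an illustration only: it reads Deployment JSON (YAML would need converting first, for example with kubectl get deploy -o json), and it keeps "absent" distinct from an explicit {} because that difference matters once the object is merged into an existing one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// securityContexts reports, for a Deployment given as JSON, the raw
// securityContext of the pod template and of each container, or "absent"
// when the field is not present at all.
func securityContexts(raw []byte) (map[string]string, error) {
	var d struct {
		Spec struct {
			Template struct {
				Spec struct {
					SecurityContext json.RawMessage `json:"securityContext"`
					Containers      []struct {
						Name            string          `json:"name"`
						SecurityContext json.RawMessage `json:"securityContext"`
					} `json:"containers"`
				} `json:"spec"`
			} `json:"template"`
		} `json:"spec"`
	}
	if err := json.Unmarshal(raw, &d); err != nil {
		return nil, err
	}
	describe := func(m json.RawMessage) string {
		if m == nil {
			return "absent" // field missing from the JSON
		}
		return string(m) // raw value as written, e.g. {} or {"runAsNonRoot":true}
	}
	out := map[string]string{"pod": describe(d.Spec.Template.Spec.SecurityContext)}
	for _, c := range d.Spec.Template.Spec.Containers {
		out["container/"+c.Name] = describe(c.SecurityContext)
	}
	return out, nil
}

func main() {
	// Shortened, hand-written version of the last-applied-configuration above.
	sample := []byte(`{"spec":{"template":{"spec":{"containers":[{"name":"manager"}]}}}}`)
	report, err := securityContexts(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(report)
}
```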

bengentil commented 5 months ago

Indeed, the manifest doesn't have the SecurityContext.

I re-created my test environment and everything is fine. I think a deployment from a previous Helm-installed CACPPK was still there. Since tilt-prepare removes the SecurityContext (leaving no value rather than an empty value {}), any previous value is merged rather than replaced.
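The merge behavior described above can be illustrated with a simplified stand-in. The Go sketch below implements RFC 7386 JSON Merge Patch, which is not what kubectl apply uses (client-side apply computes a three-way strategic merge patch that also consults the last-applied-configuration annotation), so treat it only as a model of the general rule: a field omitted from the new object leaves the live value in place, and only an explicit deletion (null) removes it. The object layout is flattened and hypothetical.

```go
package main

import "fmt"

// mergePatch applies an RFC 7386 JSON merge patch to target and returns it.
// Keys missing from the patch are left untouched, a nil value deletes the key,
// and nested objects are merged recursively.
func mergePatch(target, patch map[string]any) map[string]any {
	if target == nil {
		target = map[string]any{}
	}
	for k, pv := range patch {
		if pv == nil {
			delete(target, k)
			continue
		}
		if pm, ok := pv.(map[string]any); ok {
			tm, _ := target[k].(map[string]any) // nil if absent or not an object
			target[k] = mergePatch(tm, pm)
			continue
		}
		target[k] = pv
	}
	return target
}

func main() {
	// Hypothetical live object, flattened for brevity: securityContext is
	// left over from an earlier install.
	deployment := map[string]any{
		"securityContext": map[string]any{"runAsNonRoot": true},
		"image":           "old",
	}
	// The new manifest omits securityContext entirely: the old value survives.
	mergePatch(deployment, map[string]any{"image": "new"})
	fmt.Println(deployment)
	// Only an explicit null removes it.
	mergePatch(deployment, map[string]any{"securityContext": nil})
	fmt.Println(deployment)
}
```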

My bad, thank you for your help!