GoogleContainerTools / kaniko


build fails with filesystem permission error: failed to write "security.capability" attribute #2201

andreas-ibm opened this issue 2 years ago (status: Open)

andreas-ibm commented 2 years ago

Actual behavior

Error from the kaniko executor:

INFO[0000] Retrieving image manifest quay.io/strimzi/kafka:0.29.0-kafka-3.1.0
INFO[0000] Retrieving image quay.io/strimzi/kafka:0.29.0-kafka-3.1.0 from registry quay.io
INFO[0001] Built cross stage deps: map[]
INFO[0001] Retrieving image manifest quay.io/strimzi/kafka:0.29.0-kafka-3.1.0
INFO[0001] Returning cached image manifest
INFO[0001] Executing 0 build triggers
INFO[0001] Unpacking rootfs as cmd RUN 'mkdir' '-p' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2'       && 'curl' '-f' '-L' '--output' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz' 'https://repo1.maven.org/maven2/io/debezium/debezium-connector-oracle/1.9.5.Final/debezium-connector-oracle-1.9.5.Final-plugin.tar.gz'       && 'tar' 'xvfz' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz' '-C' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2'       && 'rm' '-vf' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz' requires it.
error building image: error building stage: failed to get filesystem from image: failed to write "security.capability" attribute to "/usr/bin/newgidmap": operation not permitted

Expected behavior

Unpacking to succeed.

To Reproduce

Steps to reproduce the behavior:

  1. Create kubernetes stack with:
    • 4x Ubuntu 20.04.4 LTS nodes; 3 of which are worker nodes
    • CRI-O 1.23
    • Kubernetes 1.23
    • Ceph
    • Strimzi 0.29.0
  2. Create basic cluster as described in https://strimzi.io/quickstarts/
    • kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
    • kubectl apply -f https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml -n kafka
  3. Deploy yaml:
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaConnect
    metadata:
      name: debezium-oracle-cluster
      namespace: kafka
      annotations:
        strimzi.io/use-connector-resources: "true"
    spec:
      version: 3.1.0
      replicas: 1
      bootstrapServers: my-cluster-kafka-bootstrap:9092
      config:
        config.providers: secrets
        config.providers.secrets.class: io.strimzi.kafka.KubernetesSecretConfigProvider
        group.id: connect-cluster
        offset.storage.topic: connect-cluster-offsets
        config.storage.topic: connect-cluster-configs
        status.storage.topic: connect-cluster-status
        # -1 means it will use the default replication factor configured in the broker
        config.storage.replication.factor: -1
        offset.storage.replication.factor: -1
        status.storage.replication.factor: -1
      build:
        output:
          type: docker
          image: uk.icr.io/debezium/debezium-connect-oracle:latest
          #additionalKanikoOptions: [--insecure]
          pushSecret: ibmcloud-credentials
        plugins:
          - name: debezium-oracle-connector
            artifacts:
              - type: tgz
                url: https://repo1.maven.org/maven2/io/debezium/debezium-connector-oracle/1.9.5.Final/debezium-connector-oracle-1.9.5.Final-plugin.tar.gz
      externalConfiguration:
        volumes:
          - name: connector-config
            secret:
              secretName: oracle-credentials

    (yes, you'd technically need an Oracle container in there too, but it didn't get that far so it's not really needed to reproduce)

  4. Get the build logs: kubectl logs debezium-oracle-cluster-connect-build -f -n kafka gives the output above.

Additional Information

I suspect this is down to me using CRI-O, but I'm happy to be proven wrong. I was able to build an equivalent image (I think) using podman. I hadn't heard of Kaniko till I hit this issue, so I'm still getting up to speed; I suspect that to create a standalone reproduction I'd need to replicate the Dockerfile and then find a way to invoke Kaniko from a yaml on plain Kube. This is related to an issue I raised with Strimzi earlier today: https://github.com/strimzi/strimzi-kafka-operator/discussions/7179 (I wasn't sure if the behaviour was down to how Strimzi was invoking Kaniko or not).
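For anyone digging into this later: as far as I understand it, file capabilities (which newuidmap/newgidmap carry in this base image, judging by the error) are stored in the security.capability extended attribute, and restoring that attribute while unpacking the rootfs requires CAP_SETFCAP in the build container. A quick sketch for inspecting this, assuming a host with the libcap (getcap) and attr (getfattr) tools installed:

    # show the file capabilities kaniko is trying to restore
    getcap /usr/bin/newgidmap
    # show the raw extended attribute those capabilities are stored in
    getfattr -n security.capability /usr/bin/newgidmap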

Triage Notes for the Maintainers

  - [ ] Please check if this is a new feature you are proposing
  - [x] Please check if the build works in docker but not in kaniko
  - [ ] Please check if this error is seen when you use the --cache flag
  - [ ] Please check if your dockerfile is a multistage dockerfile
andreas-ibm commented 2 years ago

Got the Dockerfile:

    ##############################
    ##############################
    # This file is automatically generated by the Strimzi Cluster Operator
    # Any changes to this file will be ignored and overwritten!
    ##############################
    ##############################

    FROM quay.io/strimzi/kafka:0.29.0-kafka-3.1.0

    USER root:root

    ##########
    # Connector plugin debezium-oracle-connector
    ##########
    RUN 'mkdir' '-p' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2' \
          && 'curl' '-f' '-L' '--output' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz' 'https://repo1.maven.org/maven2/io/debezium/debezium-connector-oracle/1.9.5.Final/debezium-connector-oracle-1.9.5.Final-plugin.tar.gz' \
          && 'tar' 'xvfz' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz' '-C' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2' \
          && 'rm' '-vf' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz'

    USER 1001
andreas-ibm commented 2 years ago

Got a reproduction in plain Kubernetes (i.e. without the Strimzi install, I think):

apiVersion: v1
data:
  Dockerfile: |+
    ##############################
    ##############################
    # This file is automatically generated by the Strimzi Cluster Operator
    # Any changes to this file will be ignored and overwritten!
    ##############################
    ##############################

    FROM quay.io/strimzi/kafka:0.29.0-kafka-3.1.0

    USER root:root

    ##########
    # Connector plugin debezium-oracle-connector
    ##########
    RUN 'mkdir' '-p' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2' \
          && 'curl' '-f' '-L' '--output' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz' 'https://repo1.maven.org/maven2/io/debezium/debezium-connector-oracle/1.9.5.Final/debezium-connector-oracle-1.9.5.Final-plugin.tar.gz' \
          && 'tar' 'xvfz' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz' '-C' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2' \
          && 'rm' '-vf' '/opt/kafka/plugins/debezium-oracle-connector/3bf764c2.tgz'

    USER 1001

kind: ConfigMap
metadata:
  name: kaniko-oracle-cluster-connect-dockerfile
  namespace: kafka

---
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-oracle-cluster-connect-build
  namespace: kafka
spec:
  containers:
  - args:
    - --dockerfile=/dockerfile/Dockerfile
    - --image-name-with-digest-file=/dev/termination-log
    - --destination=uk.icr.io/debezium/debezium-connect-oracle:latest
    image: quay.io/strimzi/kaniko-executor:0.29.0
    imagePullPolicy: IfNotPresent
    name: kaniko-oracle-cluster-connect-build
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /dockerfile
      name: dockerfile
    - mountPath: /kaniko/.docker
      name: docker-credentials
  restartPolicy: Never
  securityContext: {}
  terminationGracePeriodSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      items:
      - key: Dockerfile
        path: Dockerfile
      name: kaniko-oracle-cluster-connect-dockerfile
    name: dockerfile
  - name: docker-credentials
    secret:
      defaultMode: 292
      items:
      - key: .dockerconfigjson
        path: config.json
      secretName: ibmcloud-credentials
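To run this reproduction (kaniko-repro.yaml is just what I called the two manifests above; the ibmcloud-credentials secret has to exist in the kafka namespace already):

    kubectl apply -f kaniko-repro.yaml
    kubectl logs -n kafka kaniko-oracle-cluster-connect-build -f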
thurcombe commented 2 years ago

If it helps, we experienced the same thing. We added the following config to the runner's config.toml:

    [runners.kubernetes.build_container_security_context.capabilities]
      add = ["CHOWN", "DAC_OVERRIDE", "FOWNER", "SETFCAP", "SETGID", "SETUID"]

This has resolved the issue although we still have to run kaniko as root :(

andreas-ibm commented 2 years ago

oh, oh, OH YEAH!

    securityContext:
      capabilities:
        add: ["CHOWN", "DAC_OVERRIDE","FOWNER","SETFCAP","SETGID","SETUID"]

@thurcombe thanks!

I guess I should try those one at a time to get a minimal set
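For anyone copying this into the plain-Kube reproduction above, the fragment sits on the build container itself; a trimmed sketch of the relevant part of the Pod spec:

    spec:
      containers:
      - name: kaniko-oracle-cluster-connect-build
        image: quay.io/strimzi/kaniko-executor:0.29.0
        securityContext:
          capabilities:
            add: ["CHOWN", "DAC_OVERRIDE", "FOWNER", "SETFCAP", "SETGID", "SETUID"]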

thurcombe commented 2 years ago

@andreas-ibm glad it helped. We started on this journey after having a hard time with a customer who migrated to OKD4, which by default does not allow containers to run as root. Step by step we got there: first an additional SCC to permit UID 0, then we quickly came across the caps issue.

This issue also talks about the required caps: https://github.com/GoogleContainerTools/kaniko/issues/778. If you want to slim the list down, you might get away with FOWNER and DAC_OVERRIDE only.

For anyone that does need to do this in an OCP4/OKD4 cluster: it's a combination of an additional SCC to permit UID 0 plus the required caps; grant your service account access to the SCC and then update your toml/securityContext (sketch below). In an ideal world we would not have to run this as root, but that's another story for another day :)
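Something like the following, as a sketch only (the SCC name kaniko-build, service account my-build-sa, and namespace my-namespace are placeholders for your own):

    apiVersion: security.openshift.io/v1
    kind: SecurityContextConstraints
    metadata:
      name: kaniko-build            # placeholder name
    allowPrivilegedContainer: false
    runAsUser:
      type: RunAsAny                # permits UID 0
    seLinuxContext:
      type: MustRunAs
    fsGroup:
      type: RunAsAny
    supplementalGroups:
      type: RunAsAny
    allowedCapabilities:
      - CHOWN
      - DAC_OVERRIDE
      - FOWNER
      - SETFCAP
      - SETGID
      - SETUID
    volumes:
      - configMap
      - emptyDir
      - secret

Then grant it to the service account running the build pods:

    oc adm policy add-scc-to-user kaniko-build -z my-build-sa -n my-namespace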

DerrickKnighton commented 2 years ago

Been stuck on this issue for a while myself. I was able to get around it by adding the flags --ignore-path=/usr/bin/newuidmap --ignore-path=/usr/bin/newgidmap to my /kaniko/executor command.

chriskuipers commented 1 year ago

Came here exactly for this. We're trying to build containers via GitLab Runners on OpenShift 4, without using Docker-in-Docker or root. Eventually we managed it using the instructions at https://docs.gitlab.com/ee/ci/docker/using_kaniko.html#building-a-docker-image-with-kaniko and @DerrickKnighton's solution.

We settled on the setup below, which is pretty reliable for us. Oddly enough, the capabilities config did not make any difference for us, so we ended up not using it:

        [runners.kubernetes.build_container_security_context.capabilities]
          add = ["CHOWN", "SETUID", "SETGID", "FOWNER", "DAC_OVERRIDE"]

Our current setup looks like this. .gitlab-ci.yaml:

stages:
  - build

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.9.0-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
      --ignore-path=/usr/bin/newuidmap
      --ignore-path=/usr/bin/newgidmap
  tags:
    - openshift

OpenShift ConfigMap for the GitLab runner:

  config.toml: |-
    [[runners]]
      executor = "kubernetes"
      [runners.kubernetes]
        [[runners.kubernetes.volumes.empty_dir]]
          name = "empty-dir"
          mount_path = "/"
          medium = "Memory"

Sadly enough, as @thurcombe mentions, we did have to add the anyuid SCC to the GitLab runner service account. The container does not seem to run as root though, but without that SCC we still got: error building image: error building stage: failed to get filesystem from image: chown /: operation not permitted.
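For reference, that grant is a one-liner (the gitlab-runner service account name and gitlab namespace here are guesses for your setup):

    oc adm policy add-scc-to-user anyuid -z gitlab-runner -n gitlab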

kruserr commented 1 year ago

We have temporarily fixed this by locking our Kaniko version to gcr.io/kaniko-project/executor:v1.7.0-debug for root builds in GitLab runner.

I opened an issue to document this: https://github.com/GoogleContainerTools/kaniko/issues/2345

CaptainGlac1er commented 7 months ago

This is happening with v1.20.0-debug as well. However, v1.7.0-debug does work.

martincomputershare commented 5 months ago

Observed with v1.22.0-debug. Had to add more files to --ignore-path than suggested above to resolve it.