aquasecurity / trivy-operator

Kubernetes-native security toolkit
https://aquasecurity.github.io/trivy-operator/latest
Apache License 2.0

ErrImageNeverPull with trivy.command = filesystem or rootfs #1978

Open chary1112004 opened 5 months ago

chary1112004 commented 5 months ago

What steps did you take and what happened:

Hi,

We have seen an issue where, when we configure trivy.command = filesystem or trivy.command = rootfs, the scan job sometimes ends up in status ErrImageNeverPull.

Here is the log of the scan job:

kubectl logs scan-vulnerabilityreport-755cd9546-k7wz6 -n trivy-system
Defaulted container "k8s-cluster" out of: k8s-cluster, 9797c3dc-a05b-4d8c-9e03-537c5348af40 (init), 4c278c3b-6eb8-449d-be86-2111c6f58d38 (init)
Error from server (BadRequest): container "k8s-cluster" in pod "scan-vulnerabilityreport-755cd9546-k7wz6" is waiting to start: ErrImageNeverPull

And this is the message when we describe the scan pod:

...
  containerStatuses:
  - image: k8s.io/kubernetes:1.25.16-eks-508b6b3
    imageID: ""
    lastState: {}
    name: k8s-cluster
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: Container image "k8s.io/kubernetes:1.25.16-eks-508b6b3" is not present
          with pull policy of Never
        reason: ErrImageNeverPull
...
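
The status above matches a container created with imagePullPolicy: Never. An illustrative pod-spec fragment (values copied from the status above; this is not output generated by the operator) that fails this way on any node where the image is not already cached:

```yaml
# Illustrative fragment only - values taken from the pod status above.
# With pull policy Never, the kubelet never contacts a registry; if the
# image is not already present on the node, the container stays in
# ErrImageNeverPull.
containers:
  - name: k8s-cluster
    image: k8s.io/kubernetes:1.25.16-eks-508b6b3
    imagePullPolicy: Never
```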

Any suggestion to resolve this issue would be very much appreciated!

Thanks!

Environment:

chen-keinan commented 5 months ago

@chary1112004 thanks for reporting this issue. I have never experienced it; I'll have to investigate and update you.

chen-keinan commented 4 months ago

@chary1112004 I tried to investigate this, but no luck; I'm unable to reproduce it.

chary1112004 commented 4 months ago

@chen-keinan have you tried to reproduce it when deploying Trivy to EKS?

chen-keinan commented 4 months ago

@chary1112004 nope, but I do not think it's related to a cloud provider setting; it looks like cluster config in some way.

chary1112004 commented 4 months ago

@chen-keinan sorry, I just meant Kubernetes.

rknightion commented 3 months ago

I also get this on EKS when using Bottlerocket nodes (no idea whether standard AL2023 nodes also have it).

rickymulder commented 3 months ago

This also happens in a disconnected OpenShift environment. What I specifically see is that the hash matches the tag of the kubelet version... so it's trying to pull a matching image; I just don't understand where it's getting the idea to pull from k8s.io - that is nowhere in my config.

I also have .Values.operator.infraAssessmentScannerEnabled: false, so I don't suspect it's the nodeCollector. Any other ideas?
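
One possible explanation (my reading of the reports above, not confirmed by the maintainers): the k8s-cluster container belongs to the operator's scan of the Kubernetes core components, which appears to use a synthetic k8s.io/kubernetes:&lt;kubelet version&gt; reference that no registry actually serves, so any attempt to pull it can only fail. Roughly:

```python
# Hypothetical sketch (function name and format are my assumptions, not
# trivy-operator code) of how a non-pullable pseudo image reference like
# the one in the error above could be derived from a node's kubelet version.
def pseudo_k8s_image(kubelet_version: str) -> str:
    # Drop the leading "v" and attach the synthetic k8s.io namespace.
    return f"k8s.io/kubernetes:{kubelet_version.removeprefix('v')}"

print(pseudo_k8s_image("v1.25.16-eks-508b6b3"))
# -> k8s.io/kubernetes:1.25.16-eks-508b6b3
```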

titansmc commented 3 months ago

I am also seeing this; let me know if I can provide any configuration details:

helm upgrade --install trivy-operator aqua/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  -f values.yaml \
  --version 0.21.4

values.yaml:

nodeCollector:
  useNodeSelector: false
#  excludeNodes: node-role.kubernetes.io/control-plane=true
trivy:
  ignoreUnfixed: true
  command: filesystem
operator:
  controllerCacheSyncTimeout: 25m
trivyOperator:
  scanJobPodTemplateContainerSecurityContext:
    runAsUser: 0

ondrejmo commented 3 months ago

I have the same issue. The cluster is running v1.29.5+k3s1 on Ubuntu 22.04 and Trivy-operator is deployed using:

---

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: trivy-system

resources:
  - trivy-operator.yml
  - https://raw.githubusercontent.com/aquasecurity/trivy-operator/v0.21.1/deploy/static/trivy-operator.yaml

patches:
  - patch: |-
      - op: replace
        path: /data/OPERATOR_METRICS_EXPOSED_SECRET_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_CONFIG_AUDIT_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_RBAC_ASSESSMENT_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_INFRA_ASSESSMENT_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_IMAGE_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_CLUSTER_COMPLIANCE_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT
        value: "3"
    target:
      kind: ConfigMap
      name: trivy-operator-config
  - patch: |-
      - op: replace
        path: /data/trivy.command
        value: "rootfs"
    target:
      kind: ConfigMap
      name: trivy-operator-trivy-config
  - patch: |-
      - op: replace
        path: /data/scanJob.podTemplateContainerSecurityContext
        value: "{\"allowPrivilegeEscalation\":false,\"capabilities\":{\"drop\":[\"ALL\"]},\"privileged\":false,\"readOnlyRootFilesystem\":true,\"runAsUser\":0}"
    target:
      kind: ConfigMap
      name: trivy-operator
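
As an aside, the escaped JSON in the scanJob.podTemplateContainerSecurityContext patch above is easy to get wrong by hand; a small sketch (plain Python, unrelated to the operator itself) that generates the same string with json.dumps:

```python
import json

# Build the securityContext value embedded in the ConfigMap patch above;
# json.dumps handles the quoting that was escaped by hand in the patch.
security_context = {
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "privileged": False,
    "readOnlyRootFilesystem": True,
    "runAsUser": 0,
}
patch_value = json.dumps(security_context, separators=(",", ":"))
print(patch_value)
```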
ondrejmo commented 3 months ago


I did a hard-restart of the cluster (rebooted all nodes, deleted & re-created all pods) and it seems to have fixed the issue for me.