devopstales / trivy-operator

Kubernetes Operator based on the open-source container vulnerability scanner Trivy.
https://devopstales.github.io/trivy-operator/
Apache License 2.0

All my scan jobs terminate with "OOMKilled" #38

Closed (gganssauge closed this issue 1 year ago)

gganssauge commented 1 year ago

Describe the bug: After I set the label `trivy-scan: "true"` on the namespace `default`, all scan jobs log the following (with varying containers):

{
  "level": "error",
  "ts": 1669701178.2908242,
  "logger": "reconciler.vulnerabilityreport",
  "msg": "Scan job container",
  "job": "trivy-operator/scan-vulnerabilityreport-54989486b6",
  "container": "aurora-haufe",
  "status.reason": "OOMKilled",
  "status.message": "Killed",
  "stacktrace": "github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).processFailedScanJob
    /home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:573
github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1
    /home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:398
sigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/reconcile/reconcile.go:102
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234"
}
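
For reference, the namespace label mentioned above can also be applied declaratively. This is only a minimal sketch based on the `trivy-scan: "true"` key quoted in this report, not something taken from the issue itself:

# Sketch: label the default namespace so the operator considers it for scanning.
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    trivy-scan: "true"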

Expected behavior: The scan jobs should terminate normally.

Configuration file: I don't know where to find it.

Helm values file:

ServiceMonitor:
  enabled: true
affinity: {}
automountServiceAccountToken: true
compliance:
  failEntriesLimit: 10
excludeNamespaces: ""
fullnameOverride: ""
image:
  pullPolicy: IfNotPresent
  pullSecrets: []
  repository: ghcr.io/aquasecurity/trivy-operator
  tag: ""
managedBy: Helm
nameOverride: ""
nodeSelector: {}
operator:
  accessGlobalSecretsAndServiceAccount: true
  batchDeleteDelay: 10s
  batchDeleteLimit: 10
  builtInTrivyServer: false
  configAuditScannerEnabled: true
  configAuditScannerScanOnlyCurrentRevisions: true
  exposedSecretScannerEnabled: true
  infraAssessmentScannerEnabled: true
  leaderElectionId: trivyoperator-lock
  logDevMode: false
  mergeRbacFindingWithConfigAudit: false
  metricsFindingsEnabled: true
  metricsVulnIdEnabled: false
  namespace: ""
  podLabels: {}
  privateRegistryScanSecretsNames: {}
  rbacAssessmentScannerEnabled: true
  replicas: 1
  scanJobTimeout: 5m
  scanJobsConcurrentLimit: 10
  scanJobsRetryDelay: 30s
  scannerReportTTL: 24h
  vulnerabilityScannerEnabled: true
  vulnerabilityScannerScanOnlyCurrentRevisions: true
  webhookBroadcastTimeout: 30s
  webhookBroadcastURL: ""
podAnnotations: {}
podSecurityContext: {}
priorityClassName: ""
rbac:
  create: true
resources: {}
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  privileged: false
  readOnlyRootFilesystem: true
service:
  metricsPort: 80
serviceAccount:
  annotations: {}
  create: true
  name: ""
serviceMonitor:
  enabled: false
  interval: ""
  labels: {}
targetNamespaces: ""
targetWorkloads: pod,replicaset,replicationcontroller,statefulset,daemonset,cronjob,job
tolerations: []
trivy:
  additionalVulnerabilityReportFields: ""
  command: image
  createConfig: true
  dbRepository: ghcr.io/aquasecurity/trivy-db
  dbRepositoryInsecure: "false"
  ignoreUnfixed: true
  insecureRegistries: {}
  mode: Standalone
  nonSslRegistries: {}
  offlineScan: false
  registry:
    mirror: {}
  repository: ghcr.io/aquasecurity/trivy
  resources:
    limits:
      cpu: 500m
      memory: 500M
    requests:
      cpu: 100m
      memory: 100M
  serverPassword: ""
  serverServiceName: trivy-service
  serverTokenHeader: Trivy-Token
  serverUser: ""
  severity: UNKNOWN,LOW,MEDIUM,HIGH,CRITICAL
  supportedConfigAuditKinds: Workload,Service,Role,ClusterRole,NetworkPolicy,Ingress,LimitRange,ResourceQuota
  tag: 0.34.0
  timeout: 5m0s
  useBuiltinRegoPolicies: "true"
trivyOperator:
  configAuditReportsPlugin: Trivy
  metricsResourceLabelsPrefix: k8s_label_
  reportRecordFailedChecksOnly: false
  reportResourceLabels: ""
  scanJobAnnotations: ""
  scanJobAutomountServiceAccountToken: false
  scanJobCompressLogs: true
  scanJobNodeSelector: {}
  scanJobPodTemplateContainerSecurityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    privileged: false
    readOnlyRootFilesystem: true
  scanJobPodTemplateLabels: ""
  scanJobPodTemplatePodSecurityContext: {}
  scanJobTolerations: []
  skipResourceByLabels: ""
  vulnerabilityReportsPlugin: Trivy

Environment:

Additional context: After I set `trivy-scan: "true"` on namespace `default`, all scan jobs terminated with the OOMKilled error shown in the log above (according to the pod log).

devopstales commented 1 year ago

@gganssauge You deployed the wrong operator. This is devopstales/trivy-operator, not aquasecurity/trivy-operator (your stack trace and Helm values reference the Aqua Security project). And OOMKilled means that your cluster does not have enough resources to run the container.
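
If you keep the aquasecurity chart that these values come from, one common way to deal with OOMKilled scan jobs is to raise the trivy container's memory limit. The following is only a hedged sketch reusing the `trivy.resources` keys from the values file above; the 1000M figure is an illustrative value, not a project recommendation:

# Sketch: give the scan job container more memory than the 500M limit shown above.
trivy:
  resources:
    requests:
      cpu: 100m
      memory: 100M
    limits:
      cpu: 500m
      memory: 1000M   # example: raised from 500M; pick a value your nodes can actually provide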