aquasecurity / trivy-operator

Kubernetes-native security toolkit
https://aquasecurity.github.io/trivy-operator/latest
Apache License 2.0

node-collector eval_conflict_error #2052

Closed alekseytivonchik closed 5 months ago

alekseytivonchik commented 5 months ago

I installed Trivy-Operator via the Helm chart:

$ helm install trivy --namespace trivy-systems -f values.yaml .
$ kgp -n trivy-systems
NAME                                   READY   STATUS      RESTARTS   AGE
node-collector-8488c5f87f-f47mf        0/1     Completed   0          18h
trivy-trivy-operator-5dcc769f9-2j9cd   1/1     Running     0          18h

In the Trivy-Operator logs I see this error (repeated many times):

2024-05-02T13:58:14Z ERROR Reconciler error {"controller": "job", "controllerGroup": "batch", "controllerKind": "Job", "Job": {"name":"node-collector-8488c5f87f","namespace":"trivy-systems"}, "namespace": "trivy-systems", "name": "node-collector-8488c5f87f", "reconcileID": "b1f619e8-762d-4f0c-af7f-f5e97f888aa9", "error": "failed to evaluate policies on Node : externalPolicies/file_88.rego:35: eval_conflict_error: functions must not produce multiple outputs for same inputs"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:22
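For context on what `eval_conflict_error` means in general (this is a generic OPA/Rego illustration, not the actual check from the trivy-checks bundle): a complete Rego function must be deterministic, so if two definitions of the same function can both match one input and yield different outputs, OPA raises this error at evaluation time.

```rego
package example

# Two definitions of the same function whose bodies overlap:
# for x = 5 both conditions hold, but the outputs differ.
severity(x) := "low" if x < 10

severity(x) := "high" if x < 100

# Evaluating severity(5) raises:
#   eval_conflict_error: functions must not produce multiple outputs
#   for same inputs
result := severity(5)
```

In the issue above the conflicting definitions live inside the downloaded policy bundle (`externalPolicies/file_88.rego`), not in user code, which is why upgrading the operator (and with it the checks it pulls) resolves it.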

What does this error mean, and how can I fix it? I don't know whether this is related to the error above, but I have 20 pods in the target namespace while only 10 vulnerabilityreports were created:

$ kgp -n mp-dev | grep -v NAME | wc -l
      20
$ k get vulnerabilityreports -n mp-dev
NAME                                                             REPOSITORY                TAG                             SCANNER   AGE
replicaset-cart-service-c86996cb4-cart-service-app               mp-dev/cart-service       dev-02-05-2024-23-11-c9625aef   Trivy     11h
replicaset-catalog-service-bbd4b7c87-catalog-service-app         mp-dev/catalog-service    dev-02-05-2024-23-05-c9625aef   Trivy     12h
replicaset-config-server-5f69dcfc4-config-server-app             mp-dev/config-server      dev-02-05-2024-23-03-c9625aef   Trivy     12h
replicaset-delivery-service-7987d5d5d7-delivery-service-app      mp-dev/delivery-service   dev-02-05-2024-23-06-c9625aef   Trivy     11h
replicaset-favorite-service-7bf74f7cc-favorite-service-app       mp-dev/favorite-service   dev-02-05-2024-23-14-c9625aef   Trivy     12h
replicaset-geo-service-5f54698bb5-geo-service-app                mp-dev/geo-service        dev-02-05-2024-23-02-c9625aef   Trivy     12h
replicaset-loyalty-service-6f67c8f9f5-loyalty-service-app        mp-dev/loyalty-service    dev-02-05-2024-23-18-c9625aef   Trivy     11h
replicaset-nginx-pdf-docs-dev-97667b688-nginx-pdf-docs-dev-app   customs/nginx-pdf-docs    master-dev-6cfffc8e             Trivy     18h
replicaset-payment-service-55dcdbb6ff-payment-service-app        mp-dev/payment-service    dev-02-05-2024-23-09-c9625aef   Trivy     11h
replicaset-web-service-7b8c675fb5-web-service-app                mp-dev/web-service        dev-30-01-2024-11-58-34605895   Trivy     18h

Why is that? It's been over 10 hours, and there are no scan jobs for the reports, only the node-collector job:

$ k get jobs -n trivy-systems
NAME                        COMPLETIONS   DURATION   AGE
node-collector-8488c5f87f   1/1           5s         19h
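One plausible reason for the 20-pods-vs-10-reports mismatch (an assumption, not confirmed in this thread): trivy-operator creates one VulnerabilityReport per workload (e.g. per ReplicaSet and container), not per pod, and with `OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS=true` only the current revision of each workload is scanned, so replicas of the same workload share a single report. A sketch, using hypothetical pod names, that counts workloads by stripping the pod-hash suffix:

```python
# Hypothetical pod names following the <replicaset>-<podhash> pattern.
pods = [
    "cart-service-c86996cb4-f47mf",
    "cart-service-c86996cb4-2j9cd",   # second replica of the same workload
    "catalog-service-bbd4b7c87-abcde",
]

# Strip the final pod-hash segment to recover the owning ReplicaSet name.
def replicaset_of(pod_name: str) -> str:
    return pod_name.rsplit("-", 1)[0]

workloads = {replicaset_of(p) for p in pods}
print(len(pods), len(workloads))  # 3 pods, but only 2 workloads -> 2 reports
```

So a pod count higher than the report count is expected whenever workloads run multiple replicas.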

My configs:

$ k get cm -n trivy-systems trivy-operator -o jsonpath='{.data}' | jq
{
  "compliance.failEntriesLimit": "10",
  "configAuditReports.scanner": "Trivy",
  "node.collector.imageRef": "ghcr.io/aquasecurity/node-collector:0.1.4",
  "node.collector.nodeSelector": "true",
  "nodeCollector.tolerations": "[{\"effect\":\"NoSchedule\",\"key\":\"type\",\"operator\":\"Equal\",\"value\":\"monitoring\"}]",
  "nodeCollector.volumeMounts": "[{\"mountPath\":\"/var/lib/etcd\",\"name\":\"var-lib-etcd\",\"readOnly\":true},{\"mountPath\":\"/var/lib/kubelet\",\"name\":\"var-lib-kubelet\",\"readOnly\":true},{\"mountPath\":\"/var/lib/kube-scheduler\",\"name\":\"var-lib-kube-scheduler\",\"readOnly\":true},{\"mountPath\":\"/var/lib/kube-controller-manager\",\"name\":\"var-lib-kube-controller-manager\",\"readOnly\":true},{\"mountPath\":\"/etc/systemd\",\"name\":\"etc-systemd\",\"readOnly\":true},{\"mountPath\":\"/lib/systemd/\",\"name\":\"lib-systemd\",\"readOnly\":true},{\"mountPath\":\"/etc/kubernetes\",\"name\":\"etc-kubernetes\",\"readOnly\":true},{\"mountPath\":\"/etc/cni/net.d/\",\"name\":\"etc-cni-netd\",\"readOnly\":true}]",
  "nodeCollector.volumes": "[{\"hostPath\":{\"path\":\"/var/lib/etcd\"},\"name\":\"var-lib-etcd\"},{\"hostPath\":{\"path\":\"/var/lib/kubelet\"},\"name\":\"var-lib-kubelet\"},{\"hostPath\":{\"path\":\"/var/lib/kube-scheduler\"},\"name\":\"var-lib-kube-scheduler\"},{\"hostPath\":{\"path\":\"/var/lib/kube-controller-manager\"},\"name\":\"var-lib-kube-controller-manager\"},{\"hostPath\":{\"path\":\"/etc/systemd\"},\"name\":\"etc-systemd\"},{\"hostPath\":{\"path\":\"/lib/systemd\"},\"name\":\"lib-systemd\"},{\"hostPath\":{\"path\":\"/etc/kubernetes\"},\"name\":\"etc-kubernetes\"},{\"hostPath\":{\"path\":\"/etc/cni/net.d/\"},\"name\":\"etc-cni-netd\"}]",
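Note that values like `nodeCollector.tolerations` and `nodeCollector.volumes` are JSON documents stored as strings inside the ConfigMap, so a single misplaced quote silently breaks them. A quick sanity check (a minimal sketch using the tolerations value from the ConfigMap above):

```python
import json

# The nodeCollector.tolerations value, stored as a JSON string in the ConfigMap.
raw = '[{"effect":"NoSchedule","key":"type","operator":"Equal","value":"monitoring"}]'

# json.loads raises json.JSONDecodeError if the embedded JSON is malformed.
tolerations = json.loads(raw)
assert isinstance(tolerations, list)
print(tolerations[0]["key"])  # -> type
```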
  "policies.bundle.oci.ref": "ghcr.io/aquasecurity/trivy-checks:0",
  "report.recordFailedChecksOnly": "true",
  "scanJob.compressLogs": "true",
  "scanJob.podTemplateContainerSecurityContext": "{\"allowPrivilegeEscalation\":false,\"capabilities\":{\"drop\":[\"ALL\"]},\"privileged\":false,\"readOnlyRootFilesystem\":true}",
  "scanJob.tolerations": "[{\"effect\":\"NoSchedule\",\"key\":\"type\",\"operator\":\"Equal\",\"value\":\"monitoring\"}]",
  "vulnerabilityReports.scanner": "Trivy"
}
$ k get cm trivy-operator-config -n trivy-systems -o jsonpath='{.data}' | jq
{
  "CONTROLLER_CACHE_SYNC_TIMEOUT": "5m",
  "OPERATOR_ACCESS_GLOBAL_SECRETS_SERVICE_ACCOUNTS": "true",
  "OPERATOR_BATCH_DELETE_DELAY": "10s",
  "OPERATOR_BATCH_DELETE_LIMIT": "10",
  "OPERATOR_BUILT_IN_TRIVY_SERVER": "false",
  "OPERATOR_CACHE_REPORT_TTL": "120h",
  "OPERATOR_CLUSTER_COMPLIANCE_ENABLED": "true",
  "OPERATOR_CLUSTER_SBOM_CACHE_ENABLED": "false",
  "OPERATOR_CONCURRENT_NODE_COLLECTOR_LIMIT": "1",
  "OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT": "2",
  "OPERATOR_CONFIG_AUDIT_SCANNER_ENABLED": "true",
  "OPERATOR_CONFIG_AUDIT_SCANNER_SCAN_ONLY_CURRENT_REVISIONS": "true",
  "OPERATOR_EXPOSED_SECRET_SCANNER_ENABLED": "true",
  "OPERATOR_HEALTH_PROBE_BIND_ADDRESS": ":9090",
  "OPERATOR_INFRA_ASSESSMENT_SCANNER_ENABLED": "true",
  "OPERATOR_LOG_DEV_MODE": "true",
  "OPERATOR_MERGE_RBAC_FINDING_WITH_CONFIG_AUDIT": "false",
  "OPERATOR_METRICS_BIND_ADDRESS": ":8080",
  "OPERATOR_METRICS_CLUSTER_COMPLIANCE_INFO_ENABLED": "false",
  "OPERATOR_METRICS_CONFIG_AUDIT_INFO_ENABLED": "false",
  "OPERATOR_METRICS_EXPOSED_SECRET_INFO_ENABLED": "false",
  "OPERATOR_METRICS_FINDINGS_ENABLED": "true",
  "OPERATOR_METRICS_IMAGE_INFO_ENABLED": "false",
  "OPERATOR_METRICS_INFRA_ASSESSMENT_INFO_ENABLED": "false",
  "OPERATOR_METRICS_RBAC_ASSESSMENT_INFO_ENABLED": "false",
  "OPERATOR_METRICS_VULN_ID_ENABLED": "false",
  "OPERATOR_PRIVATE_REGISTRY_SCAN_SECRETS_NAMES": "{}",
  "OPERATOR_RBAC_ASSESSMENT_SCANNER_ENABLED": "true",
  "OPERATOR_SBOM_GENERATION_ENABLED": "true",
  "OPERATOR_SCANNER_REPORT_TTL": "24h",
  "OPERATOR_SCAN_JOB_RETRY_AFTER": "30s",
  "OPERATOR_SCAN_JOB_TIMEOUT": "5m",
  "OPERATOR_SCAN_JOB_TTL": "",
  "OPERATOR_SEND_DELETED_REPORTS": "false",
  "OPERATOR_VULNERABILITY_SCANNER_ENABLED": "true",
  "OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS": "true",
  "OPERATOR_WEBHOOK_BROADCAST_CUSTOM_HEADERS": "",
  "OPERATOR_WEBHOOK_BROADCAST_TIMEOUT": "30s",
  "OPERATOR_WEBHOOK_BROADCAST_URL": "https://hooks.slack.com/services/xxxxxxx/xxxxxxx/xxxxxxxx",
  "TRIVY_SERVER_HEALTH_CHECK_CACHE_EXPIRATION": "10h"
}
$ k get cm -n trivy-systems trivy-operator-trivy-config -o jsonpath='{.data}' | jq
{
  "trivy.additionalVulnerabilityReportFields": "",
  "trivy.command": "image",
  "trivy.dbRepository": "ghcr.io/aquasecurity/trivy-db",
  "trivy.dbRepositoryInsecure": "false",
  "trivy.filesystemScanCacheDir": "/var/trivyoperator/trivy-db",
  "trivy.imagePullPolicy": "IfNotPresent",
  "trivy.imageScanCacheDir": "/tmp/trivy/.cache",
  "trivy.includeDevDeps": "false",
  "trivy.javaDbRepository": "ghcr.io/aquasecurity/trivy-java-db",
  "trivy.mode": "Standalone",
  "trivy.repository": "ghcr.io/aquasecurity/trivy",
  "trivy.resources.limits.cpu": "650m",
  "trivy.resources.limits.memory": "900M",
  "trivy.resources.requests.cpu": "300m",
  "trivy.resources.requests.memory": "400M",
  "trivy.sbomSources": "",
  "trivy.severity": "UNKNOWN,LOW,MEDIUM,HIGH,CRITICAL",
  "trivy.skipJavaDBUpdate": "false",
  "trivy.slow": "true",
  "trivy.supportedConfigAuditKinds": "Workload,Service,Role,ClusterRole,NetworkPolicy,Ingress,LimitRange,ResourceQuota",
  "trivy.tag": "0.50.4",
  "trivy.timeout": "5m0s",
  "trivy.useBuiltinRegoPolicies": "true"
}


alekseytivonchik commented 5 months ago

UPDATE: after several days of Trivy operation, the reports were rotated, and after that their number began to correspond to the number of workloads in the target namespace.

chen-keinan commented 5 months ago

@alekseytivonchik have you added your own policies?

alekseytivonchik commented 5 months ago

@chen-keinan hello! Thanks for the reply. No, I didn't add any custom policies. This is my values.yaml

chen-keinan commented 5 months ago

@alekseytivonchik I have made a few changes related to this in the latest trivy-operator v0.21.1. Do you mind upgrading to the latest version and letting me know if the issue reproduces?

alekseytivonchik commented 5 months ago

@chen-keinan That's great news. Of course, I'll try upgrading the Helm chart to the latest version and will tell you the result.

alekseytivonchik commented 5 months ago

@chen-keinan I upgraded the trivy-operator Helm chart to:

version: 0.23.1
appVersion: 0.21.1

The problem is resolved; the node-collector pod now starts successfully. Thanks for the help!