Open jicunningham opened 10 months ago
@jicunningham can you please share trivy-operator logs? In addition, you can add a TTL for jobs to clear them.
@chen-keinan Here is what I am seeing from the trivy container that did the scanning:
2023-09-30T02:56:46.151Z INFO Need to update DB
2023-09-30T02:56:46.151Z INFO DB Repository: ghcr.io/aquasecurity/trivy-db
2023-09-30T02:56:46.151Z INFO Downloading DB...
20.28 MiB / 40.07 MiB [...] 50.61%
37.78 MiB / 40.07 MiB [...] 94.29%
40.07 MiB / 40.07 MiB [...] 100.00% 14.25 MiB p/s 3.0s
As for the logs of the other containers (the ones scanned), all that exists is what looks like a large public/private key/cert text block.
Currently, the TTL is set to 24 hours. Here is the status of some of them:
just to clarify, the TTL flag I mentioned above is for jobs
not reports
This issue is stale because it has been labeled with inactivity.
@chen-keinan can this be reopened? Was there a solution?
try setting scanJobTTL param
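For reference, a minimal way to set it when installing via the Helm chart (a sketch; the release name, repo alias, and namespace are assumptions, but operator.scanJobTTL is the same chart value used later in this thread):
helm upgrade trivy-operator aquasecurity/trivy-operator -n trivy-system --reuse-values --set operator.scanJobTTL=1m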
This issue is stale because it has been labeled with inactivity.
Does something similar exist for the node-collector job? The node-collector job finishes successfully, but the pod and the job remain in the cluster in a Complete state.
I'm using the helm chart, version 0.23.3
@FranAguiar do you see any errors in the trivy-operator log?
Hello @chen-keinan, yes, there is an error in the operator. I did not check it before because the job was complete and I thought everything was OK. This is the error:
{"level":"error","ts":"2024-06-13T09:08:41Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-6f7db594d8","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-6f7db594d8","reconcileID":"1ae4b06b-6299-461c-adfc-1e4c44ccbb85","error":"unexpected end of JSON input","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
@FranAguiar can you please delete the node-collector job and restart trivy-operator
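For reference, one concrete way to do that (a sketch; <node-collector-job-name> is a placeholder for the actual Job name, and the operator Deployment is assumed to be called trivy-operator in the trivy-system namespace):
kubectl -n trivy-system get jobs
kubectl -n trivy-system delete job <node-collector-job-name>
kubectl -n trivy-system rollout restart deployment trivy-operator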
Done, there is another error, specific to the node collector:
{"level":"error","ts":"2024-06-13T09:26:41Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"node-collector-765bcb57b","namespace":"trivy-system"},"namespace":"trivy-system","name":"node-collector-765bcb57b","reconcileID":"0f9aec59-4e0b-45bf-bef3-ca37e608a961","error":"failed to evaluate policies on Node : failed to run policy checks on resources","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
failed to evaluate policies on Node : failed to run policy checks on resources
@FranAguiar can you please share your configmaps?
Sure:
trivy-operator:
apiVersion: v1
data:
  compliance.failEntriesLimit: "10"
  configAuditReports.scanner: Trivy
  node.collector.imageRef: ghcr.io/aquasecurity/node-collector:0.2.1
  node.collector.nodeSelector: "true"
  nodeCollector.volumeMounts: '[{"mountPath":"/var/lib/etcd","name":"var-lib-etcd","readOnly":true},{"mountPath":"/var/lib/kubelet","name":"var-lib-kubelet","readOnly":true},{"mountPath":"/var/lib/kube-scheduler","name":"var-lib-kube-scheduler","readOnly":true},{"mountPath":"/var/lib/kube-controller-manager","name":"var-lib-kube-controller-manager","readOnly":true},{"mountPath":"/etc/systemd","name":"etc-systemd","readOnly":true},{"mountPath":"/lib/systemd/","name":"lib-systemd","readOnly":true},{"mountPath":"/etc/kubernetes","name":"etc-kubernetes","readOnly":true},{"mountPath":"/etc/cni/net.d/","name":"etc-cni-netd","readOnly":true}]'
  nodeCollector.volumes: '[{"hostPath":{"path":"/var/lib/etcd"},"name":"var-lib-etcd"},{"hostPath":{"path":"/var/lib/kubelet"},"name":"var-lib-kubelet"},{"hostPath":{"path":"/var/lib/kube-scheduler"},"name":"var-lib-kube-scheduler"},{"hostPath":{"path":"/var/lib/kube-controller-manager"},"name":"var-lib-kube-controller-manager"},{"hostPath":{"path":"/etc/systemd"},"name":"etc-systemd"},{"hostPath":{"path":"/lib/systemd"},"name":"lib-systemd"},{"hostPath":{"path":"/etc/kubernetes"},"name":"etc-kubernetes"},{"hostPath":{"path":"/etc/cni/net.d/"},"name":"etc-cni-netd"}]'
  policies.bundle.insecure: "false"
  policies.bundle.oci.ref: ghcr.io/aquasecurity/trivy-checks:0
  report.recordFailedChecksOnly: "true"
  scanJob.compressLogs: "true"
  scanJob.podTemplateContainerSecurityContext: '{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"privileged":false,"readOnlyRootFilesystem":true}'
  vulnerabilityReports.scanner: Trivy
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: trivy-operator
    meta.helm.sh/release-namespace: trivy-system
  creationTimestamp: "2024-06-11T12:57:02Z"
  labels:
    app.kubernetes.io/instance: trivy-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: trivy-operator
    app.kubernetes.io/version: 0.21.3
    helm.sh/chart: trivy-operator-0.23.3
  name: trivy-operator
  namespace: trivy-system
  resourceVersion: "315642750"
  uid: b4e35399-80ba-4c1a-ac57-bd33c2e526a7
trivy-operator-policies-config
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: trivy-operator
    meta.helm.sh/release-namespace: trivy-system
  creationTimestamp: "2024-06-11T12:57:02Z"
  labels:
    app.kubernetes.io/instance: trivy-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: trivy-operator
    app.kubernetes.io/version: 0.21.3
    helm.sh/chart: trivy-operator-0.23.3
  name: trivy-operator-policies-config
  namespace: trivy-system
  resourceVersion: "315642749"
  uid: 9823e829-6931-4b1e-9dfb-16379711bac2
trivy-operator-trivy-config
apiVersion: v1
data:
  trivy.additionalVulnerabilityReportFields: ""
  trivy.command: image
  trivy.dbRepository: ghcr.io/aquasecurity/trivy-db
  trivy.dbRepositoryInsecure: "false"
  trivy.filesystemScanCacheDir: /var/trivyoperator/trivy-db
  trivy.ignoreUnfixed: "true"
  trivy.imagePullPolicy: IfNotPresent
  trivy.imageScanCacheDir: /tmp/trivy/.cache
  trivy.includeDevDeps: "false"
  trivy.javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
  trivy.mode: Standalone
  trivy.repository: ghcr.io/aquasecurity/trivy
  trivy.resources.limits.cpu: 500m
  trivy.resources.limits.memory: 500M
  trivy.resources.requests.cpu: 100m
  trivy.resources.requests.memory: 100M
  trivy.sbomSources: ""
  trivy.severity: UNKNOWN,LOW,MEDIUM,HIGH,CRITICAL
  trivy.skipJavaDBUpdate: "false"
  trivy.slow: "true"
  trivy.supportedConfigAuditKinds: Workload,Service,Role,ClusterRole,NetworkPolicy,Ingress,LimitRange,ResourceQuota
  trivy.tag: 0.52.0
  trivy.timeout: 5m0s
  trivy.useBuiltinRegoPolicies: "true"
  trivy.useEmbeddedRegoPolicies: "false"
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: trivy-operator
    meta.helm.sh/release-namespace: trivy-system
  creationTimestamp: "2024-06-11T12:57:02Z"
  labels:
    app.kubernetes.io/instance: trivy-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: trivy-operator
    app.kubernetes.io/version: 0.21.3
    helm.sh/chart: trivy-operator-0.23.3
  name: trivy-operator-trivy-config
  namespace: trivy-system
  resourceVersion: "315642748"
  uid: 314204f1-7780-4f33-8501-e35d1ecaf1b7
trivy-operator-config
apiVersion: v1
data:
  CONTROLLER_CACHE_SYNC_TIMEOUT: 5m
  OPERATOR_ACCESS_GLOBAL_SECRETS_SERVICE_ACCOUNTS: "true"
  OPERATOR_BATCH_DELETE_DELAY: 10s
  OPERATOR_BATCH_DELETE_LIMIT: "10"
  OPERATOR_BUILT_IN_TRIVY_SERVER: "false"
  OPERATOR_CACHE_REPORT_TTL: 120h
  OPERATOR_CLUSTER_COMPLIANCE_ENABLED: "true"
  OPERATOR_CLUSTER_SBOM_CACHE_ENABLED: "false"
  OPERATOR_CONCURRENT_NODE_COLLECTOR_LIMIT: "1"
  OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT: "10"
  OPERATOR_CONFIG_AUDIT_SCANNER_ENABLED: "true"
  OPERATOR_CONFIG_AUDIT_SCANNER_SCAN_ONLY_CURRENT_REVISIONS: "true"
  OPERATOR_EXPOSED_SECRET_SCANNER_ENABLED: "true"
  OPERATOR_HEALTH_PROBE_BIND_ADDRESS: :9090
  OPERATOR_INFRA_ASSESSMENT_SCANNER_ENABLED: "true"
  OPERATOR_LOG_DEV_MODE: "false"
  OPERATOR_MERGE_RBAC_FINDING_WITH_CONFIG_AUDIT: "false"
  OPERATOR_METRICS_BIND_ADDRESS: :8080
  OPERATOR_METRICS_CLUSTER_COMPLIANCE_INFO_ENABLED: "false"
  OPERATOR_METRICS_CONFIG_AUDIT_INFO_ENABLED: "true"
  OPERATOR_METRICS_EXPOSED_SECRET_INFO_ENABLED: "true"
  OPERATOR_METRICS_FINDINGS_ENABLED: "true"
  OPERATOR_METRICS_IMAGE_INFO_ENABLED: "false"
  OPERATOR_METRICS_INFRA_ASSESSMENT_INFO_ENABLED: "false"
  OPERATOR_METRICS_RBAC_ASSESSMENT_INFO_ENABLED: "true"
  OPERATOR_METRICS_VULN_ID_ENABLED: "true"
  OPERATOR_PRIVATE_REGISTRY_SCAN_SECRETS_NAMES: '{}'
  OPERATOR_RBAC_ASSESSMENT_SCANNER_ENABLED: "true"
  OPERATOR_SBOM_GENERATION_ENABLED: "true"
  OPERATOR_SCAN_JOB_RETRY_AFTER: 30s
  OPERATOR_SCAN_JOB_TIMEOUT: 5m
  OPERATOR_SCAN_JOB_TTL: 1m
  OPERATOR_SCANNER_REPORT_TTL: 24h
  OPERATOR_SEND_DELETED_REPORTS: "false"
  OPERATOR_VULNERABILITY_SCANNER_ENABLED: "true"
  OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS: "true"
  OPERATOR_WEBHOOK_BROADCAST_CUSTOM_HEADERS: ""
  OPERATOR_WEBHOOK_BROADCAST_TIMEOUT: 30s
  OPERATOR_WEBHOOK_BROADCAST_URL: ""
  TRIVY_SERVER_HEALTH_CHECK_CACHE_EXPIRATION: 10h
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: trivy-operator
    meta.helm.sh/release-namespace: trivy-system
  creationTimestamp: "2024-06-11T12:57:02Z"
  labels:
    app.kubernetes.io/instance: trivy-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: trivy-operator
    app.kubernetes.io/version: 0.21.3
    helm.sh/chart: trivy-operator-0.23.3
  name: trivy-operator-config
  namespace: trivy-system
  resourceVersion: "315751696"
  uid: 68bda532-8194-44ac-b552-9d6bcf7da757
can you do a quick test and switch the following params' values to:
trivy.useBuiltinRegoPolicies: "false"
trivy.useEmbeddedRegoPolicies: "true"
and let me know if you get an error?
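One way to apply that change directly to the trivy-operator-trivy-config ConfigMap shown above and have the operator pick it up (a sketch; the Deployment name trivy-operator is an assumption based on the Helm release name):
kubectl -n trivy-system patch configmap trivy-operator-trivy-config --type merge -p '{"data":{"trivy.useBuiltinRegoPolicies":"false","trivy.useEmbeddedRegoPolicies":"true"}}'
kubectl -n trivy-system rollout restart deployment trivy-operator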
Same error
{"level":"error","ts":"2024-06-13T10:06:18Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"node-collector-6496488658","namespace":"trivy-system"},"namespace":"trivy-system","name":"node-collector-6496488658","reconcileID":"65cb6c3c-3df5-419f-a5c7-b9444bdf5b3c","error":"failed to evaluate policies on Node : failed to run policy checks on resources","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
Strange, the config looks OK and should work with either configuration approach.
I use the chart, version 0.23.3, installed with Terraform. My settings are below:
namespace     = "trivy-system"
chart_name    = "trivy-operator"
chart_version = "0.23.3"
repository    = "https://aquasecurity.github.io/helm-charts"
chart_values = [
  {
    name  = "serviceMonitor.enabled",
    value = true
  },
  {
    name  = "trivy.ignoreUnfixed",
    value = true
  },
  {
    name  = "operator.scanJobTTL",
    value = "1m"
  },
  {
    name  = "operator.metricsVulnIdEnabled"
    value = true
  },
  {
    name  = "operator.metricsExposedSecretInfo"
    value = true
  },
  {
    name  = "operator.metricsConfigAuditInfo"
    value = true
  },
  {
    name  = "operator.metricsRbacAssessmentInfo"
    value = true
  },
]
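As a quick check that these Terraform values actually reached the operator's runtime configuration, the generated ConfigMap can be inspected (a sketch, reusing the trivy-operator-config name shown earlier in the thread):
kubectl -n trivy-system get configmap trivy-operator-config -o yaml | grep -E 'SCAN_JOB_TTL|SCAN_JOB_TIMEOUT'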
Somehow it is working: I have the reports in Prometheus/Grafana; the only issue is with the node-collector pod.
Sure, the other reports do not depend on it, but you are missing clusterInfraAssessment reports and compliance reports because of it.
I tried both, but I will start with those for now. Maybe the pod/job issue is a problem in this version. Do you think it is worth trying an older version?
I do not think so, as we have tests that check it and the release passed.
One thing I could suggest is to completely delete trivy-operator, including the CRDs, and re-install it again:
helm uninstall trivy-operator -n trivy-system
kubectl delete crd vulnerabilityreports.aquasecurity.github.io
kubectl delete crd exposedsecretreports.aquasecurity.github.io
kubectl delete crd configauditreports.aquasecurity.github.io
kubectl delete crd clusterconfigauditreports.aquasecurity.github.io
kubectl delete crd rbacassessmentreports.aquasecurity.github.io
kubectl delete crd infraassessmentreports.aquasecurity.github.io
kubectl delete crd clusterrbacassessmentreports.aquasecurity.github.io
kubectl delete crd clustercompliancereports.aquasecurity.github.io
kubectl delete crd clusterinfraassessmentreports.aquasecurity.github.io
kubectl delete crd sbomreports.aquasecurity.github.io
kubectl delete crd clustersbomreports.aquasecurity.github.io
kubectl delete crd clustervulnerabilityreports.aquasecurity.github.io
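After the CRDs are gone, re-installing the same chart version could look like this (a sketch; the aquasecurity repo alias is an assumption pointing at the repository URL mentioned earlier):
helm repo add aquasecurity https://aquasecurity.github.io/helm-charts
helm repo update
helm install trivy-operator aquasecurity/trivy-operator -n trivy-system --version 0.23.3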
Tried that, same behaviour.
I saw that the node collector can be disabled. What does it do? Is it optional?
It's just a way to assign it to a specific node.
If you want to disable node-collector, you can configure this.
I configured that, but the node collector is still running and stays around forever after completing.
@FranAguiar you'll have to delete all the node-collector jobs, set the flag, and restart trivy-operator.
Already tried that; the job reappears, runs, and completes. This is what I see in the trivy-operator logs:
{"level":"error","ts":"2024-06-19T11:33:22Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-7c49c8f64f","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-7c49c8f64f","reconcileID":"e2ee6e6e-4640-4cff-a810-2dc4ec56911a","error":"unexpected end of JSON input","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
Helm chart 0.23.3 and GKE Version: v1.30.1-gke.1261000
@FranAguiar strange, it should not reconcile nodes if InfraAssessmentScannerEnabled is disabled.
InfraAssessmentScannerEnabled is enabled by default, I disabled it and now the pod and the job do not stay.
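For reference, a minimal way to toggle that flag with Helm (a sketch; operator.infraAssessmentScannerEnabled is assumed to be the chart value behind the OPERATOR_INFRA_ASSESSMENT_SCANNER_ENABLED key in the ConfigMap above):
helm upgrade trivy-operator aquasecurity/trivy-operator -n trivy-system --reuse-values --set operator.infraAssessmentScannerEnabled=false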
Thanks for the update. This is a workaround; we still need to investigate the root cause in your env.
I enabled InfraAssessmentScanner again because all metrics were gone. With the same settings as above, I see the errors below in the operator pod:
{"level":"error","ts":"2024-06-24T08:26:36Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"node-collector-98d7c6d45","namespace":"trivy-system"},"namespace":"trivy-system","name":"node-collector-98d7c6d45","reconcileID":"78ba1f93-229f-4846-8eb0-f978a7b13d14","error":"failed to evaluate policies on Node : failed to run policy checks on resources","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-24T07:42:44Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-cf59d7fff","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-cf59d7fff","reconcileID":"143546e1-c0cc-433b-b949-12a490dc09c5","error":"unexpected end of JSON input","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
Anything else I can provide?
which trivy-operator version are you using?
Latest helm chart
NAME                           CHART VERSION   APP VERSION
aquasecurity/trivy-operator    0.23.3          0.21.3
What steps did you take and what happened:
Often our vulnerability scanning pods will complete successfully and just stay in the cluster, not removing themselves like they are supposed to. When these scanners are left lying around, it sometimes prevents new scans from happening, because we have it set to run 3 at a time so as not to put too much pressure on the cluster itself. With the pods left over, further scanning is blocked because the operator sees that they already exist.
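For context, the two relevant knobs are the concurrent scan job limit and the scan job TTL; one way to adjust them is to patch the operator ConfigMap directly (a sketch; the trivy-operator-config name and trivy-system namespace are assumptions based on the other installation shown in this thread, where both keys appear):
kubectl -n trivy-system patch configmap trivy-operator-config --type merge -p '{"data":{"OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT":"3","OPERATOR_SCAN_JOB_TTL":"1m"}}'
kubectl -n trivy-system rollout restart deployment trivy-operator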
What did you expect to happen: We expected the pods to run and then terminate.
Anything else you would like to add:
Here are some logs:
Environment:
trivy-operator version (trivy-operator version): 0.15.1
kubectl version (kubectl version):