Open lkaluza-fadi opened 5 months ago
@lkaluza-fadi Please clean up all scan-jobs and restart operator.
kubectl delete jobs -n trivy-system `kubectl get jobs -n trivy-system -o custom-columns=:.metadata.name`
@chen-keinan After deleting the jobs, everything seems to be fine, but when the jobs were completed, the reconciliation errors returned.
@lkaluza-fadi is the pod stuck in status Completed?
Yes, that's correct.
@lkaluza-fadi can you please get its output and send it (you can send it to me in slack if you do not want to expose it here)
kubectl logs <scan-pod-name> -n trivy-system
unfortunately there are no pod logs anymore.
@lkaluza-fadi are you able to reproduce it ?
tried to reproduce it, but the logs are gone again
Is the pod stuck in completed status? If so, the logs should be there.
@lkaluza-fadi do you get any reports ?
@chen-keinan yes, I just sent it over to you via email, to the address mentioned in your profile.
@lkaluza-fadi can you please do another check:
uninstall trivy-operator : helm uninstall trivy-operator -n trivy-system
delete all CRDs:
kubectl delete crd vulnerabilityreports.aquasecurity.github.io
kubectl delete crd exposedsecretreports.aquasecurity.github.io
kubectl delete crd configauditreports.aquasecurity.github.io
kubectl delete crd clusterconfigauditreports.aquasecurity.github.io
kubectl delete crd rbacassessmentreports.aquasecurity.github.io
kubectl delete crd infraassessmentreports.aquasecurity.github.io
kubectl delete crd clusterrbacassessmentreports.aquasecurity.github.io
kubectl delete crd clustercompliancereports.aquasecurity.github.io
kubectl delete crd clusterinfraassessmentreports.aquasecurity.github.io
kubectl delete crd sbomreports.aquasecurity.github.io
kubectl delete crd clustersbomreports.aquasecurity.github.io
kubectl delete crd clustervulnerabilityreports.aquasecurity.github.io
make sure no pods or jobs are running in the trivy-system namespace
re-install trivy-operator with helm and set this flag to false
done that!
What has changed so far is that the pods for the jobs are now gone once they finish, and for that reason the operator is no longer logging any reconcile errors.
@lkaluza-fadi not sure I understand the question. are you getting reports after the change above ?
@chen-keinan to wrap this up. the reconcile errors are back, but they are now a bit different
{"level":"error","ts":"2024-06-24T10:44:56Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-6f849756bb","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-6f849756bb","reconcileID":"71886afd-c52b-45e5-a36c-b7737c65d5cf","error":"invalid character 'u' looking for beginning of value; invalid character 'u' looking for beginning of value","errorCauses":[{"error":"invalid character 'u' looking for beginning of value"},{"error":"invalid character 'u' looking for beginning of value"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-24T10:47:00Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-65df45bb54","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-65df45bb54","reconcileID":"beaf874f-73ca-473d-875e-ea520c90018b","error":"invalid character 'u' looking for beginning of value","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-24T10:47:01Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-8448d97cbb","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-8448d97cbb","reconcileID":"dfdd5a68-09c3-45c5-a880-f90bbb0f88cb","error":"invalid character 'u' looking for beginning of value","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-24T10:59:04Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-599dbf4488","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-599dbf4488","reconcileID":"8ecb8af1-af3e-4fcb-b2ff-8294f41b7e63","error":"invalid character 'u' looking for beginning of value","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.3/pkg/internal/controller/controller.go:222"}
So, back to your question: we are getting reports after the changes. But I think we are back at the beginning, getting this reconcile error again, just in a different flavor!
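A note on the message itself: `invalid character 'u' looking for beginning of value` is the text Go's `encoding/json` returns when the input starts with the letter `u` instead of a JSON value. The sketch below reproduces that message. It assumes (this is not confirmed anywhere in the thread) that the operator received plain text where it expected a JSON report; `decodeReport` and the sample input string are hypothetical and not taken from trivy-operator.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeReport is a hypothetical helper: it tries to parse raw scan output
// as a JSON object and returns the error text, or "" on success.
func decodeReport(raw string) string {
	var report map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &report); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// Plain text that happens to start with 'u' (made-up sample input).
	fmt.Println(decodeReport("unable to read scan output"))
	// Well-formed JSON decodes without an error.
	fmt.Println(decodeReport(`{"Results": []}`) == "")
}
```

If this is what is happening, the first bytes of the scan pod's output would show the non-JSON text that tripped the parser, which is why the pod logs matter here.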
@lkaluza-fadi I'll be happy to jump on a zoom call to look at the issue; it's very difficult to find what is wrong in your env.
@chen-keinan I'm fine with that. When does it suit you?
@lkaluza-fadi find me on slack we can discuss schedule details there
@lkaluza-fadi I mean find me via aqua security slack
@chen-keinan I'm not using slack. How do I do so?
This seems related to #1792.
Hi, I'm facing the same problem:
is there any solution?
My cluster is starting to have the same issue. Already reinstalled trivy-operator.
EDIT: Running on 1.31. Kubernetes SuccessPolicy changed. https://github.com/aquasecurity/trivy-operator/issues/2251
Same error:
{
"level": "error",
"ts": "2024-09-05T09:42:22Z",
"msg": "Reconciler error",
"controller": "job",
"controllerGroup": "batch",
"controllerKind": "Job",
"Job": {
"name": "scan-vulnerabilityreport-86c64f59b9",
"namespace": "trivy-operator"
},
"namespace": "trivy-operator",
"name": "scan-vulnerabilityreport-86c64f59b9",
"reconcileID": "18547f15-5d01-42ed-b1b4-f208335a0fae",
"error": "unrecognized scan job condition: SuccessCriteriaMet",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222"
}
I am using the latest helm chart 0.24.1 and I don't see any vulnerability or sbom reports.
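To illustrate how a message like `unrecognized scan job condition: SuccessCriteriaMet` can come about on 1.31: a finished Job there can carry a `SuccessCriteriaMet` condition next to `Complete`, and a handler that only knows `Complete` and `Failed` will reject it. The sketch below is a simplified model under that assumption. The `jobCondition` type and both functions are hypothetical stand-ins, not the operator's actual code.

```go
package main

import "fmt"

// jobCondition is a hypothetical, trimmed-down stand-in for a Job status condition.
type jobCondition struct {
	Type   string
	Status string
}

// strictState models a handler that treats any unknown true condition as an error.
func strictState(conds []jobCondition) (string, error) {
	for _, c := range conds {
		if c.Status != "True" {
			continue
		}
		switch c.Type {
		case "Complete":
			return "complete", nil
		case "Failed":
			return "failed", nil
		default:
			return "", fmt.Errorf("unrecognized scan job condition: %s", c.Type)
		}
	}
	return "pending", nil
}

// tolerantState skips condition types it does not know and keeps looking.
func tolerantState(conds []jobCondition) string {
	for _, c := range conds {
		if c.Status != "True" {
			continue
		}
		switch c.Type {
		case "Complete":
			return "complete"
		case "Failed":
			return "failed"
		}
	}
	return "pending"
}

func main() {
	conds := []jobCondition{
		{Type: "SuccessCriteriaMet", Status: "True"},
		{Type: "Complete", Status: "True"},
	}
	_, err := strictState(conds)
	fmt.Println(err)                  // the strict variant fails on the first condition
	fmt.Println(tolerantState(conds)) // the tolerant variant reaches Complete
}
```

In this model, reinstalling the chart or deleting CRDs would not change the outcome, since the condition comes from the cluster version rather than from operator state.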
This issue is stale because it has been labeled with inactivity.
I am experiencing the same issue. Disabling the scanJobCompressLogs flag did not help. Version 0.22
Hey, I have the same problem. I am running the Trivy-Operator Helm Chart on Kubernetes Version v1.31.1
Chart Version: "2.5.0"
Helm Values:
trivy-operator:
log_level: INFO
serviceMonitor:
enabled: true
grafana:
namespace: prometheus
dashboards:
enabled: true
label: grafana_dashboard
value: "1"
folder:
annotation: k8s-sidecar-target-directory
name: /tmp/dashboards/site-reliability
persistence:
enabled: true
storageClass: csi-default
namespaceScanner:
clusterWide: true
integrations:
policyreport: false
clusterScanner:
enabled: true
crontab: "*/1 * * * *"
trivy:
ignoreUnfixed: true
operator:
metricsVulnIdEnabled: true
One of the error messages:
{"level":"error","ts":"2024-11-18T06:02:19Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-77967db879","namespace":"trivy-operator"},"namespace":"trivy-operator","name":"scan-vulnerabilityreport-77967db879","reconcileID":"910004aa-b41b-4196-8e8e-b58f0353be2d","error":"unrecognized scan job condition: SuccessCriteriaMet","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235"}
Thanks for the report! Unfortunately, it's a known issue; you can track it in #2251.
What steps did you take and what happened:
Upgraded the helm version from 0.23.1 -> 0.23.3
What did you expect to happen:
That everything works smoothly
Anything else you would like to add:
This is the error that we get:
trivy-operator version: 0.21.3
kubectl version: 1.28.9-gke.1000000