Open albertschwarzkopf opened 2 years ago
Today I again have a "scan-vulnerabilityreport" pod and a corresponding job, both in status "Completed".
However, the Starboard operator logs the following error:
{"level":"error","ts":1647341024.8277369,"logger":"controller.job","msg":"Reconciler error","reconciler group":"batch","reconciler kind":"Job","name":"scan-vulnerabilityreport-7b89599899","namespace":"starboard-system","error":"unexpected EOF","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227"}
kubectl -n starboard-system logs scan-vulnerabilityreport-7b89599899-tlmpt
The last part of the log: ......
{
"VulnerabilityID": "CVE-2013-4235",
"PkgName": "passwd",
"InstalledVersion": "1:4.5-1.1",
"Layer": {
"Digest": "sha256:15115158dd02a1bf2fd28724e3c1024394033fb0e9a5d3e451ed2715b6ae312d",
"DiffID": "sha256:e5baccb54724b971f73bbfa46d477b947c9066e4040d0e002e8f04314f58b58f"
},
"SeveritySource": "debian",
"PrimaryURL": "https://avd.aquasec.com/nvd/cve-2013-4235",
"DataSource": {
"ID": "debian",
"Name": "Debian Security Tracker",
"URL": "https://salsa.debian.org/security-tracker-team/security-tracker"
},
"Title": "shadow-utils: TOCTOU race conditions by copying and removing directory trees",
"Description": "shadow: TOCTOU (time-of-check time-of-use) race condition when copying and removing directory trees",
"Severity": "LOW",
"CweIDs": [
"CWE-367"
],
"CVSS": {
"nvd": {
"V2Vector": "AV:L/AC:M/Au:N/C:N/I:P/A:P",
"V3Vector": "CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:H/A:N",
"V2Score": 3.3,
"V3Score": 4.7
},
"redhat": {
"V2Vector": "AV:L/AC:H/Au:N/C:P/I:P/A:P",
"V3Vector": "CVSS:3.1/AV:L/AC:H/PR:L/UI:R/S:U/C:N/I:H/A:N",
"V2Score": 3.7,
"V3Score": 4.4
}
},
"References": [
"https://access.redhat.com/security/cve/CVE-2013-4235",
"https://access.redhat.com/security/cve/cve-2013-4235",
"https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2013-4235",
"https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-4235",
"https://lists.apache.org/thread.html/rf9fa47ab66495c78bb4120b0754dd9531ca2ff0430f6685ac9b07772@%3Cdev.mina.apache.org%3E",
"https://security-tracker.debian.org/tracker/CVE-2013-4235"
],
"PublishedDate": "2019-12-03T15:15:00Z",
"LastModifiedDate": "2021-02-25T17:15:00Z"
},
{
"VulnerabilityID": "CVE-2018-7169",
"PkgName": "passwd",
"InstalledVersion": "1:4.5-1.1",
"Layer": {
"Digest": "sha256:15115158dd02a1bf2fd28724e3c1024394033fb0e9a5d3%
What about VulnerabilityReport? Is it created after all?
Yes, the VR for the specific image exists, but new VRs were not created.
I'm not sure I understood. What do you mean by "new VR"?
We use the "OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL" (vulnerabilityScannerReportTTL) parameter so that the VulnerabilityReports (VRs) are regenerated every 24h.
I'm also getting the "error":"unexpected EOF" message on some scans. Looking at the log of the job, I see the JSON response ends at:
{
...,
"Results": [
....
]
The interesting thing is one of the images this happens on has already been scanned from another deployment and had no issues. The vulnerability report is NOT created when this happens.
Additional note: I copied the scan job YAML for the failing scan and deployed it as a separate job to see what would happen; the JSON results came back just fine.
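The "unexpected EOF" error is consistent with a reader hitting the end of the pod log before the JSON document is closed. The following is a minimal sketch in Python, not Starboard's actual (Go) implementation, that shows how a cut-off report fails to parse. The helper name `is_complete_json` and the sample data are hypothetical.

```python
import json


def is_complete_json(text: str) -> bool:
    """Return True only if `text` parses as one complete JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        # Raised for truncated input, e.g. a missing closing brace.
        return False
    return True


# Illustrative data only: a shortened report and two cut-off variants.
complete = '{"Results": [{"VulnerabilityID": "CVE-2013-4235", "Severity": "LOW"}]}'
cut_mid_string = complete[:40]       # ends inside a string value
cut_before_brace = '{"Results": []'  # ends after "]" without the final "}"

for sample in (complete, cut_mid_string, cut_before_brace):
    print(is_complete_json(sample))
```

A check like this, run against the saved output of `kubectl logs` for a scan pod, would tell apart a scanner that produced bad output from a log stream that was cut short.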
Today it happened again:
╰─ kubectl -n starboard-system get pods
NAME                                        READY   STATUS      RESTARTS   AGE
scan-vulnerabilityreport-77444bf746-lzlq7   0/1     Completed   0          24h
starboard-exporter-6fc5c8f9c6-6bhx5         1/1     Running     0          53d
starboard-operator-866776846f-tdcg8         1/1     Running     0          19d
trivy-server-0                              1/1     Running     0          30d
╰─ kubectl -n starboard-system get job
NAME                                  COMPLETIONS   DURATION   AGE
scan-vulnerabilityreport-77444bf746   1/1           3s         24h
╰─ kubectl -n starboard-system logs scan-vulnerabilityreport-77444bf746-lzlq7
{
"VulnerabilityID": "CVE-2020-16156",
"PkgName": "perl-base",
"InstalledVersion": "5.28.1-6+deb10u1",
"Layer": {
"Digest": "sha256:6552179c3509e3c4314b4065e0d2790563d01cd474e2fdd58be4d46acd48af6a",
"DiffID": "sha256:f18b02b14138b6f9808f9843cc645e2edd64b02ca1c87e671355f56d1b4b5ec6"
},
"SeveritySource": "nvd",
"PrimaryURL": "https://avd.aquasec.com/nvd/cve-2020-16156",
"DataSource": {
"ID": "debian",
"Name": "Debian Security Tracker",
"URL": "https://salsa.debian.org/security-tracker-team/security-tracker"
},
"Title": "perl-CPAN: Bypass of verification of signatures in CHECKSUMS files",
"Description": "CPAN 2.28 allows Signature Verification Bypass.",
"Severity": "HIGH",
"CweIDs": [
"CWE-347"
],
"CVSS": {
"nvd": {
"V2Vector": "AV:N/AC:M/Au:N/C:P/I:P/A:P",
"V3Vector": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
"V2Score": 6.8,
"V3Score": 7.8
},
"redhat": {
"V3Vector": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
"V3Score": 7.8
}
},
"References": [
"http://blogs.per%
╰─ kubectl -n starboard-system logs starboard-operator-866776846f-tdcg8
{"level":"error","ts":1649768062.94792,"logger":"controller.job","msg":"Reconciler error","reconciler group":"batch","reconciler kind":"Job","name":"scan-vulnerabilityreport-77444bf746","namespace":"starboard-system","error":"unexpected EOF","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227"}
One of the scan jobs is completed, but the last JSON block above is invalid because it ends abruptly. After that, the starboard-operator does not start any other scans.
Only after the "hanging" scan job is deleted does the starboard-operator start other scans again.
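To find the jobs that need the manual cleanup, the operator log can be filtered for Job reconciler errors. This is a rough sketch that assumes the log lines are JSON objects with the keys visible in the output above ("level", "reconciler kind", "name", "namespace"). The function name is hypothetical and the sample line is abbreviated from the log above.

```python
import json
from typing import Iterable, List


def jobs_with_reconciler_errors(log_lines: Iterable[str]) -> List[str]:
    """Collect "namespace/name" for Job reconciler errors, in first-seen order."""
    seen: List[str] = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if entry.get("level") != "error" or entry.get("reconciler kind") != "Job":
            continue
        ref = f'{entry.get("namespace")}/{entry.get("name")}'
        if ref not in seen:
            seen.append(ref)
    return seen


# Abbreviated sample line (stacktrace and timestamp omitted).
sample_log = [
    '{"level":"error","logger":"controller.job","msg":"Reconciler error",'
    '"reconciler group":"batch","reconciler kind":"Job",'
    '"name":"scan-vulnerabilityreport-77444bf746",'
    '"namespace":"starboard-system","error":"unexpected EOF"}',
    "not a json line",
]
print(jobs_with_reconciler_errors(sample_log))
```

Each reported job could then be removed with `kubectl -n <namespace> delete job <name>`, which is the manual step described in this comment.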
I can confirm the issue that @albertschwarzkopf mentioned in https://github.com/aquasecurity/starboard/issues/1031#issuecomment-1067836429 with two different kinds of errors.
I guess I can avoid the errors with changes on my side by fixing some security issues with the xdebug image (then the EOF error should disappear) and fixing the version skew with the starboard-operator and trivy images that I use (I did not notice that the public.ecr.aws/aquasecurity/starboard-operator image is not up-to-date and that I am multiple versions ahead with the trivy image from what the starboard chart uses).
Still, I wish starboard-operator were more fault-tolerant: when scanJobsConcurrentLimit jobs have completed (successfully) and are not removed, Starboard stops scanning altogether (until I notice and delete the jobs manually).
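The stall described here can be detected from outside the operator. The sketch below assumes, as this comment suggests, that leftover completed jobs count toward scanJobsConcurrentLimit. It works on the parsed output of `kubectl -n starboard-system get jobs -o json`; the function names and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def completed_job_names(job_list: Dict[str, Any]) -> List[str]:
    """Names of jobs whose status carries a Complete=True condition."""
    names: List[str] = []
    for job in job_list.get("items", []):
        conditions = job.get("status", {}).get("conditions") or []
        if any(c.get("type") == "Complete" and c.get("status") == "True"
               for c in conditions):
            names.append(job["metadata"]["name"])
    return names


def scanning_is_blocked(job_list: Dict[str, Any], concurrent_limit: int) -> bool:
    """True if lingering completed jobs alone reach the assumed limit."""
    return len(completed_job_names(job_list)) >= concurrent_limit


# Illustrative job list: one completed job and one still running.
jobs = {
    "items": [
        {"metadata": {"name": "scan-vulnerabilityreport-77444bf746"},
         "status": {"conditions": [{"type": "Complete", "status": "True"}]}},
        {"metadata": {"name": "scan-vulnerabilityreport-example"},
         "status": {"active": 1}},
    ]
}
print(completed_job_names(jobs))
print(scanning_is_blocked(jobs, concurrent_limit=1))
```

Run periodically, a check like this could raise an alert before the scans silently stop.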
What steps did you take and what happened:
After a few days, the Starboard operator gets stuck on the following error:
"level":"error","ts":1647269605.345396,"logger":"controller.job","msg":"Reconciler error","reconciler group":"batch","reconciler kind":"Job","name":"scan-vulnerabilityreport-787ccf9b67","namespace":"starboard-system","error":"getting logs for pod \"starboard-system/scan-vulnerabilityreport-787ccf9b67\": getting pod controlled by job: \"starboard-system/scan-vulnerabilityreport-787ccf9b67\": pod not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227"}
I can see the finished job "scan-vulnerabilityreport-787ccf9b67" with status "Complete", but there is no pod for this job. Maybe the pod was deleted because the worker node was terminated (we use spot instances in AWS EKS). Would it be possible to have such completed jobs deleted after X hours, days, ...? E.g. via ttlSecondsAfterFinished for K8s jobs?
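For reference, `ttlSecondsAfterFinished` is a field in the Kubernetes Job spec that lets the cluster delete a finished job after the given number of seconds, provided the TTL-after-finished feature is available in the cluster. Whether Starboard exposes a setting for it is not established in this thread. The sketch below only shows where the field sits in a generic Job manifest; the helper name `with_ttl` and the manifest contents are hypothetical.

```python
import copy
from typing import Any, Dict


def with_ttl(job_manifest: Dict[str, Any], ttl_seconds: int) -> Dict[str, Any]:
    """Return a copy of the Job manifest with spec.ttlSecondsAfterFinished set."""
    patched = copy.deepcopy(job_manifest)  # leave the caller's manifest untouched
    patched.setdefault("spec", {})["ttlSecondsAfterFinished"] = ttl_seconds
    return patched


# Illustrative manifest, trimmed to the fields relevant here.
manifest = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "scan-example", "namespace": "starboard-system"},
    "spec": {"template": {"spec": {"restartPolicy": "Never"}}},
}
patched = with_ttl(manifest, ttl_seconds=3600)
print(patched["spec"]["ttlSecondsAfterFinished"])
```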
Environment:
We use Starboard-Operator combined with trivy in client-server-mode. Deployed via Helm Charts:
Starboard Operator Helm-Chart-Version: 0.9.1 (app-version: 0.14.1)
Trivy-Server Helm-Chart-Version: 0.4.10 (app-version: 0.24.0)
AWS EKS 1.21 (Bottlerocket OS and AmazonLinux 2)