aquasecurity / trivy-operator

Kubernetes-native security toolkit
https://aquasecurity.github.io/trivy-operator/latest
Apache License 2.0

Trivy-operator has stopped exposing `image_digest` label in metric `trivy_image_vulnerabilities` #1763

Open d-mankowski-synerise opened 10 months ago

d-mankowski-synerise commented 10 months ago

What steps did you take and what happened:

After upgrading trivy-operator from 0.16.4 to the latest version (0.18.1, chart version 0.20.1), the `image_digest` label is missing from the `trivy_image_vulnerabilities` metric. The new version was deployed around midnight.

This is problematic because we can have, for example, two `alpine:latest` images, one a year old and the other recent. This makes CVE dashboards in Grafana difficult to maintain, since there is no convenient label to group images by.
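To illustrate the grouping problem, here is a minimal Go sketch. The `imageSeries` type and `distinctImages` helper are hypothetical and the digests are placeholders. It only shows that a repository+tag key merges two different builds of `alpine:latest`, while adding the digest keeps them apart.

```go
package main

import "fmt"

// imageSeries is a hypothetical, simplified view of one trivy_image_vulnerabilities
// series; only the labels relevant to grouping are kept.
type imageSeries struct {
	Repository string
	Tag        string
	Digest     string
}

// distinctImages counts how many images a dashboard would show when grouping
// by repository and tag, optionally adding the digest to the grouping key.
func distinctImages(series []imageSeries, withDigest bool) int {
	seen := make(map[string]bool)
	for _, s := range series {
		key := s.Repository + ":" + s.Tag
		if withDigest {
			key += "@" + s.Digest
		}
		seen[key] = true
	}
	return len(seen)
}

func main() {
	// Two different builds pushed under the same tag; the digests are placeholders.
	series := []imageSeries{
		{Repository: "library/alpine", Tag: "latest", Digest: "sha256:old"},
		{Repository: "library/alpine", Tag: "latest", Digest: "sha256:new"},
	}
	fmt.Println(distinctImages(series, false)) // 1: both builds collapse into one row
	fmt.Println(distinctImages(series, true))  // 2: the digest tells them apart
}
```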

I haven't seen this change mentioned anywhere in the changelog, so I consider it a bug.

I think the problem is caused by the missing `digest` field in `vulnerabilityreports`. For example, a report created by operator 0.16.4:

report:
  artifact:
    digest: sha256:be96fcdb31b3d212dd326fdea33c3577d9a1f4243f428ecf181bcf2a065e32b6
    repository: bitnami/mongodb
    tag: 5.0.9-debian-10-r0
  registry:
    server: index.docker.io

while one created by 0.18.1:

report:
  artifact:
    repository: bitnami/redis
    tag: 6.2.7-debian-10-r23
  registry:
    server: index.docker.io
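Assuming the exporter copies `report.artifact.digest` verbatim into the `image_digest` label (an assumption, not confirmed from the operator's source), a report without the field would yield `image_digest=""`, which matches the metrics shown later in this thread. A minimal Go sketch using hypothetical, simplified structs that mirror only the fields above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// artifact and report mirror only the report.artifact fields shown above.
// They are hypothetical, simplified structs, not trivy-operator's own types.
type artifact struct {
	Repository string `json:"repository"`
	Tag        string `json:"tag"`
	Digest     string `json:"digest"`
}

type report struct {
	Artifact artifact `json:"artifact"`
}

// digestLabel returns what image_digest would become if the exporter copied
// report.artifact.digest verbatim: an empty string when the field is absent.
func digestLabel(raw []byte) (string, error) {
	var r report
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", err
	}
	return r.Artifact.Digest, nil
}

func main() {
	// JSON equivalents of the "report" sections above (registry omitted).
	before := []byte(`{"artifact":{"digest":"sha256:be96fcdb31b3d212dd326fdea33c3577d9a1f4243f428ecf181bcf2a065e32b6","repository":"bitnami/mongodb","tag":"5.0.9-debian-10-r0"}}`)
	after := []byte(`{"artifact":{"repository":"bitnami/redis","tag":"6.2.7-debian-10-r23"}}`)

	for _, raw := range [][]byte{before, after} {
		d, err := digestLabel(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("image_digest=%q\n", d)
	}
}
```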

What did you expect to happen:

`trivy_image_vulnerabilities` exposes the `image_digest` label.

Anything else you would like to add:

I haven't made any changes to the config when upgrading:

---
targetWorkloads: "replicaset,replicationcontroller,statefulset,daemonset,cronjob,pod"

trivy:
  image:
    tag: 0.48.2
  mode: ClientServer
  serverURL: http://trivy-service.trivy-operator:4954
  serverInsecure: true
  ignoreUnfixed: true
  additionalVulnerabilityReportFields: "Target,Class"
  timeout: "5m0s"
  slow: false
  # Resources of scan-vulnerabilityreport pods
  resources:
    requests:
      cpu: 350m
      memory: 1024Mi
    limits:
      cpu: 600m
      memory: 2048Mi

trivyOperator:
  scanJobCompressLogs: true
  reportResourceLabels: app,synerise.com/owner.team

# trivy-operator pod resources.
resources:
  requests:
    cpu: 500m
    memory: 3072Mi
  limits:
    cpu: 800m
    memory: 5120Mi

operator:
  sbomGenerationEnabled: true
  scanJobTTL: 10m
  scanJobsConcurrentLimit: 5
  rbacAssessmentScannerEnabled: false
  infraAssessmentScannerEnabled: false
  builtInTrivyServer: true
  metricsFindingsEnabled: true
  metricsVulnIdEnabled: false
  metricsExposedSecretInfo: true
  metricsConfigAuditInfo: true
  metricsRbacAssessmentInfo: false
  metricsInfraAssessmentInfo: false
  exposedSecretScannerEnabled: false
  webhookBroadcastURL: "http://postee.trivy-operator:8082"
  logDevMode: false

serviceMonitor:
  enabled: true
  namespace: trivy-operator
  interval: 1m

Environment:

chen-keinan commented 10 months ago

@d-mankowski-synerise nothing has changed in this area, but I'll have a look anyway to double-check.

d-mankowski-synerise commented 10 months ago

@chen-keinan IMO the problem is not related to metrics but to the creation of vulnerability reports: the digest field is missing in reports created by operator 0.18.1.

d-mankowski-synerise commented 10 months ago

It looks like some reports still contain the digest field, which is weird. But there are twice as many metrics without an `image_digest` label as metrics with one.
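The with/without ratio can also be checked straight from the operator's `/metrics` endpoint instead of a dashboard. A minimal Go sketch, assuming the standard Prometheus text exposition format; `countDigests` is a hypothetical helper, and a series whose `image_digest` is empty or missing counts as "without":

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// digestRe captures the image_digest label value from a metrics line.
var digestRe = regexp.MustCompile(`image_digest="([^"]*)"`)

// countDigests counts trivy_image_vulnerabilities lines with and without a
// non-empty image_digest label in Prometheus text exposition output.
func countDigests(metrics string) (withDigest, withoutDigest int) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := sc.Text()
		// Skip comments (# HELP, # TYPE) and other metric families.
		if !strings.HasPrefix(line, "trivy_image_vulnerabilities{") {
			continue
		}
		m := digestRe.FindStringSubmatch(line)
		if m != nil && m[1] != "" {
			withDigest++
		} else {
			withoutDigest++
		}
	}
	return withDigest, withoutDigest
}

func main() {
	// Two abbreviated lines in the format of the curl output quoted in this thread.
	sample := `trivy_image_vulnerabilities{image_digest="",image_tag="0.18.1",severity="High"} 0
trivy_image_vulnerabilities{image_digest="sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775",image_tag="0.18.1",severity="High"} 0`
	with, without := countDigests(sample)
	fmt.Printf("with digest: %d, without digest: %d\n", with, without)
}
```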

chen-keinan commented 10 months ago

@d-mankowski-synerise I do not think digest info is always available to Trivy. This is how the digest info is set

d-mankowski-synerise commented 10 months ago

This wasn't the case with operator 0.16.4: as you can see above, before the upgrade we didn't have a single `trivy_image_vulnerabilities` series without an `image_digest` label (`{image_digest=""}`).

chen-keinan commented 10 months ago

Weird, it looks like the logic is the same in 0.16.4.

LesSyner commented 10 months ago

It was OK in 0.16.4, and every later version had the same issue, which @d-mankowski-synerise and I decided to nail down now with version 0.18.1. So my guess is that one of the commits in 0.17.0 accidentally broke the digest logic.

d-mankowski-synerise commented 10 months ago

I will roll back to 0.16.4 with the same Trivy version (0.48.2) to make sure the problem is not related to Trivy itself. With 0.16.4 we used Trivy 0.48.0, so I doubt it is the cause, but it won't hurt to rule it out.

chen-keinan commented 10 months ago

@d-mankowski-synerise do you have a specific public image that produces `image_digest` with v0.16.4 but not with v0.18.1, which I can test with?

d-mankowski-synerise commented 10 months ago

@chen-keinan

This query: `group by (image_registry, image_repository, image_tag, image_digest) (trivy_image_vulnerabilities{image_registry="ghcr.io"})`

returns the following:

{image_registry="ghcr.io", image_repository="external-secrets/external-secrets", image_tag="v0.8.3"}
1
{image_registry="ghcr.io", image_repository="aquasecurity/trivy-operator", image_tag="0.18.1", image_digest="sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775"}
1
{image_registry="ghcr.io", image_repository="aquasecurity/trivy-operator", image_tag="0.18.1"}
1

which is even weirder: metrics for vulnerabilities of the image ghcr.io/aquasecurity/trivy-operator:0.18.1 are exposed twice, once with `image_digest` and once without.
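Part of the picture may be how Prometheus treats empty label values: a series exported with `image_digest=""` is equivalent to one with no `image_digest` label at all, so it shows up in the `group by` output simply without that label. A minimal Go sketch of that grouping behaviour; the `groupBy` helper is hypothetical and only approximates PromQL's output formatting. It is fed two series from different ReplicaSets, as in the curl output later in this thread:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupBy roughly mimics PromQL's `group by (...)` for display purposes: it keeps
// only the listed labels and, as Prometheus does, treats an empty label value
// as if the label were not set at all.
func groupBy(series []map[string]string, by []string) []string {
	seen := make(map[string]bool)
	for _, s := range series {
		var parts []string
		for _, name := range by {
			if v := s[name]; v != "" {
				parts = append(parts, fmt.Sprintf("%s=%q", name, v))
			}
		}
		seen["{"+strings.Join(parts, ", ")+"}"] = true
	}
	groups := make([]string, 0, len(seen))
	for g := range seen {
		groups = append(groups, g)
	}
	sort.Strings(groups)
	return groups
}

func main() {
	// Two series from different ReplicaSets: one exported with image_digest="",
	// one with a real digest (values taken from the metrics in this thread).
	series := []map[string]string{
		{"image_registry": "ghcr.io", "image_repository": "aquasecurity/trivy-operator", "image_tag": "0.18.1",
			"image_digest": "", "resource_name": "trivy-operator-564c8d89bd"},
		{"image_registry": "ghcr.io", "image_repository": "aquasecurity/trivy-operator", "image_tag": "0.18.1",
			"image_digest": "sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775", "resource_name": "trivy-operator-675cf74d45"},
	}
	for _, g := range groupBy(series, []string{"image_registry", "image_repository", "image_tag", "image_digest"}) {
		fmt.Println(g)
	}
}
```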

chen-keinan commented 10 months ago

Maybe there is a bug where one metric overrides the other?

LesSyner commented 10 months ago

IMO, the two likely candidates for introducing such a bug are the two metrics-related changes in 0.17.0: the addition of OS info metrics and the addition of `clusterCompliance_info` metrics.

chen-keinan commented 10 months ago

@d-mankowski-synerise @LesSyner thanks, I'll try to reproduce and fix it.

chen-keinan commented 10 months ago

@d-mankowski-synerise are you sure it's a duplicate metric? There should be one metric for each severity, for example:

# HELP trivy_image_vulnerabilities Number of container image vulnerabilities
# TYPE trivy_image_vulnerabilities gauge
trivy_image_vulnerabilities{container_name="coredns",image_digest="sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc",image_registry="index.docker.io",image_repository="rancher/mirrored-coredns-coredns",image_tag="1.10.1",name="replicaset-coredns-576cfbb478-coredns",namespace="kube-system",resource_kind="ReplicaSet",resource_name="coredns-576cfbb478",severity="Critical"} 0
trivy_image_vulnerabilities{container_name="coredns",image_digest="sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc",image_registry="index.docker.io",image_repository="rancher/mirrored-coredns-coredns",image_tag="1.10.1",name="replicaset-coredns-576cfbb478-coredns",namespace="kube-system",resource_kind="ReplicaSet",resource_name="coredns-576cfbb478",severity="High"} 3
trivy_image_vulnerabilities{container_name="coredns",image_digest="sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc",image_registry="index.docker.io",image_repository="rancher/mirrored-coredns-coredns",image_tag="1.10.1",name="replicaset-coredns-576cfbb478-coredns",namespace="kube-system",resource_kind="ReplicaSet",resource_name="coredns-576cfbb478",severity="Low"} 0
trivy_image_vulnerabilities{container_name="coredns",image_digest="sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc",image_registry="index.docker.io",image_repository="rancher/mirrored-coredns-coredns",image_tag="1.10.1",name="replicaset-coredns-576cfbb478-coredns",namespace="kube-system",resource_kind="ReplicaSet",resource_name="coredns-576cfbb478",severity="Medium"} 4
trivy_image_vulnerabilities{container_name="coredns",image_digest="sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc",image_registry="index.docker.io",image_repository="rancher/mirrored-coredns-coredns",image_tag="1.10.1",name="replicaset-coredns-576cfbb478-coredns",namespace="kube-system",resource_kind="ReplicaSet",resource_name="coredns-576cfbb478",severity="Unknown"} 0

Can you please share the full metrics output for trivy-operator? BTW, the output above was produced with trivy-operator v0.18.1.
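The "one metric per severity" point means five series per container are expected, zeros included. A rough Go sketch of that fan-out; `severitySamples` is a hypothetical helper that keeps only the `severity` label and is not the operator's exporter code:

```go
package main

import "fmt"

// severities lists the severity values in the same (alphabetical) order as the
// exposition output above.
var severities = []string{"Critical", "High", "Low", "Medium", "Unknown"}

// severitySamples sketches how a single report could fan out into one gauge
// sample per severity, emitting 0 for severities with no findings. Only the
// severity label is kept; every other label is omitted for brevity.
func severitySamples(counts map[string]int) []string {
	out := make([]string, 0, len(severities))
	for _, sev := range severities {
		out = append(out, fmt.Sprintf("trivy_image_vulnerabilities{severity=%q} %d", sev, counts[sev]))
	}
	return out
}

func main() {
	// Counts from the coredns example above: 3 High, 4 Medium, nothing else.
	for _, line := range severitySamples(map[string]int{"High": 3, "Medium": 4}) {
		fmt.Println(line)
	}
}
```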

d-mankowski-synerise commented 10 months ago

@chen-keinan yup, I am sure:

> curl -s localhost:8080/metrics | grep 'aquasecurity/trivy-operator'
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-564c8d89bd-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-564c8d89bd",severity="Critical"} 0
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-564c8d89bd-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-564c8d89bd",severity="High"} 0
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-564c8d89bd-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-564c8d89bd",severity="Low"} 0
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-564c8d89bd-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-564c8d89bd",severity="Medium"} 0
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-564c8d89bd-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-564c8d89bd",severity="Unknown"} 0
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-675cf74d45-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-675cf74d45",severity="Critical"} 0
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-675cf74d45-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-675cf74d45",severity="High"} 0
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-675cf74d45-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-675cf74d45",severity="Low"} 0
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-675cf74d45-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-675cf74d45",severity="Medium"} 2
trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-675cf74d45-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-675cf74d45",severity="Unknown"} 0

where localhost is port-forwarded to the trivy-operator pod.

chen-keinan commented 10 months ago

I see you are using the `k8s_label_*` labels; I'll debug it.

d-mankowski-synerise commented 10 months ago

yup

trivyOperator:
  scanJobCompressLogs: true
  reportResourceLabels: app,synerise.com/owner.team
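For reference, the two `reportResourceLabels` entries surface in the metrics above as `k8s_label_app` and `k8s_label_synerise_com_owner_team`. A Go sketch of that naming follows; it is inferred from the observed metric names rather than taken from the operator's source, and `metricLabelName`/`metricLabelNames` are hypothetical helpers:

```go
package main

import (
	"fmt"
	"strings"
)

// metricLabelName maps a Kubernetes label key to a Prometheus label name by
// prefixing k8s_label_ and replacing every character outside [a-zA-Z0-9_]
// with an underscore. The mapping is inferred from the metrics in this thread.
func metricLabelName(k8sLabel string) string {
	var b strings.Builder
	b.WriteString("k8s_label_")
	for _, r := range k8sLabel {
		isValid := r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
		if isValid {
			b.WriteRune(r)
		} else {
			b.WriteRune('_')
		}
	}
	return b.String()
}

// metricLabelNames splits a reportResourceLabels value such as
// "app,synerise.com/owner.team" and converts each entry.
func metricLabelNames(reportResourceLabels string) []string {
	var names []string
	for _, l := range strings.Split(reportResourceLabels, ",") {
		if l = strings.TrimSpace(l); l != "" {
			names = append(names, metricLabelName(l))
		}
	}
	return names
}

func main() {
	fmt.Println(metricLabelNames("app,synerise.com/owner.team"))
	// [k8s_label_app k8s_label_synerise_com_owner_team]
}
```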
d-mankowski-synerise commented 10 months ago

I am confused. The number of metrics without a digest had started to go down, but recently it has started to go up again. The number of metrics with a digest has also started to go up.

We didn't change any operator settings (we only bumped resources, since we noticed some throttling); the TTL is set to 24h.

After checking the trivy-operator logs, I noticed that this error is printed quite often:

❯ kubectl logs trivy-operator-566589c494-hjpjs | grep 'unable to get missing layers' -c
39
{
    "level": "error",
    "ts": "2024-01-16T21:17:25Z",
    "logger": "reconciler.scan job",
    "msg": "Scan job container",
    "job": "trivy-operator/scan-vulnerabilityreport-68f456865b",
    "container": "init",
    "status.reason": "Error",
    "status.message": "2024-01-16T21:17:16.599Z\t\u001b[31mFATAL\u001b[0m\timage scan error: scan error: scan failed: failed analysis: unable to get missing layers: unable to fetch missing layers: twirp error internal: failed to do request: Post \"http://trivy-service.trivy-operator:4954/twirp/trivy.cache.v1.Cache/MissingBlobs\": dial tcp 10.244.244.70:4954: connect: connection refused\n",
    "stacktrace": "github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport/controller.(*ScanJobController).processFailedScanJob\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller/scanjob.go:346\ngithub.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport/controller.(*ScanJobController).SetupWithManager.(*ScanJobController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller/scanjob.go:81\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/reconcile/reconcile.go:111\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"
}

But this seems to be related to Trivy itself rather than the operator?

chen-keinan commented 9 months ago

@d-mankowski-synerise I do not think the error you mention is related to the missing digest.

chen-keinan commented 9 months ago

@d-mankowski-synerise looking again at the duplicate-metric example you posted above, the resource names differ, which means it's not the same resource. It could be that the data is coming from an old report created before the upgrade:

trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-564c8d89bd-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-564c8d89bd",severity="Medium"} 0

compared to:

trivy_image_vulnerabilities{container_name="trivy-operator",image_digest="sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775",image_registry="ghcr.io",image_repository="aquasecurity/trivy-operator",image_tag="0.18.1",k8s_label_app="",k8s_label_synerise_com_owner_team="",name="replicaset-trivy-operator-675cf74d45-trivy-operator",namespace="trivy-operator",resource_kind="ReplicaSet",resource_name="trivy-operator-675cf74d45",severity="Medium"} 2
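Spotting which labels differ between two long series by eye is error-prone. A small Go helper (hypothetical, not part of trivy-operator) can list them; applied to the two Medium-severity series above, it reports only `image_digest`, `name`, and `resource_name`, i.e. two distinct ReplicaSets:

```go
package main

import (
	"fmt"
	"sort"
)

// differingLabels returns, in sorted order, the names of labels whose values
// differ between two series (a label missing from one side counts as "").
func differingLabels(a, b map[string]string) []string {
	names := make(map[string]bool)
	for k := range a {
		names[k] = true
	}
	for k := range b {
		names[k] = true
	}
	var diff []string
	for k := range names {
		if a[k] != b[k] {
			diff = append(diff, k)
		}
	}
	sort.Strings(diff)
	return diff
}

func main() {
	// Label sets of the two series quoted above (k8s_label_* omitted: both are empty).
	oldRS := map[string]string{
		"container_name": "trivy-operator", "image_digest": "", "image_registry": "ghcr.io",
		"image_repository": "aquasecurity/trivy-operator", "image_tag": "0.18.1",
		"name": "replicaset-trivy-operator-564c8d89bd-trivy-operator", "namespace": "trivy-operator",
		"resource_kind": "ReplicaSet", "resource_name": "trivy-operator-564c8d89bd", "severity": "Medium",
	}
	newRS := map[string]string{
		"container_name": "trivy-operator", "image_digest": "sha256:19633ccb72c369e90d22e38eddd86fbc8f43851cee68c9d7d6acadd5cc053775", "image_registry": "ghcr.io",
		"image_repository": "aquasecurity/trivy-operator", "image_tag": "0.18.1",
		"name": "replicaset-trivy-operator-675cf74d45-trivy-operator", "namespace": "trivy-operator",
		"resource_kind": "ReplicaSet", "resource_name": "trivy-operator-675cf74d45", "severity": "Medium",
	}
	fmt.Println(differingLabels(oldRS, newRS)) // [image_digest name resource_name]
}
```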

Let me know what you think.