aquasecurity / trivy

Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more
https://aquasecurity.github.io/trivy
Apache License 2.0

Getting error while scanning image: error in image scan: scan failed: failed to detect vulnerabilities via RPC: twirp error internal #1411

Closed. saivenkateshedem closed this issue 1 year ago

saivenkateshedem commented 3 years ago

We are using Trivy in client/server mode. We installed Trivy in Kubernetes using the Helm chart present in the Git repo, and it installed successfully. But when I test it I get the error below:

error in image scan: scan failed: failed to detect vulnerabilities via RPC: twirp error internal: failed scan, : failed to apply layers: layer cache missing: sha256:04cca8fe186d808482a04aa0801d04d6c3a29c3788c4488175def47091a301fc

trivy version:
Version: 0.21.0
Vulnerability DB:
  Type: Full
  Version: 1
  UpdatedAt: 2021-11-23 00:49:24.143965458 +0000 UTC
  NextUpdate: 2021-11-23 06:49:24.143965058 +0000 UTC
  DownloadedAt: 2021-11-23 05:47:05.714387405 +0000 UTC

saivenkateshedem commented 3 years ago

How to reproduce the issue: when I use replicaCount > 1, I get the mentioned error.

krol3 commented 3 years ago

@saivenkateshedem Did you use the default values of the Helm chart? If not, please share the values you used.

saivenkateshedem commented 3 years ago

@krol3 I have modified some of the values. Below is the values.yml:

nameOverride: "" fullnameOverride: ""

image: registry: docker.io repository: aquasec/trivy tag: 0.21.0 pullPolicy: IfNotPresent pullSecret: ""

replicaCount: 2

persistence: enabled: true storageClass: "elastic" accessMode: ReadWriteOnce size: 5Gi

resources: requests: cpu: 200m memory: 512Mi limits: cpu: 1 memory: 1Gi

rbac: create: true pspEnabled: true

podSecurityContext: runAsUser: 65534 runAsNonRoot: true fsGroup: 65534

securityContext: privileged: false readOnlyRootFilesystem: true

nodeSelector: {}

affinity: {}

tolerations: []

trivy:

debugMode: false

gitHubToken: "" cache: redis: enabled: false url: ""

service:

type: ClusterIP

port: 9090

ingress: enabled: true ingressClassName: annotations: annotations

hosts:

httpProxy:

httpsProxy:

noProxy:

saivenkateshedem commented 3 years ago

@krol3 Any update on this issue?

saivenkateshedem commented 3 years ago

@krol3 @knqyf263 Can you please look into this issue and update here? Thanks.

zestam commented 3 years ago

Exact same issue here, using client/server mode with Trivy 0.21.0; had to revert :(

saivenkateshedem commented 3 years ago

@zestam With which version do you not get the error? I am getting this error with all versions.

zestam commented 3 years ago

0.19.1 works

zestam commented 3 years ago

Our installation is not via Kubernetes; the RPM is installed directly on a regular Linux server, so that might explain it.

saivenkateshedem commented 3 years ago

Okay... we have installed it in AWS EKS. For us 0.19.1 doesn't work because it does not support node-pkg; the latest version works fine with replicaCount: 1. If we set replicaCount > 1, it throws the error.

zestam commented 3 years ago

Our exact errors were:

2021-11-25T12:47:37.421Z FATAL error in image scan: scan failed: failed to detect vulnerabilities via RPC: twirp error internal: failed scan, some-image:latest: failed to detect vulnerabilities: failed to scan application libraries: failed vulnerability detection of libraries: failed to new driver: unsupported type node-pkg

2021-11-25T13:04:31.359Z FATAL error in image scan: scan failed: failed to detect vulnerabilities via RPC: twirp error internal: failed scan, some-image2:latest: failed to detect vulnerabilities: failed to scan application libraries: failed vulnerability detection of libraries: failed to new driver: unsupported type python-pkg

saivenkateshedem commented 3 years ago

@zestam That is not related to this issue. For your error you can try the latest version; it should work.

zestam commented 3 years ago

The latest version worked for simple base images, but not for complex images. I'll open a separate issue. Thanks.

krol3 commented 2 years ago

@saivenkateshedem I will test it and report back here.

krol3 commented 2 years ago

I tested with replicaCount: 2 and could install the Helm chart successfully. I used version 0.21.2.

saivenkateshedem commented 2 years ago

@krol3 Installation is successful for me too. While testing with the trivy command I am getting the error. The command:

trivy client --remote trivy-domain-name ${DOCKER_IMAGE} > ${DOCKER_IMAGE}.txt

krol3 commented 2 years ago

@saivenkateshedem I tested the Helm chart locally using the latest version. I got these results:

trivy client --remote http://localhost:9090 alpine:3.4
2021-12-14T21:47:12.966-0300    WARN    This OS version is no longer supported by the distribution: alpine 3.4.6
2021-12-14T21:47:12.966-0300    WARN    The vulnerability detection may be insufficient because security updates are not provided

alpine:3.4 (alpine 3.4.6)
=========================
Total: 12 (UNKNOWN: 0, LOW: 0, MEDIUM: 10, HIGH: 2, CRITICAL: 0)

+--------------+------------------+----------+-------------------+---------------+--------------------------------------+
|   LIBRARY    | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION |                TITLE                 |
+--------------+------------------+----------+-------------------+---------------+--------------------------------------+
| libcrypto1.0 | CVE-2018-0732    | HIGH     | 1.0.2n-r0         | 1.0.2o-r1     | openssl: Malicious server            |
saivenkateshedem commented 2 years ago

@krol3 I have deployed the Trivy Helm chart in AWS EKS with replica count 2 and the latest version, 0.21.2. I am getting the error.

trivy client --remote remote-url image-name

2021-12-16T04:02:28.110Z FATAL error in image scan: scan failed: failed to detect vulnerabilities via RPC: twirp error internal: failed scan, acko-home-dev:dev-39: failed to apply layers: layer cache missing: sha256:04cca8fe186d808482a04aa0801d04d6c3a29c3788c4488175def47091a301fc

I deployed it with the following command:

helm upgrade --install --wait trivy-dev -n vulnerability-scanning -f trivy-values.yml aquasecurity/trivy

trivy-values.yml is the one I mentioned above.

Is there any configuration I am missing?

Note:

  1. I have tested the alpine:3.4 image with the same configuration and it works fine, but with our application image I get the error.
  2. The size of our application image is 501MB, whereas the size of alpine is 4MB.
krol3 commented 2 years ago

@saivenkateshedem It seems that it's a particular case related to the size of your application. Right?

github-actions[bot] commented 2 years ago

This issue is stale because it has been labeled with inactivity.

sanferjesus commented 2 years ago

I have the same issue. I have deployed Trivy in server mode from the Helm chart in a k8s cluster with replica count 2 and the latest version, 0.26.0. I'm getting the error:

/usr/local/bin/trivy fs --server https://trivy-server.example.com /

2022-04-19T18:49:49.101Z FATAL scan error: image scan failed: scan failed: failed to detect vulnerabilities via RPC: twirp error internal: failed scan, kube-01: failed to apply layers: layer cache missing: sha256:06f132322222362ef689c66fb906e97f7037b7e5149b37b9e28064a57e3f98f3

If we change the replica count to 1, everything works fine!

Any suggestions, @krol3?

knqyf263 commented 2 years ago

We'd recommend using Redis as a cache backend. https://github.com/aquasecurity/trivy/tree/v0.26.0/helm/trivy#caching

@krol3 I think we should document it. Trivy server cannot be scaled out due to BoltDB.
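For anyone hitting this, a minimal sketch of chart values for that setup, based on the trivy.cache.redis keys from the values.yml shared earlier in this thread; the Redis URL is a placeholder for a Redis service you already run:

replicaCount: 2

trivy:
  cache:
    redis:
      # Shared cache for all server replicas instead of the per-pod BoltDB file.
      enabled: true
      # Hypothetical endpoint; replace with your own Redis service URL.
      url: "redis://redis.trivy.svc.cluster.local:6379"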

adapasuresh commented 2 years ago

sureshadapa@localhost GitHub-Repos % trivy client --remote https://pluraldev.pinepg.in openjdk:11-jre-slim
2022-06-17T18:49:12.860+0530 FATAL error in image scan: scan failed: failed to detect vulnerabilities via RPC: twirp error internal: failed scan, openjdk:11-jre-slim: failed to apply layers: layer cache missing: sha256:85753a1a8113e0a5850fed9a4acd008b554a44b53c61a5f2a3e6bd93812890a9

sureshadapa@localhost GitHub-Repos % trivy client --remote https://pluraldev.pinepg.in alpine:3.10
2022-06-17T18:55:55.815+0530 WARN This OS version is no longer supported by the distribution: alpine 3.10.9
2022-06-17T18:55:55.816+0530 WARN The vulnerability detection may be insufficient because security updates are not provided

alpine:3.10 (alpine 3.10.9)

Total: 1 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 1)

+-----------+------------------+----------+-------------------+---------------+---------------------------------------+
|  LIBRARY  | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION |                 TITLE                 |
+-----------+------------------+----------+-------------------+---------------+---------------------------------------+
| apk-tools | CVE-2021-36159   | CRITICAL | 2.10.6-r0         | 2.10.7-r0     | libfetch before 2021-07-26, as        |
|           |                  |          |                   |               | used in apk-tools, xbps, and          |
|           |                  |          |                   |               | other products, mishandles...         |
|           |                  |          |                   |               | -->avd.aquasec.com/nvd/cve-2021-36159 |
+-----------+------------------+----------+-------------------+---------------+---------------------------------------+

MuppyCwa commented 2 years ago

Got this error just recently using version 0.29.2 when we changed to client/server mode instead of standalone:

FATAL image scan error: scan error: image scan failed: scan failed: failed to detect vulnerabilities via RPC: twirp error internal: failed scan, our.registry.se:5001/my-application:latest: failed to apply layers: layer cache missing: sha256:1e2f2b457b1c67611f1792cee094c289997327baa020c9d82fb6031b1389950a

Command was:

docker run --rm ${trivyMounts} our.registry.se:5002/trivy:master -q image --server http://${env.TRIVY_SERVER_URL} --format json -o /home/Jenkins/logs/${stackName}_report.json --ignore-unfixed --timeout 60m0s --no-progress --security-checks vuln ${registry}/${stackName}:${tag}

But when testing manually there was no error, which made me believe there might be a concurrency issue. After I removed any parallel scans in our Jenkins pipelines, this issue was not seen anymore.
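As a rough illustration of that workaround (not from this thread), scans that share a runner can be serialized with flock so only one client talks to the server at a time; the lock file path, server URL, and image reference below are placeholders:

# Sketch: serialize Trivy client scans on a shared host using flock.
flock /tmp/trivy-scan.lock \
  trivy image --server http://trivy-server:9090 --format json \
  --output report.json our.registry.example/my-application:latest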

adapasuresh commented 2 years ago

I believe the Trivy recommended image size as of today is 200MB; anything beyond that just times out. I read that Trivy is working on giving control over the image size, maybe --ImageSize 399MB in the next version of the Trivy client. Waiting for the new version.

erikgb commented 2 years ago

Trivy server cannot be scaled out due to BoltDB

@knqyf263 Can you please elaborate on this statement about scaling out? I am trying to set up a Trivy server with HA. Isn't that possible?

egeland commented 2 years ago

Also seeing this issue - both client and server are on trivy 0.33.0 (I'll try upgrading to 0.34.0 shortly):

$ trivy fs --server https://scanner.example.com/ --security-checks vuln --exit-code 0 .
2022-11-16T02:23:09.556Z    INFO    Vulnerability scanning is enabled
2022-11-16T02:26:16.938Z    FATAL   filesystem scan error: scan error: scan failed: scan failed: failed to detect vulnerabilities via RPC: twirp error internal: failed scan, .: failed to apply layers: layer cache missing: sha256:2d610d146f0b979e0e0ef2f601075642a4d0f9ff6b665863bba6931a0c988c73

egeland commented 2 years ago

v0.34.0 does not fix...

On a more attentive reading of the comments above, I've set the Trivy k8s Deployment to a single pod, not autoscaling based on load. Hopefully that fixes the issue. It would be good to be able to scale somehow, though...
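For reference, a minimal sketch of pinning the server to a single replica; the release name, namespace, and deployment name below are assumptions based on chart defaults:

# Sketch: force a single Trivy server pod (names and namespace are assumed).
kubectl -n trivy scale deployment trivy --replicas=1

# Or via the chart values, keeping the rest of the release unchanged:
helm upgrade trivy aquasecurity/trivy -n trivy --reuse-values --set replicaCount=1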

doagl commented 2 years ago

FYI @egeland: I also had this issue with version 0.35.0, and it worked after I removed the trailing "/" from the Trivy server URL.

erikgb commented 1 year ago

We see this all the time in our operator e2e-tests, and it's running just a single replica of Trivy server. Here's an example run: https://github.com/statnett/image-scanner-operator/actions/runs/4234368855/jobs/7356565151

I have not been able to reproduce this locally. Knowing the resource limitations on GitHub-hosted runners, I suspect the problem is related to how Trivy operates with limited resources.

erikgb commented 1 year ago

We are still struggling massively with this on GitHub-hosted runners (with limited resources). Would it be possible to improve the error handling and/or add some more debug logging? From a client's point of view, the Trivy server appears to be badly misbehaving.

I was able to capture a stacktrace from the Trivy server log:

  github.com/aquasecurity/trivy/pkg/rpc/server.(*ScanServer).Scan
      /home/runner/work/trivy/trivy/pkg/rpc/server/server.go:56
- failed to apply layers:
  github.com/aquasecurity/trivy/pkg/scanner/local.Scanner.Scan
      /home/runner/work/trivy/trivy/pkg/scanner/local/scan.go:102
- layer cache missing: sha256:bf4cf07ec437e7f61fc2a6c30a0800c6f3f0e078109c58449fd240821318a582:
  github.com/aquasecurity/trivy/pkg/fanal/applier.Applier.ApplyLayers
      /home/runner/work/trivy/trivy/pkg/fanal/applier/applier.go:24

Could it be a concurrency issue? We observe this in our e2e-tests, where the Trivy server is hit by multiple kuttl-based tests running in parallel. I have also tried to reduce the parallelism to 1 (serial) without much luck, but I suspect the interaction between the Trivy client and server performs multiple requests per scan. Here is a recent failing run on a microk8s cluster: https://github.com/statnett/image-scanner-operator/actions/runs/4274433310/jobs/7441051646. We are using k3s on the main branch, but I wanted to check whether this problem was Kubernetes-distro dependent; apparently it is not.

We are running a simple single-replica setup: https://github.com/statnett/image-scanner-operator/tree/main/config/trivy-server, so I find the comment in your FAQ about multiple servers potentially a bit misleading. We try to avoid the Redis cache backend as there seems to be no way to get a full HA setup (Trivy does not yet support Redis Sentinel, ref. https://github.com/aquasecurity/trivy/issues/1115), so we are running a single Trivy server with the file cache instead to keep things simple.

adapasuresh commented 1 year ago

How about trying the solution below:

  1. Trivy updates its DB every 12 hours.
  2. Take periodic backups (in between those 12 hours) of the containers to tar.gz (e.g. docker save) and store them on the filesystem.
  3. Run Trivy as K8s microservices (as many instances as required) to scan the latest files on the filesystem and produce VAS reports (see the sketch after this list).
  4. I feel running the scan on the filesystem is quicker than a direct image scan, where I had challenges with anything other than the alpine image, i.e. above a certain size.
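A rough sketch of steps 2 and 3 under those assumptions, using docker save and then trivy image --input to scan the saved archive from the filesystem; the image name, paths, and output file are placeholders:

# Sketch: save the image to a tar archive, then scan the archive offline.
docker save our-app:latest -o /scan-cache/our-app-latest.tar
trivy image --input /scan-cache/our-app-latest.tar --format json -o /scan-cache/our-app-latest-report.json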

/SureshA


github-actions[bot] commented 1 year ago

This issue is stale because it has been labeled with inactivity.