anchore / grype

A vulnerability scanner for container images and filesystems
Apache License 2.0

Grype scan command appears to hang when downloading db or listing file #1731

Closed githala-deepak closed 1 month ago

githala-deepak commented 6 months ago

What happened: The grype command gets stuck, and after 3 hours I get the error:

failed to load vulnerability db: unable to update vulnerability database: unable to download db: stream error: stream ID 1; INTERNAL_ERROR; received from peer

What you expected to happen: The grype scan should complete in under a minute.

How to reproduce it (as minimally and precisely as possible): Occurs randomly; can't reproduce.

Anything else we need to know?:

Environment:

willmurphyscode commented 6 months ago

Hi @githala-deepak,

Thanks for the report.

It sounds like grype is having trouble downloading its updated vulnerability DB, which it will try to do about once per day.

If you run grype db update -vvv, do you see any errors?

If you download the db directly, with a command like this:

curl -vvv -o /tmp/db.tar.gz 'https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-02-28T01:23:28Z_ea5efb77a61bf939917f.tar.gz'

Do you see any errors? Does the download succeed? I think you probably need to troubleshoot a network issue, and that curl command will start you in the right direction.

hkadakia commented 6 months ago

I am having a similar issue.

Syft: Summary of packages by <count> <type>
00:03:13 See mediaimage.syft.json for full package details
00:03:13     122 "go-module"
00:03:13       3 "python"
00:03:13     159 "rpm"
00:03:13 
00:03:13 Grype: scanning for vulnerabilities 
00:03:13 /root/.local/bin/grype -q -o json --config=default-ignore-rules.yaml  --only-fixed  sbom:mediaimage
00:08:06 Killed
SYFT_VER=0.92.0
GRYPE_VER=0.69.1
mathrock commented 6 months ago

I have recently noticed that occasionally requests to fetch the listing.json file are super slow, like there's a bad/slow backend in rotation. I suspect the same thing is happening fetching the larger tar.gz DB sqlite files, causing the hang that users are reporting.

Additionally it seems as though there is no retry/timeout logic on the db update process, so that may also be an area to look into improving.
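
A minimal sketch of what caller-side retry logic could look like, for anyone wrapping a download in a script. This is an illustration only, not how grype's db update is implemented; the `retry` helper and the `DB_URL` placeholder are hypothetical names.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or ATTEMPTS runs have failed, pausing
# DELAY_SECONDS between tries. The body runs in a subshell (parentheses)
# so its variables do not leak into the caller.
retry() (
    attempts=$1
    delay=$2
    shift 2
    n=1
    status=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && exit 0
        status=$?
        echo "attempt $n of $attempts failed with exit code $status" >&2
        n=$((n + 1))
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    # All attempts failed: report the exit code of the last one.
    exit "$status"
)

# Hypothetical usage (DB_URL is a placeholder, not a real endpoint):
#   retry 3 10 curl --fail --max-time 300 -o /tmp/db.tar.gz "$DB_URL"
```

Pairing the retry loop with curl's `--max-time` bounds each attempt, so a stalled transfer fails and is retried rather than hanging.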

Are the DB files located in S3 or in an S3 bucket fronted by Cloudflare? Or just in Cloudflare R2 directly?

Some examples from earlier today if it's helpful for you to look into logs on toolbox-data.anchore.io and diagnose the issue. The initial requests to download the ~ 156KB listing.json file took over 30s!

The following requests were made around Tue, 12 Mar 2024 15:42:00 GMT

[mathrock ~]$ time curl https://toolbox-data.anchore.io/grype/databases/listing.json -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  159k  100  159k    0     0  1039k      0 --:--:-- --:--:-- --:--:-- 10597

real    0m32.164s
user    0m0.060s
sys     0m0.071s

And then some requests are quick, like we're hitting a bad/slow backend in the rotation:

[mathrock ~]$ time curl https://toolbox-data.anchore.io/grype/databases/listing.json -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  159k  100  159k    0     0  1039k      0 --:--:-- --:--:-- --:--:-- 1044k

real    0m0.160s
user    0m0.061s
sys     0m0.056s
[mathrock ~]$ time curl https://toolbox-data.anchore.io/grype/databases/listing.json -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  159k  100  159k    0     0   939k      0 --:--:-- --:--:-- --:--:--  940k

real    0m0.177s
user    0m0.062s
sys     0m0.055s
[mathrock ~]$ time curl https://toolbox-data.anchore.io/grype/databases/listing.json -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  159k  100  159k    0     0  1004k      0 --:--:-- --:--:-- --:--:-- 1011k

real    0m0.166s
user    0m0.049s
sys     0m0.070s
willmurphyscode commented 6 months ago

Thanks for the detailed info @mathrock! I've also seen grype db updates be slow, but haven't yet figured out why. We're investigating on our end.

willmurphyscode commented 6 months ago

Hi all! Thanks for reporting this.

We've changed some configs with our CDN to try to fix the issue. Since it's only intermittent, it's hard to know for sure that it's fixed, so please let us know if you continue to see any more slowness or hangs with grype database downloads.

We'll also look into putting in some timeouts in grype, since that should prevent the client from hanging regardless of the behavior of the CDN / database download.

I'll leave this issue open while we continue to monitor, and until we have client side timeouts merged.

jcote-tc commented 5 months ago

I'm having the issue today:

[0000] DEBUG checking for available database updates
[0000] DEBUG found database update candidate: Listing(url=https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-04-03T01:24:31Z_1712118027.tar.gz)
[0000] DEBUG cannot find existing metadata, using update...
[0000] DEBUG database update available: Listing(url=https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-04-03T01:24:31Z_1712118027.tar.gz)
[0000]  INFO downloading new vulnerability DB

It's stuck on the last line ^ : "[0000] INFO downloading new vulnerability DB"

jcote-tc commented 5 months ago

FYI: It fixed itself after a few hours.

spiffcs commented 5 months ago

Hey everyone! Check out the latest release of grype where we now have default timeouts included (user configurable as well).

PR that was merged: https://github.com/anchore/grype/pull/1777
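
A sketch of how the timeouts might be adjusted from the environment. The key names `db.update-available-timeout` and `db.update-download-timeout` and their defaults (30s and 5m0s) come from the debug config dump posted later in this thread; the environment variable names below are an assumption derived from the GRYPE_ prefix convention, not copied from the grype documentation.

```shell
# Values use Go-style durations, matching the "30s" and "5m0s" shown in the
# debug config dump in this thread. The variable names are assumed from the
# GRYPE_ prefix convention and should be checked against `grype config`.
export GRYPE_DB_UPDATE_AVAILABLE_TIMEOUT=30s   # db.update-available-timeout
export GRYPE_DB_UPDATE_DOWNLOAD_TIMEOUT=10m    # db.update-download-timeout
```

Raising the download timeout gives a slow transfer more time to finish; lowering it makes a stalled run fail faster.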

We're currently looking into why the CDN that hosts the listing and db files ever gets into the state where it connects, but fails to transfer the bytes.

Fajkowsky commented 4 months ago

@spiffcs Any update on why CDN is acting so slow?

willmurphyscode commented 4 months ago

Hi @Fajkowsky, can you tell us a bit about when you're seeing this slowness?

The only deterministic slowness we've found occurs shortly after a new Grype DB is published, because every Grype invocation in that window downloads the new DB. After this initial burst of traffic, a large percentage of Grype clients have the new DB cached and the download traffic is greatly reduced. We're looking at ways to put some jitter in there.

So when you see the slow downloads, is it shortly after 5AM UTC or so? If so, we expect this situation to improve when we introduce some jitter/staggering in when different Grype installs download the new DB.
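
To illustrate the jitter idea from the caller's side: a scheduled job can sleep for a random interval before updating, so that many machines do not all fetch at the same moment. This is a sketch under that assumption, not the change planned inside grype; `random_delay` is a hypothetical helper, and it relies on `/dev/urandom` being available.

```shell
# random_delay MAX_SECONDS
# Prints a random integer in the range [0, MAX_SECONDS). Prints 0 when
# MAX_SECONDS is not positive.
random_delay() {
    max=$1
    if [ "$max" -le 0 ]; then
        echo 0
        return 0
    fi
    # Read 4 random bytes as one unsigned decimal integer.
    r=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
    echo $((r % max))
}

# Hypothetical usage in a scheduled job: wait up to 30 minutes, then update.
#   sleep "$(random_delay 1800)" && grype db update
```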

If it's at a different time, we would really appreciate some more details if you don't mind sharing them, like what time the slow runs were at and what geographic region they're in. (Feel free to join the community slack and DM one of us if you'd rather not post that information publicly.)

Fajkowsky commented 3 months ago

Hi @willmurphyscode,

Today is the day.

curl -o listing.json https://toolbox-data.anchore.io/grype/databases/listing.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  152k  100  152k    0     0   4974      0  0:00:31  0:00:31 --:--:--  8563

The transfer rate is so low that downloading the listing JSON file took 31 seconds.

willmurphyscode commented 2 months ago

We also have a similar complaint over on scan-action: https://github.com/anchore/scan-action/issues/306

willmurphyscode commented 2 months ago

Related issue at https://github.com/anchore/grype/issues/1939

willmurphyscode commented 2 months ago

Another related issue at https://github.com/anchore/grype/issues/1885

It seems like a number of users are still having CDN problems after the last round of attempted fixes. We will investigate and see what can be improved on the CDN side.

willmurphyscode commented 1 month ago

Hi all!

After some discussion on our Discourse instance we are going to try to reduce the probability that Grype checks for an updated DB by building in a delay where, if Grype's local database was built more recently than N hours ago, Grype should not check whether a new database is available, thus saving a network call. I think N will be configurable, and grype db update and grype db check will always check for a new database.

I'll post an update when this is rolled out and we'll see whether there's some improvement here. Thanks for your patience!
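
The age check described above can be sketched as a simple comparison. This is an illustration of the logic only, not grype's code; `should_check_for_update` and its arguments are hypothetical, and timestamps are assumed to be Unix epoch seconds.

```shell
# should_check_for_update BUILT_EPOCH NOW_EPOCH MIN_AGE_HOURS
# Succeeds (exit 0) when the local DB was built at least MIN_AGE_HOURS ago,
# meaning a network check for a newer DB is worthwhile. Fails (exit 1) when
# the DB is newer than that, so the network call can be skipped.
should_check_for_update() {
    built=$1
    now=$2
    min_age_hours=$3
    age_seconds=$((now - built))
    [ "$age_seconds" -ge $((min_age_hours * 3600)) ]
}

# Hypothetical usage, with BUILT_EPOCH holding the local DB's build time:
#   if should_check_for_update "$BUILT_EPOCH" "$(date +%s)" 12; then
#       echo "checking for a new database"
#   fi
```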

willmurphyscode commented 1 month ago

Hi all,

We have rolled out a change to the DB hosting infrastructure on Grype to reduce the number of bytes Grype needs to download when checking for a new database by about 95%. This change is server-side only, so you don't need to upgrade grype to benefit. We have also set up some metrics on this. So far, the fix seems to have helped. You can read more here.

Please let us know if you're still impacted by slow checks for new grype databases. If the metrics improvements hold for the next week or so, and there aren't new complaints, we'll close this issue.

Thanks for your patience on this one.

willmurphyscode commented 1 month ago

Hi all! Our metrics indicate that the reduced size of the listing file has fixed this problem. There are more details on the measurements we did on the community Discourse.

If we've missed something, please let us know on Discourse or by opening a new issue. Thanks!

sparrowt commented 1 month ago

We've been having issues with this again (see also #846). For example, this Tuesday the 13th grype tried for nearly 2 hours before giving up:

[2024-08-13T15:26:43.327Z] grype -o json myimage
[2024-08-13T17:16:50.899Z] failed to load vulnerability db: unable to update vulnerability database: unable to download db: stream error: stream ID 1; INTERNAL_ERROR; received from peer

and again today I've got a current invocation which has been stuck for 2h35 and counting without any progress...

[2024-08-15T09:26:07.379Z] grype -o json myimage

Are there ongoing infrastructure issues?

kzantow commented 1 month ago

There seem to be continued issues downloading the database. See also: https://github.com/anchore/scan-action/issues/306.

As noted earlier, we believe that a change in the size of the file has solved the issues while downloading the listing, but it's not possible to shrink the size of the database in a similar manner, which is now where the failures have moved.

@sparrowt we have not been able to identify any specific issues that are within our power to fix with the current CDN hosting setup we have, unfortunately. We do have a number of options to pursue. But are you using the latest version of Grype? There should be a significantly shorter timeout than 2 hours.

vica-atlassian commented 1 month ago

I was also experiencing this issue, with the download of today's db not completing.

Workaround: I was able to manually download yesterday's vulnerability db and import it.

I did the following:

to obtain links to dbs: grype db list
to import: grype db import <my_dl>
to check status: grype db status

Hope this helps until the root issue is resolved.
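
The workaround steps above could be scripted roughly as follows. This is a sketch, assuming curl is available and that the URL passed in was picked by hand from the output of grype db list; `import_db_from_url` is a hypothetical helper name.

```shell
# import_db_from_url URL
# Downloads a DB archive into a temporary directory (keeping its original
# file name), imports it with grype, then prints the DB status.
import_db_from_url() {
    url=$1
    dir=$(mktemp -d) || return 1
    archive="$dir/$(basename "$url")"
    # --fail turns HTTP errors into a non-zero exit; --max-time caps the
    # whole transfer at 10 minutes so a stalled download cannot hang forever.
    if curl --fail --silent --show-error --max-time 600 -o "$archive" "$url"; then
        grype db import "$archive" && grype db status
        status=$?
    else
        status=$?
    fi
    rm -rf "$dir"
    return "$status"
}

# Hypothetical usage (replace the placeholder with a URL from `grype db list`):
#   import_db_from_url "$DB_URL"
```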

jdvorak001 commented 1 month ago

Many thanks @vica-atlassian for the workaround, it saved me a lot of time. Just noting that grype skips the vulnerability db auto-update on a regular run if one has GRYPE_DB_AUTO_UPDATE=false in the environment.

Funny: I was just able to wget yesterday's vulnerability database in 18 secs (~ 10.6 MiB/sec). At the same time, my wget download of today's vulnerability database is "running" at speeds that only sometimes reach 16 KiB/s.

jerry-brimacombe-talogy commented 1 month ago

We are also having the same issue. It started yesterday morning. We had been using an old version of Grype, so we updated it to the latest version. The problem seemed to be intermittent and resolved itself. However, it is now happening consistently again.

Starting: Grype txt
==============================================================================
Task         : Bash
Description  : Run a Bash script on macOS, Linux, or Windows
Version      : 3.241.1
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/bash
==============================================================================
Generating script.
Script contents:
grype -vv sbom:/home/vsts/work/1/s/Syft/sbom.syft.json -o table --file /home/vsts/work/1/s/Grype/Grype.txt
========================== Starting Command Output ===========================
/usr/bin/bash /home/vsts/work/_temp/2f1ddfbf-89c5-49e4-87f5-f75b87924a98.sh
[0000]  INFO grype version: 0.79.6
[0000] DEBUG config:
  log:
      quiet: false
      level: debug
      file: ""
  dev:
      profile: none
  output:
      - table
  file: /home/vsts/work/1/s/Grype/Grype.txt
  distro: ""
  add-cpes-if-none: false
  output-template-file: ""
  check-for-app-update: true
  only-fixed: false
  only-notfixed: false
  ignore-states: ""
  platform: ""
  search:
      scope: squashed
      unindexed-archives: false
      indexed-archives: true
  ignore: []
  exclude: []
  db:
      cache-dir: /home/vsts/.cache/grype/db
      update-url: https://toolbox-data.anchore.io/grype/databases/listing.json
      ca-cert: ""
      auto-update: true
      validate-by-hash-on-start: false
      validate-age: true
      max-allowed-built-age: 120h0m0s
      update-available-timeout: 30s
      update-download-timeout: 5m0s
  external-sources:
      enable: false
      maven:
          search-upstream: true
          base-url: https://search.maven.org/solrsearch/select
  match:
      java:
          using-cpes: false
      dotnet:
          using-cpes: false
      golang:
          using-cpes: false
          always-use-cpe-for-stdlib: true
          allow-main-module-pseudo-version-comparison: false
      javascript:
          using-cpes: false
      python:
          using-cpes: false
      ruby:
          using-cpes: false
      rust:
          using-cpes: false
      stock:
          using-cpes: true
  fail-on-severity: ""
  registry:
      insecure-skip-tls-verify: false
      insecure-use-http: false
      auth: []
      ca-cert: ""
  show-suppressed: false
  by-cve: false
  name: ""
  default-image-pull-source: ""
  vex-documents: []
  vex-add: []
  match-upstream-kernel-headers: false
[0000] DEBUG gathering packages
[0000] DEBUG loading DB
[0000] DEBUG looking for updates on vulnerability database
[0000] DEBUG checking for available database updates
[0000] DEBUG found database update candidate: Listing(url=https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-08-16T01:31:16Z_1723782141.tar.gz)
[0000] DEBUG cannot find existing metadata, using update...
[0000] DEBUG database update available: Listing(url=https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-08-16T01:31:16Z_1723782141.tar.gz)
[0000]  INFO downloading new vulnerability DB
[0000]  WARN unknown package metadata type="" for packageID="e6be0d4f844469d7"
[0000]  WARN unknown package metadata type="" for packageID="11cf22f38884a9f6"
[0000]  WARN unknown package metadata type="" for packageID="4c065ad0e08c491d"
[0000]  WARN unknown package metadata type="" for packageID="b0ef2d2f58efbbf2"
[0000]  WARN unknown package metadata type="" for packageID="6d3e17d18015d4e5"
[0000]  WARN unknown package metadata type="" for packageID="4f2335411f9a94ed"
[0000]  WARN unknown package metadata type="" for packageID="6ce4c9e99bd67541"
[0000]  WARN unknown package metadata type="" for packageID="acd3097a7b0561ce"
[0000]  WARN unknown package metadata type="" for packageID="d597c9a1945c3418"
[0000]  WARN unknown package metadata type="" for packageID="c75f2825cf7e5f3d"
[0000]  WARN unknown package metadata type="" for packageID="e48a9ab765a6199a"
[0000]  WARN unknown package metadata type="" for packageID="3733bd2d1ce41916"
[0000]  WARN unknown package metadata type="" for packageID="9e2dfa7c8112b0c7"
[0000]  WARN unknown package metadata type="" for packageID="422bfa9c24bf5633"
[0000]  WARN unknown package metadata type="" for packageID="f78abd837e737350"
[0000]  WARN unknown package metadata type="" for packageID="496d62ab2bc12063"
[0000]  WARN unknown package metadata type="" for packageID="44408cae1116a7b2"
[0000]  WARN unknown package metadata type="" for packageID="82306554a557c33e"
[0000]  WARN unknown package metadata type="" for packageID="c7dc4f9a7ba95622"
[0000]  WARN unknown package metadata type="" for packageID="02c6a1a99e13c60e"
[0000]  WARN unknown package metadata type="" for packageID="5293e357cbe89b83"
[0000]  WARN unknown package metadata type="" for packageID="dc4fb70ff578156b"
[0000]  WARN unknown package metadata type="" for packageID="c2ec9d1ac081abfc"
[0000]  WARN unknown package metadata type="" for packageID="84a055a16b5290c8"
[0000]  WARN unknown package metadata type="" for packageID="fb11a92f25850efe"
[0000]  WARN unknown package metadata type="" for packageID="31ff2337a538b070"
[0000]  WARN "relationship mapping to key 5dc118795491eefb is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key 7c6b421e95d44a3f is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key a741299b9444f760 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key ad80ce239dae3cf7 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key eea3fdf9f969ce47 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "unknown relationship type: described-by" occurred 26 time(s)
[0000]  WARN "relationship mapping to key 59e0554aa6ff0ea9 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key ab5dbb2608786c3f is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key ad152d106ad5e365 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key dec5e791fb2e6143 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key e9e6568c68f68749 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000]  WARN "relationship mapping to key 3410ecaef637e6f1 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000] DEBUG no new grype update available
failed to load vulnerability db: unable to update vulnerability database: unable to download db: context deadline exceeded (Client.Timeout or context cancellation while reading body)

##[error]Bash exited with code '1'.
Finishing: Grype txt
danb-csms commented 1 month ago

Our team is also experiencing this issue with current day DB not completing the download (with previous day DB working fine).

We are looking at workarounds :(

jonathanbro commented 1 month ago

Hey Everyone. Can GRYPE_DB_AUTO_UPDATE=false be used as an input in GH actions? I don't see it listed? https://github.com/anchore/scan-action

kzantow commented 1 month ago

@jonathanbro yes, the action supports all grype settings via GRYPE_-prefixed environment variables. For example:

      - uses: anchore/scan-action@v4
        id: grype-scan
        with:
          image: alpine:3.15
        env:
          GRYPE_CONFIG: ./my-config.yml
          GRYPE_ONLY_FIXED: true

Run grype config to get a full list of the configuration options. scan-action should be using the latest Grype.
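
As a rough illustration of the naming pattern, the two pairs seen in this thread (`db.auto-update` with GRYPE_DB_AUTO_UPDATE, and `only-fixed` with GRYPE_ONLY_FIXED) suggest that a config key maps to its variable by upper-casing it and replacing dots and dashes with underscores. The helper below is hypothetical and encodes only that assumed rule; it is worth confirming a name before relying on it.

```shell
# to_env_var CONFIG_KEY
# Prints the environment variable name that the assumed GRYPE_ convention
# would give for a dotted config key such as "db.auto-update".
to_env_var() {
    printf 'GRYPE_%s\n' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/[-.]/_/g'
}

to_env_var db.auto-update
to_env_var only-fixed
```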

popey commented 1 month ago

For those affected by this issue, the team deployed the changes to how the grype vulnerability database is served late last night (UK time). So, runs should now no longer exhibit the same network stalling.

Please report if you see any further issues.

jdvorak001 commented 1 month ago

Confirming it works now for me.

jerry-brimacombe-talogy commented 1 month ago

It's also working for us now. Thank you for your speedy action.

kzantow commented 1 month ago

Hi all, we've made a change to our database hosting that we believe should fix these issues. There is some more information on Discourse.