anchore / scan-action

Anchore container analysis and scan provided as a GitHub Action

Read Connection Timeout - Downloading Grype DB #306

Open saisatishkarra opened 4 months ago

saisatishkarra commented 4 months ago

Issue

Recently in our CI, we have been experiencing Grype DB TCP read timeouts while downloading the database as part of using this action. This is leading to delayed and failed CVE scanning and additional time for the build pipelines to complete.

[Screenshot: 2024-04-23, 2:07 PM]

Version

Grype version: v0.74.4
Action version: anchore/scan-action@v3.6.4

Observation

Expectation

  1. What is the default behavior when GRYPE_DB_AUTO_UPDATE: false is set? Does the action fail, or does it run on the first and subsequent invocations, assuming no other DB is imported manually (e.g. when invoked multiple times within the same pipeline job)? From testing, it appears to fail (see the screenshots in the comment below).
  2. Can the action be enhanced to always check the DB status and download the latest DB only in the specific case where GRYPE_DB_AUTO_UPDATE: false and DB_STATUS=invalid on the first invocation of the action within a single job?
  3. Are there any other recommendations to avoid the timeout issue and the delayed scanning time? (E.g. how can the db.update-download-timeout config parameter be increased or overridden across multiple repos that use a shared workflow of this action?) A rough sketch follows this list.
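
For reference, here is one shape this could take in a shared workflow. This is only a sketch, not verified against this action: the image name is a placeholder, and GRYPE_DB_UPDATE_DOWNLOAD_TIMEOUT assumes Grype's standard env-var mapping of the db.update-download-timeout config key.

```yaml
# Illustrative sketch only: pass Grype settings as step-level env vars in a shared workflow.
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Scan image
        uses: anchore/scan-action@v3.6.4
        with:
          image: "localbuild/testimage:latest"  # placeholder image
        env:
          # Documented setting; maps to db.auto-update.
          GRYPE_DB_AUTO_UPDATE: "true"
          # Assumed mapping of db.update-download-timeout (GRYPE_ prefix,
          # dots and dashes become underscores); the value format is also assumed.
          GRYPE_DB_UPDATE_DOWNLOAD_TIMEOUT: "300s"
```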
saisatishkarra commented 4 months ago
[Screenshots: 2024-04-23, 4:30 PM]
kzantow commented 4 months ago

I think you're confusing two options, @saisatishkarra. The scan-action does download a grype database each time it's run. It has GRYPE_CHECK_FOR_APP_UPDATE set to false, so it doesn't check to see if there is a new version of Grype itself.

We have had some reports of the database (and listing file) downloads being flaky over the past few weeks. These are hosted on a CDN that is largely outside of our control. We have been able to sporadically reproduce the problems and have provided as much information as we can to the CDN provider, but we haven't been able to identify what the issue is, nor have we been able to get any resolution.

saisatishkarra commented 4 months ago

My concern is mostly around the flaky CDN downloads for the DB update every time the scan-action is run. Is there a public status or metrics page for the Grype DB CDN that we can monitor or subscribe to?

I am also interested in how to maintain the DB in an offline environment and specify it as an input so the action can import it without pulling from the network on every run. Any pointers on using and scaling this offline approach across multiple repository pipelines would be appreciated.
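
For context, here is one rough shape an offline flow could take. This is only a sketch, not something we have running: the mirror URL and image name are placeholders, and it assumes the action's bundled grype reads the same default DB cache directory as the separately installed grype CLI used for grype db import.

```yaml
# Illustrative sketch only: pre-seed the vulnerability DB from an internal mirror,
# then run the scan with auto-update disabled.
steps:
  - name: Install grype CLI (used only for "db import")
    run: |
      curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b "$HOME/.local/bin"
      echo "$HOME/.local/bin" >> "$GITHUB_PATH"

  - name: Import a pre-staged vulnerability DB
    run: |
      # Placeholder mirror URL; replace with wherever the archive is staged internally.
      curl -sSfL -o /tmp/vulnerability-db.tar.gz "https://internal.example.com/grype/vulnerability-db.tar.gz"
      grype db import /tmp/vulnerability-db.tar.gz

  - name: Scan with auto-update disabled
    uses: anchore/scan-action@v3.6.4
    with:
      image: "localbuild/testimage:latest"  # placeholder image
    env:
      GRYPE_DB_AUTO_UPDATE: "false"
```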

willmurphyscode commented 1 month ago

Hi @saisatishkarra there isn't currently a metrics page, but that request is a good idea. We've had a couple other complaints about the CDN, so I'll re-open this. Thanks for your patience while we try to figure out what to do about CDN slowness/flakiness.

aebrahim commented 2 weeks ago

Can this database file be hosted on a more reliable CDN? We use Google Cloud CDN over a Cloud Storage bucket and have had no issues. I am happy to help with configuration of this if needed.

kzantow commented 2 weeks ago

@aebrahim do you continue to have issues downloading the database?

We have made one change: the listing file is now significantly smaller, which appears to have solved most of the issues when Grype checks whether the database needs an update. But we haven't yet been able to solve the issues with downloading the database itself under the current CDN setup.

A challenge with using this action is that it needs to download the database every time it runs, since there is no persistent storage on the runners by default. I have a change we might be able to pursue that would cache the database in the GitHub Actions cache; depending on your usage, that might help make this more reliable.
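
As a rough illustration of the idea (a sketch only, not the change itself; the path assumes Grype's default DB location on Linux runners, and the key policy is just one option):

```yaml
# Illustrative sketch only: restore the most recent cached DB and save a
# refreshed copy each run (the unique key forces a save; restore-keys
# matches the newest prior entry).
steps:
  - name: Cache Grype DB
    uses: actions/cache@v4
    with:
      path: ~/.cache/grype/db
      key: grype-db-${{ runner.os }}-${{ github.run_id }}
      restore-keys: |
        grype-db-${{ runner.os }}-

  - name: Scan image
    uses: anchore/scan-action@v3.6.4
    with:
      image: "localbuild/testimage:latest"  # placeholder image
```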

How many times daily would you estimate this action gets run for you?

aebrahim commented 2 weeks ago

Yes. We use Grype in two ways: this GitHub Action, and running the container in Google Cloud Build. In both cases we have observed issues downloading the database, and the outages are intermittent: on most days we have no failures, and on some days we see near-100% failures for most of the day.

A GitHub Actions cache would be massively helpful for this action, for sure - and it would probably take a lot of the load off of your poor CDN.

aebrahim commented 2 weeks ago

Another trick to reduce the load and your CDN costs is to also host public requester-pays buckets on AWS and GCP and offer options to fetch from those instead of the CDN.

kzantow commented 1 week ago

Hey @aebrahim, we've made a change to the database hosting, since the download issues appeared to become more widespread. Have things been more stable for you? Do you think further changes are still needed to improve speed or stability?

aebrahim commented 3 days ago

Thanks for following up - it has been more stable for the past week.