Thanks for the report, we will look into it.
I also saw this right now :/ Any ideas why?
I believe this is currently causing problems for anyone using the trivy action. We have had to turn it off on some workflows. I'm not sure what the long-term solution might be - if GH cannot increase the global rate limit for the artifact pull, then maybe it needs to be in a public AWS S3 bucket or something similar?
From my PR above, a workaround suggested by someone else:
```yaml
- uses: aquasecurity/trivy-action@0.24.0
  with:
    ...
  env:
    TRIVY_DB_REPOSITORY: <something else than ghcr.io>
    TRIVY_JAVA_DB_REPOSITORY: <something else than ghcr.io>
```
Does anyone know how to get trivy-action to auth with a privately hosted trivy-db repo? I can get it working fine with normal trivy locally, but trivy-action does not work with either docker/login-action or the usual `echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin`:

```
2024-09-20T16:39:01Z FATAL Fatal error init error: DB error: failed to download vulnerability DB: database download error: OCI repository error: 1 error occurred:
```
I was able to get it to work with ECR, but only using an OIDC login via the configure-aws-credentials action right before the trivy action. Trivy is not using docker to pull the artifact, as it is not a docker image.
I am a poor student.
I have no long-term tests yet, but from my understanding of GH's rate limiting, just providing a token of any sort will give you higher quotas? If that's the case, the following should help:
```yaml
- name: Run Trivy scan on image
  uses: aquasecurity/trivy-action@0.24.0
  with:
    [... your config ...]
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
> I have no long-term tests yet, but from my understanding of GH's rate limiting, just providing a token of any sort will give you higher quotas? If that's the case, the following should help:
>
> ```yaml
> - name: Run Trivy scan on image
>   uses: aquasecurity/trivy-action@0.24.0
>   with:
>     [... your config ...]
>   env:
>     GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
> ```
I've tried logging in to GHCR via docker/login-action before running Trivy CLI (not action), and I am still getting lots of 429 errors.
> From my PR above, a workaround suggested by someone else:
>
> ```yaml
> - uses: aquasecurity/trivy-action@0.24.0
>   with:
>     ...
>   env:
>     TRIVY_DB_REPOSITORY: <something else than ghcr.io>
>     TRIVY_JAVA_DB_REPOSITORY: <something else than ghcr.io>
> ```
So, if I understand this correctly:
I, as the consumer of this action, must download copies of these DBs and store them on my own registry. Then, I must pass environment variables to the action which point at my copies of the DBs. Is that correct?
How often are these DBs updated?
@nnellanspdl I think it's at 00:00 every day, but I'm not sure.
Anyway, this workaround is a hassle if you have to host them yourself and update them every day.
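If you do end up self-hosting, the daily refresh can at least be automated; a minimal sketch of the trigger (illustrative only, the cron time is an assumption based on the daily-rebuild guess above):

```yaml
# Hypothetical trigger for a mirror workflow: refresh the self-hosted copy
# daily, shortly after the upstream DB is rebuilt.
on:
  schedule:
    - cron: "0 1 * * *"
```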
> > I have no long-term tests yet, but from my understanding of GH's rate limiting, just providing a token of any sort will give you higher quotas? If that's the case, the following should help:
>
> I've tried logging in to GHCR via docker/login-action before running Trivy CLI (not action), and I am still getting lots of 429 errors.
Same for me, it doesn't seem to have significant effects.
> I have no long-term tests yet, but from my understanding of GH's rate limiting, just providing a token of any sort will give you higher quotas? If that's the case, the following should help:
I'm trying with:

```yaml
env:
  ACTIONS_RUNTIME_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
I spawned multiple parallel ci/cd actions, and this seems more reliable.
If anyone is going the route of uploading the Trivy DB to their own registry, I've had success using https://github.com/oras-project/setup-oras
Something like:
```yaml
vendor-trivy-db:
  runs-on: ubuntu-latest
  steps:
    - name: Vendor latest trivy db
      uses: oras-project/setup-oras@v1
    - run: |
        oras pull ghcr.io/aquasecurity/trivy-db:2
        oras login -u ${{ secrets.REGISTRY_USERNAME }} -p ${{ secrets.REGISTRY_TOKEN }} YOUR_REGISTRY
        oras push YOUR_REGISTRY \
          db.tar.gz:application/vnd.aquasec.trivy.db.layer.v1.tar+gzip \
          --artifact-type application/vnd.aquasec.trivy.config.v1+json
```
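The scan job can then point at the vendored copy through the same env vars discussed earlier (a sketch; `YOUR_REGISTRY` is the same placeholder as in the snippet above):

```yaml
env:
  TRIVY_DB_REPOSITORY: YOUR_REGISTRY
```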
I set up an AWS ECR pull-through cache for trivy-db and trivy-java-db, and modified the action:
```yaml
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@0.24.0
  with:
    image-ref: ${{ env.DOCKER_IMAGE_TO_SCAN }}
    format: 'table'
    exit-code: '1'
    ignore-unfixed: true
    vuln-type: 'os,library'
    severity: 'CRITICAL,HIGH'
  env:
    TRIVY_DB_REPOSITORY: <ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db
    TRIVY_JAVA_DB_REPOSITORY: <ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-java-db
    TRIVY_DEBUG: true
```
but the trivy-db pull fails with:

```
2024-09-23T16:16:12Z INFO Downloading DB... repository="<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db"
2024-09-23T16:16:12Z DEBUG No metadata file
2024-09-23T16:16:17Z DEBUG Credential error err="failed to get authorization token: operation error ECR: GetAuthorizationToken, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded"
2024-09-23T16:16:17Z FATAL Fatal error init error: DB error: failed to download vulnerability DB: database download error: OCI repository error: 1 error occurred:
  * GET https://<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/v2/github/ghcr.io/aquasecurity/trivy-db/manifests/2: unexpected status code 401 Unauthorized: Not Authorized
```
Docker is logged in. If I run the trivy binary locally or on the runner, it works fine:
```
runner@runner-set-xs-djfqb-0:/tmp$ export TRIVY_DB_REPOSITORY=<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db
runner@runner-set-xs-djfqb-0:/tmp$ export TRIVY_JAVA_DB_REPOSITORY=<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-java-db
runner@runner-set-xs-djfqb-0:/tmp$ trivy image --format table --exit-code 1 --ignore-unfixed --vuln-type os,library --severity CRITICAL,HIGH <ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/my-awesome-app:1.23.0
2024-09-23T16:35:04Z WARN '--vuln-type' is deprecated. Use '--pkg-types' instead.
2024-09-23T16:35:04Z INFO Adding schema version to the DB repository for backward compatibility repository="<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db:2"
2024-09-23T16:35:04Z INFO Adding schema version to the Java DB repository for backward compatibility repository="<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-java-db:1"
2024-09-23T16:35:04Z INFO [db] Need to update DB
2024-09-23T16:35:04Z INFO [db] Downloading DB... repository="<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db:2"
53.56 MiB / 53.56 MiB [------------------------------------------------------------------------------------------------------
```
Has somebody tried to pull trivy-db from AWS ECR using the action?
Yes, you can pull via ECR pull-through, but only if you run an OIDC configure-aws-credentials action first, before the trivy action. I'm not sure yet why you cannot use anything but OIDC, or at least I can't seem to get regular role assumption to work. Docker login doesn't help you, as the container doesn't try to pull the DB using docker commands.
If you try a docker pull you will get the unsupported media type error as in the post above, as the artifact isn't an 'image'.
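For reference, since the DB is an OCI artifact rather than an image, an artifact-aware client can fetch it where `docker pull` cannot; e.g. (illustrative):

```bash
# docker pull rejects the artifact's media type; oras handles arbitrary OCI artifacts.
oras pull ghcr.io/aquasecurity/trivy-db:2
```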
Ah, thanks. I was logged in under the incorrect account when I posted originally. That's what I was wondering, @billhammond-dev!
This was my error, for anyone else who runs into it:
```
latest: Pulling from github/ghcr.io/aquasecurity/trivy-db
unsupported media type application/vnd.aquasec.trivy.config.v1+json
```
> @nnellanspdl I think it's at 00:00 every day, but I'm not sure.
> Anyway, this workaround is a hassle if you have to host them yourself and update them every day.
Thanks. Yes, this is a lot to ask of consumers of your action.
I'm guessing it would be too much work to update the logic for pulling the file to allow passing it the file directly? We could set up a workflow to pull and stash the image every X hours, and then, in the workflow that uses the image, pull the file from the stash. It'd lower the number of hits by users, and we wouldn't need to host it in AWS and pay.
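Something in that direction can be approximated today with the runner cache; a rough sketch (the cache key scheme and the `.trivy` path are my assumptions, reusing the action's `cache-dir` input):

```yaml
- name: Restore trivy db from cache
  uses: actions/cache@v4
  with:
    path: .trivy
    # Save a fresh cache per run, but restore the most recent one available.
    key: trivy-db-${{ github.run_id }}
    restore-keys: |
      trivy-db-
- name: Run Trivy scan
  uses: aquasecurity/trivy-action@0.24.0
  with:
    cache-dir: .trivy
    # ... rest of your config ...
```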
> `ACTIONS_RUNTIME_TOKEN`
@NicholasFiorentini that's interesting, would you mind creating a PR to document this in the repo? If possible, could you also reference where this environment variable is documented?
FWIW, here's a sample snippet for using AWS ECR pull-through cache repositories, using OIDC for AWS auth.
The pull-through cache ECR repositories (for hosting the cached trivy DB artifacts) must be configured prior to running this workflow; see documentation.
```yaml
- name: Setup AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    aws-region: ...
    role-to-assume: <role, assumable through OIDC, that can pull from the cache ECR repositories>
- id: ecr-login
  name: Login to ECR
  uses: aws-actions/amazon-ecr-login@v2
...
- name: Run trivy scan
  uses: aquasecurity/trivy-action@0.24.0
  with:
    ...
  env:
    TRIVY_DB_REPOSITORY: ${{ steps.ecr-login.outputs.registry }}/github/aquasecurity/trivy-db:2
    TRIVY_JAVA_DB_REPOSITORY: ${{ steps.ecr-login.outputs.registry }}/github/aquasecurity/trivy-java-db:1
```
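For completeness, the one-time creation of such a rule looks roughly like this (illustrative; the prefix, region, and the Secrets Manager ARN holding the ghcr.io credentials are placeholders):

```bash
# ghcr.io upstreams need credentials stored in Secrets Manager under the
# ecr-pullthroughcache/ prefix; all values below are examples.
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix github \
  --upstream-registry-url ghcr.io \
  --credential-arn arn:aws:secretsmanager:us-east-1:<ACCOUNT_ID>:secret:ecr-pullthroughcache/ghcr \
  --region us-east-1
```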
Per AWS documentation:

> When a cached image is pulled through the Amazon ECR private registry URI, Amazon ECR checks the upstream repository at least once every 24 hours to verify whether the cached image is the latest version. If there is a newer image in the upstream registry, Amazon ECR attempts to update the cached image. This timer is based off the last pull of the cached image.
> > `ACTIONS_RUNTIME_TOKEN`
>
> @NicholasFiorentini that's interesting, would you mind creating a PR to document this in the repo? If possible, could you also reference where this environment variable is documented?
I found this variable by inspecting the Docker command executed by the Trivy action. I deployed my changes yesterday and none of my repos have seen the rate-limiting error so far.
It looks like an undocumented GitHub Actions feature: https://github.com/search?q=repo%3Aactions%2Ftoolkit%20ACTIONS_RUNTIME_TOKEN&type=code
Same here, happening more and more often, forcing us to manually re-run jobs :(
```
Running trivy with options: trivy image --format table --exit-code 1 --ignore-unfixed --vuln-type os,library --scanners vuln --severity CRITICAL,HIGH --timeout 10m0s docker.io/oxsecurity/megalinter-only-api_spectral:pr-4037
Global options:
2024-09-24T10:11:12Z INFO Need to update DB
2024-09-24T10:11:12Z INFO Downloading DB... repository="ghcr.io/aquasecurity/trivy-db:2"
2024-09-24T10:11:12Z FATAL Fatal error init error: DB error: failed to download vulnerability DB: database download error: oci download error: failed to fetch the layer: GET https://ghcr.io/v2/aquasecurity/trivy-db/blobs/sha256:92209767c55fb2c1a5b24efea1f96899a3956c38332d7685ac1f8580bc451712: TOOMANYREQUESTS: retry-after: 515.848µs, allowed: 44000/minute
```
Another possibility that I'm currently experimenting with is https://github.com/yogeshlonkar/trivy-cache-action. It isn't perfect, as it only caches the main trivy-db, so if your scan requires downloading an additional DB (e.g. trivy-java-db) you might still hit rate limits, but it seems to improve things:
```yaml
- name: Trivy Cache
  uses: yogeshlonkar/trivy-cache-action@v0.1.7
  with:
    gh-token: ${{ secrets.GITHUB_TOKEN }}
    prefix: ${{ github.workflow }}
- name: Trivy Vulnerability Scan
  uses: aquasecurity/trivy-action@master
  with:
    scan-type: "image"
    image-ref: ${{ inputs.IMAGE_REF }}
    output: trivy-docker-report.json
    format: json
    exit-code: 0
    cache-dir: .trivy
```
As we also run into this issue, could someone tell me what media type to set for pushing the java-db OCI bundle?
I tried the settings @BRONSOLO used for the non-java-db, but that leads to an unsupported media type error.
Is this an issue with the GitHub container registry or on the trivy side? Is there a different CR we can pull it from without having to set up our own with some sort of caching?
For now our requests come through roughly 1 in 5 times.
> As we also run into this issue, could someone tell me what media type to set for pushing the java-db OCI bundle?
> I tried the settings @BRONSOLO used for the non-java-db, but that leads to an unsupported media type error.
It's commented on here: https://github.com/aquasecurity/trivy-action/issues/389#issuecomment-2368847794.
You can use the oras CLI, but I would think that aquasec should set up an alternative registry pretty fast; either that, or some cache on their side.
> > `ACTIONS_RUNTIME_TOKEN`
>
> @NicholasFiorentini that's interesting, would you mind creating a PR to document this in the repo? If possible, could you also reference where this environment variable is documented?

I only learned about this variable myself from the recent disclosure of the ArtiPACKED vulnerability:

> "But I'm left with repos exposing their `ACTIONS_RUNTIME_TOKEN`, which is a JWT (JSON Web Token) with an expiration of about six hours according to the exp (expiration) property. `ACTIONS_RUNTIME_TOKEN` is an undocumented environment variable, used by several popular actions owned by GitHub, such as `actions/cache` and `actions/upload-artifact`, to manage caching and artifacts."
Thanks @nnellanspdl for sharing that interesting vulnerability.
Yes, it's a very hidden and obscure variable (I linked the references in the action's codebase here).
I can only say that passing the variable down to the Trivy action has solved the issue for me.
A couple of things I checked:

- The vulnerability database download code is here.
- It looks like there is no way to supply custom credentials for the vulnerability database via CLI parameters.
- It also seems that credentials set by docker/login-action are not picked up by the library used under the hood, or an authenticated GHA request counts towards the same rate limit.

I wonder if this issue started because of a change GitHub made on their side, or if Trivy became so popular that we just started running into these limits.
I'm gonna give `ACTIONS_RUNTIME_TOKEN` a try, but it doesn't sound like something that should work. If it does, I really don't know why (yet).
> I'm gonna give `ACTIONS_RUNTIME_TOKEN` a try, but it doesn't sound like something that should work. If it does, I really don't know why (yet).
I also don't understand how it works. So far, what I'm trying to figure out:

- The `ACTIONS_RUNTIME_TOKEN` variable is automatically propagated from the action's env to the Docker container.
- When Trivy calls GitHub, it is likely that the HTTP client inside the action is automatically using the value available in the `ACTIONS_RUNTIME_TOKEN` variable.

I tested `ACTIONS_RUNTIME_TOKEN` and it didn't seem to prevent DB download failures due to too many requests, though it's hard to say whether it makes them less likely or not.
> When Trivy calls GitHub, it is likely that the HTTP client inside the action is automatically using the value available in the `ACTIONS_RUNTIME_TOKEN` variable.
From what I can tell, the database download happens inside Trivy itself, not the action.
Does anyone know if there is any chance it will work well again someday without having to define workarounds? :/
Can confirm that this fix appears to be working for us:
```yaml
env:
  ACTIONS_RUNTIME_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
> I tried the settings @BRONSOLO used for the non-java-db, but that leads to an unsupported media type error.
@PascalTurbo I believe you'll want to use https://github.com/aquasecurity/trivy/blob/37d549e5b86a1c5dce6710fbfd2310aec9abe949/docs/docs/configuration/db.md?plain=1#L108
Something like:

```bash
oras push YOUR_REGISTRY \
  javadb.tar.gz:application/vnd.aquasec.trivy.javadb.layer.v1.tar+gzip \
  --artifact-type application/vnd.aquasec.trivy.config.v1+json
```
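For completeness, the matching artifact would be fetched first with something like this (the `:1` tag appears in the ECR logs earlier in this thread; the `javadb.tar.gz` file name comes from the linked docs):

```bash
# Fetch the java DB artifact before re-pushing it to your own registry.
oras pull ghcr.io/aquasecurity/trivy-java-db:1
```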
> Can confirm that this fix appears to be working for us:
>
> ```yaml
> env:
>   ACTIONS_RUNTIME_TOKEN: ${{ secrets.GITHUB_TOKEN }}
> ```
I can't. I added that yesterday and am still seeing the issue.
The trivy-db has been mirrored on a public ECR thanks to this PR: https://github.com/aquasecurity/trivy-db/pull/440
I suppose trivy will be updated to use this ECR by default in the near future, but in the meantime this workaround is working for me:
```yaml
- name: Run Trivy vulnerability scanner in repo mode
  uses: aquasecurity/trivy-action@0.24.0
  env:
    TRIVY_DB_REPOSITORY: public.ecr.aws/aquasecurity/trivy-db:2
  with:
    scan-type: 'fs'
    ignore-unfixed: true
    hide-progress: true
    format: 'table'
    severity: 'CRITICAL,HIGH'
```
@damienleger Is that workaround scalable though?
AWS ECR Public Service Quotas lists the following:
| Name | Default | Description |
| --- | --- | --- |
| Rate of authenticated image pulls | Each supported Region: 10 per second | The maximum number of authenticated image pulls per second. |
| Rate of image pulls to AWS resources | Each supported Region: 10 per second | The maximum number of image pulls per second to resources running on Amazon ECS, Fargate, or Amazon EC2. |
| Rate of unauthenticated image pulls | Each supported Region: 1 per second | The maximum number of unauthenticated image pulls per second. |
Which works out to far less than the 44000/minute that GHCR is complaining about now. So if lots of people point at the ECR repository, aren't they just going to hit rate limits there even sooner?
Has anyone considered raising a feature request for trivy itself to implement backoff/retry on rate limits for the db pull?
I moved to using the trivy binary directly in our workflow and self-hosting the db in our own ghcr, but it'd be nice to handle this without requiring uptake from every consumer of trivy-action.
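Until something like that exists in trivy, a crude shell-level retry is possible; a sketch using trivy's `--download-db-only` flag (the backoff values are arbitrary):

```bash
# Naive exponential backoff around the DB download; a stopgap, not a fix,
# since retries still add requests against the same global limit.
for attempt in 1 2 3 4 5; do
  trivy image --download-db-only && break
  sleep $((2 ** attempt))
done
```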
> Has anyone considered raising a feature request for trivy itself to implement backoff/retry on rate limits for the db pull?
Retries will cause more requests, hence we will hit the rate limits more often, considering we are all contributing to the same global rate limit.
I used regctl to mirror the image to a local repository; it has built-in backoff when it hits the rate limit. Something similar could be implemented here. That's why `retry-after` is returned on the response.
```
regctl image copy ...
time="2024-09-26T09:03:13Z" level=warning msg="Sleeping for backoff" Host=ghcr.io Seconds=1.999993179
```
> Can confirm that this fix appears to be working for us:
>
> ```yaml
> env:
>   ACTIONS_RUNTIME_TOKEN: ${{ secrets.GITHUB_TOKEN }}
> ```
Not enough... I have applied this to 7 repositories; some worked, others are still reporting the same error.
In addition to this issue, within our pipelines rate limits are also triggered during the build of the action's Dockerfile. I bet other people run into this issue too, and due to its hardcoded values, even a pull-through cache won't solve it.
```dockerfile
FROM ghcr.io/aquasecurity/trivy:0.53.0
COPY entrypoint.sh /
RUN apk --no-cache add bash curl npm
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```
FWIW, our workaround using private ECR pull-through cache repositories to pull the Trivy DB artifacts, with the upstream pointing to ghcr.io, worked fine until it didn't: the upstream pulls from GitHub started to get rate limited, to the point that pulls from our private ECR cache repository also started failing (I presume this is ECR's way of warning us that it cannot update the image from upstream, to avoid stale cached images).
```
Fatal error init error: DB error: failed to download vulnerability DB: database download error: oci download error: failed to fetch the layer: GET https://<redacted>.dkr.ecr.<redacted>.amazonaws.com/v2/github/aquasecurity/trivy-db/blobs/sha256:<redacted>: DENIED: Unable to get upstream layer matching pull through cache rule. The credential specified in the pull through cache rule has reached the upstream registry pull rate limit. You may be able to increase the limit by contacting the upstream registry provider
```
We are now attempting to work around this by pointing our private ECR pull-through cache repositories at the corresponding ECR Public upstreams instead, to which both the Trivy DB and Trivy Java DB now seem to be published (see here).
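The replacement rule is simpler, since ECR Public upstreams need no stored credentials; a sketch (the prefix and region are placeholders):

```bash
# Point the pull-through cache at ECR Public instead of ghcr.io.
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix ecr-public \
  --upstream-registry-url public.ecr.aws \
  --region us-east-1
```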
Changed it here now to `--db-repository public.ecr.aws/aquasecurity/trivy-db:2` ... working fine so far.
Let's see what happens, haha.
> In addition to this issue, within our pipelines rate limits are also triggered during the build of the action's Dockerfile. I bet other people run into this issue too, and due to its hardcoded values, even a pull-through cache won't solve it.
>
> ```dockerfile
> FROM ghcr.io/aquasecurity/trivy:0.53.0
> COPY entrypoint.sh /
> RUN apk --no-cache add bash curl npm
> RUN chmod +x /entrypoint.sh
> ENTRYPOINT ["/entrypoint.sh"]
> ```
This is exactly the error we are facing as well. @tparrot, were you able to find a workaround since?
I have the same problem with other actions that are configured as Dockerfiles. Unfortunately, there seems to be no real solution to this. I even reached out to a private GitHub contact, who offered no help.
Please check out the discussion here
> These are built asynchronously to the workflow's steps and as such, putting a docker/login-action ahead of the step that requires them has no effect. This means that any docker pull is subject to Docker Hub's public rate limit for self-hosted runners

In the quote above, replace Docker Hub with GHCR.
Hi, we're using trivy to scan our containers, and lately we've been seeing an increased number of rate-limiting errors when trivy is downloading the vulnerability database.
My guess is this is a global rate limit, as I can't imagine our low number of devs is causing 700+ requests a second.
I have in the meantime discovered that these scans are only used for SBOM generation on our end, so we don't need to download the vulnerability database every time, but I thought this issue should be raised, as I can't imagine we are the only ones seeing these errors.
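For anyone in the same SBOM-only situation, the download can be skipped once a DB copy exists locally; a sketch using trivy's skip flags (the image ref is a placeholder):

```bash
# Generate an SBOM without re-downloading the vulnerability DBs on every run;
# assumes a previously downloaded DB is present in trivy's cache dir.
trivy image --skip-db-update --skip-java-db-update \
  --format cyclonedx --output sbom.json my-registry/my-image:latest
```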