DependencyTrack / dependency-track

Dependency-Track is an intelligent Component Analysis platform that allows organizations to identify and reduce risk in the software supply chain.
https://dependencytrack.org/
Apache License 2.0
2.72k stars 580 forks source link

GitHub Advisory Mirroring recursion does not properly handle secondary rate limits #4291

Open officerNordberg opened 1 month ago

officerNordberg commented 1 month ago

Current Behavior

GitHub Mirroring fails nearly daily and is not resilient. While troubleshooting after weeks of ~30 failures per day due I believe to the recursive nature of the mirroring I finally added a sleep statement between all graphql calls which resulted getting the first updates to the VULNERABILITY table since the 11th, the last day it ran without incident. It's hard to confirm this as the CREATED column is not populated on the table but UPDATED is and inferring Descending Primary Keys as "created dates" there are large gaps in GHSA primary keys that correlate to the days I received notifications of failed mirror attempts. I think the recursion is trying to handle this but it just results in 30 or more error Notifications being sent instead of just 1.

I added some additional details to the error message.

2024-10-21 16:49:15,024 ERROR [GitHubAdvisoryMirrorTask] An error was encountered retrieving advisories with HTTP Status : 403 Forbidden 
{  "documentation_url": "https://docs.github.com/free-pro-team@latest/rest/overview/rate-limits-for-the-rest-api#about-secondary-rate-limits",  
"message": "You have exceeded a secondary rate limit. Please wait a few minutes before you try again. 
If you reach out to GitHub Support for help, please include the request ID DEAD:DEED39:7ADE8B:ED56A1:67168609."}

curling my user I see these headers

< x-github-media-type: github.v4; format=json
< x-ratelimit-limit: 5000
< x-ratelimit-remaining: 4845
< x-ratelimit-reset: 1729532625
< x-ratelimit-used: 155
< x-ratelimit-resource: graphql

Attempted to get feedback on this issue as a discussion first. https://github.com/DependencyTrack/dependency-track/discussions/4239

Steps to Reproduce

  1. Enable GitHub Mirroring and notifications when it fails.

Expected Behavior

Secondary rate limits should be handled with a retry with backoff strategy.

Dependency-Track Version

4.11.x

Dependency-Track Distribution

Container Image

Database Server

PostgreSQL

Database Server Version

11.22

Browser

Google Chrome

Checklist

nscuro commented 1 month ago

Been meaning to switch to https://github.com/jeremylong/Open-Vulnerability-Project/tree/main/open-vulnerability-clients#github-security-advisory-example for this anyway. We already use that lib for the NVD REST API integration and it handles retries, also for rate limiting, behind the scenes. In doing this, we should also switch to incremental mirroring so we're not re-downloading everything everyday, which is what we're currently doing.

nscuro commented 1 month ago

@officerNordberg In #4239 you mention:

Is everyone else using this with enterprise accounts and not personal accounts?

Regarding secondary rate limits, the GitHub docs say:

Make too many concurrent requests. No more than 100 concurrent requests are allowed. This limit is shared across the REST API and GraphQL API.

Could it be that your organization has other API tokens that are used for interacting with GitHub's APIs? I'm wondering if simply using a personal token would at least temporarily resolve the issue for you.

In any case, given this statement in the docs:

These secondary rate limits are subject to change without notice. You may also encounter a secondary rate limit for undisclosed reasons.

we will need a retry mechanism no matter if the above resolves your problem for now.

officerNordberg commented 1 month ago

Could it be that your organization has other API tokens that are used for interacting with GitHub's APIs? I'm wondering if simply using a personal token would at least temporarily resolve the issue for you.

Nope, the tokens I've used are personal and not Enterprise. I also created new ones and purged old ones. Nothing else was using the tokens. I posted the rate-limit output, it was not hitting the primary limits, it was hitting the secondary limits. I asked about Enterprise tokens to see why this wasn't a more pervasive issue. The only thing that came to mind was everyone else IS using Enterprise tokens which might be why no one else seems to be seeing this issue. The other thought was maybe most people are experiencing mirroring issues but just not seeing it because they don't have notifications configured like ours.

In any case, given this statement in the docs:

These secondary rate limits are subject to change without notice. You may also encounter a secondary rate limit for undisclosed reasons.

we will need a retry mechanism no matter if the above resolves your problem for now.

Yes, the current download the entire thing daily without any retry appears to be in need of replacement.