Open nscuro opened 2 years ago
In our case, this issue is the reason not to enable GitHub Advisories. Added vulnerabilities and better descriptions are awesome.
But the noise induced by ~80% of duplicated findings makes it unusable.
Ignoring findings that have an existing alias would be good enough as a first solution. Even if the resulting identifiers are non-deterministic (first found wins), it greatly reduces the human pain of checking each vulnerability twice.
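A minimal sketch of the "first found wins" idea, assuming each finding carries its own ID plus the set of aliases known for it. The `Finding` type, the `dedupe_first_wins` function, and the IDs are made up for illustration and are not Dependency-Track code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    vuln_id: str                      # made-up IDs are used in the example below
    aliases: frozenset = frozenset()  # IDs known to describe the same issue


def dedupe_first_wins(findings):
    """Keep a finding only if neither its ID nor any of its aliases was seen before."""
    seen = set()
    kept = []
    for f in findings:
        ids = {f.vuln_id} | f.aliases
        if ids & seen:
            continue  # an alias of this finding was already reported
        seen |= ids
        kept.append(f)
    return kept


findings = [
    Finding("GHSA-aaaa-bbbb-cccc", frozenset({"CVE-2024-0001"})),
    Finding("CVE-2024-0001", frozenset({"GHSA-aaaa-bbbb-cccc"})),
    Finding("CVE-2024-0002"),
]
print([f.vuln_id for f in dedupe_first_wins(findings)])
# prints ['GHSA-aaaa-bbbb-cccc', 'CVE-2024-0002']
```

Which identifier survives depends purely on input order, which is the non-determinism mentioned above.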
Hi @nscuro, we would like to discuss our approach to tackling this issue.
The preferred approach would be to favor CVEs over any of the alternative identifiers. However, this preference should not be hard-coded.
- We want to create a section where the admin can prioritize the vulnerability sources.
- Deduplication will not be applied to vulnerabilities that were previously attributed to components.
- Toggle button to hide duplicates (a view that hides non-audited duplicates; if all duplicates are non-audited, show the finding from the vulnerability source selected by the prioritization in step 1).

> There will be occasions where a CVE does not exist, yet there are aliases between say GHSA and OSSINDEX. Need to figure out how to handle this case.

- Cascade priority: If, for instance, a CVE does not exist but GHSA and OSSINDEX findings do, use the vulnerability source with the highest priority as defined in step 1.

> There will be occasions where a CVE does not exist initially, but an OSSINDEX finding does. That OSSINDEX finding could be audited. At a later time, a CVE may be created and now there's a mapping. This happened with log4j and is likely the norm for high-profile vulnerabilities. I don't think we want to de-dup any finding that has an existing audit.

- In this instance, first come, first served. If OSSINDEX was attributed first, the CVE will only be shown as an alias in the future, not attributed.
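The points above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Finding` fields, the `SOURCE_PRIORITY` list, the function names, and the example IDs are hypothetical and not part of Dependency-Track. Findings are grouped by shared aliases; within a group, audited findings always stay visible, and otherwise the finding from the highest-priority source that is actually present wins (the cascade).

```python
from dataclasses import dataclass

# Hypothetical admin-configured order, highest priority first.
SOURCE_PRIORITY = ["NVD", "GITHUB", "OSSINDEX"]


@dataclass
class Finding:
    vuln_id: str
    source: str
    aliases: frozenset = frozenset()
    audited: bool = False


def group_by_alias(findings):
    """Group findings whose ID/alias sets overlap, merging groups as needed."""
    groups = []  # list of (id_set, members); id_sets stay pairwise disjoint
    for f in findings:
        ids = {f.vuln_id} | set(f.aliases)
        members = [f]
        rest = []
        for g_ids, g_members in groups:
            if g_ids & ids:
                ids |= g_ids
                members = g_members + members
            else:
                rest.append((g_ids, g_members))
        groups = rest + [(ids, members)]
    return [members for _, members in groups]


def visible_findings(findings, priority=SOURCE_PRIORITY):
    """Return the findings to show when 'hide duplicates' is toggled on."""
    rank = {src: i for i, src in enumerate(priority)}
    visible = []
    for members in group_by_alias(findings):
        audited = [f for f in members if f.audited]
        if audited:
            # Never hide a finding that already has an audit (first come, first served).
            visible.extend(audited)
        else:
            # Cascade: pick the best-ranked source that exists in this group.
            visible.append(min(members, key=lambda f: rank.get(f.source, len(rank))))
    return visible


findings = [
    Finding("GHSA-aaaa-bbbb-cccc", "GITHUB", frozenset({"CVE-2024-0001"})),
    Finding("CVE-2024-0001", "NVD", frozenset({"GHSA-aaaa-bbbb-cccc"})),
    Finding("sonatype-2024-0002", "OSSINDEX", frozenset({"GHSA-dddd-eeee-ffff"})),
    Finding("GHSA-dddd-eeee-ffff", "GITHUB", frozenset({"sonatype-2024-0002"})),
    Finding("sonatype-2024-0003", "OSSINDEX", frozenset({"CVE-2024-0003"}), audited=True),
    Finding("CVE-2024-0003", "NVD", frozenset({"sonatype-2024-0003"})),
]
shown = [f.vuln_id for f in visible_findings(findings)]
```

With the made-up data above, the first group resolves to the CVE, the second cascades to the GHSA because no CVE exists, and the third keeps the audited OSSINDEX finding even though a CVE arrived later.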
Why not use an internal Dependency-Track ID (e.g. INT-1234) as the main identifier for vulnerabilities and put identifiers from public vulnerability databases in the alias section from the very beginning? Example: a new vulnerability is identified, CVE-2024-1234. In DT we would see it as INT-1234, with CVE-2024-1234 (or a GHSA or VulnDB ID) as an alias. As soon as a GHSA exists, we can add its ID to the aliases and also attach some additional information.
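To make the suggestion concrete, here is a small sketch of an internal-ID registry. Everything in it (the `Registry` and `Vulnerability` classes, the `INT-` sequence starting at 1234, the GHSA ID) is hypothetical and only illustrates the idea. It does not handle the case where one new identifier links two records that already exist separately.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Vulnerability:
    internal_id: str
    aliases: set = field(default_factory=set)
    details: dict = field(default_factory=dict)  # per-source info keyed by external ID


class Registry:
    def __init__(self, first_id=1234):
        self._by_alias = {}          # external ID -> Vulnerability
        self._ids = count(first_id)  # hypothetical internal sequence

    def record(self, external_id, known_aliases=(), info=None):
        """Attach external_id to an existing record via any known alias, else create one."""
        ids = {external_id, *known_aliases}
        vuln = next((self._by_alias[i] for i in ids if i in self._by_alias), None)
        if vuln is None:
            vuln = Vulnerability(f"INT-{next(self._ids)}")
        vuln.aliases |= ids
        vuln.details[external_id] = info or {}
        for i in ids:
            self._by_alias[i] = vuln
        return vuln


reg = Registry()
first = reg.record("CVE-2024-1234", info={"source": "NVD"})
later = reg.record("GHSA-aaaa-bbbb-cccc", known_aliases=["CVE-2024-1234"],
                   info={"source": "GITHUB"})
```

Both calls return the same `INT-1234` record, now carrying two aliases. The per-source `details` are kept side by side, so the question of which source's information to display is still open in this sketch.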
Hey @fatcatnoregret,
Thank you for your suggestion about using internal IDs as the main identifier for vulnerabilities and adding other identifiers in the alias section. It's a great idea! However, we might still face the challenge of deciding which information to display when users click on a vulnerability. This could be similar to the issue we're trying to address.
Current Behavior:
#1642 introduced tracking of vulnerability aliases. We now know which vulnerabilities describe the same issue, but we don't yet use this data to reduce the overall noise of findings. Clients may perform de-duplication based on their specific needs (e.g., always preferring GHSA over CVE), but we should offer a canonical solution on the server side.
Proposed Behavior:
There should be a mechanism to de-duplicate vulnerabilities. In order to stay backwards-compatible, new API endpoints or opt-in parameters should be introduced.
Note that the intention is not to de-duplicate during vulnerability data ingestion! We still want to keep the data from all sources.
There are multiple constraints that need to be considered. Steve mentioned a few of them in https://github.com/DependencyTrack/dependency-track/pull/1912#issuecomment-1228505074:
(There will be more constraints than this)