aboutcode-org / vulnerablecode

A free and open vulnerabilities database and the packages they impact. And the tools to aggregate and correlate these vulnerabilities. Sponsored by NLnet https://nlnet.nl/project/vulnerabilitydatabase/ for https://www.aboutcode.org/ Chat at https://gitter.im/aboutcode-org/vulnerablecode Docs at https://vulnerablecode.readthedocs.org/
https://public.vulnerablecode.io
Apache License 2.0
519 stars 190 forks

Include commits and patches that fix a vulnerability #207

Open elanzini opened 4 years ago

elanzini commented 4 years ago

The commit that fixed the vulnerability should also be included in the information provided. Anything that can lead to a diff is valuable. This includes links to commits, pull requests and issues.

As suggested by @pombredanne we can use the specification described here, which supports referencing locations in Git, Mercurial, Subversion and Bazaar. A new field named vcs_url can be included for each vulnerability.

The following are some examples of links found on NVD, usually reported with the Patch tag:

Commits: lead to a diff

Pull Requests: lead to a merge commit --> diff

Issues: lead to a PR --> merge commit --> diff

Others: extract the diff if present
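The link categories above can be sketched in code. This is a minimal illustration, not part of VulnerableCode: the function names `classify_reference` and `github_commit_patch_url` are hypothetical, only GitHub URLs are recognized, and everything else falls into "other". It relies on GitHub serving a plain-text patch when `.patch` is appended to a commit URL.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Path layout of GitHub commit, pull request and issue pages.
_GITHUB_PATH = re.compile(
    r"^/(?P<owner>[^/]+)/(?P<repo>[^/]+)/(?P<kind>commit|pull|issues)/(?P<ref>[^/]+)"
)
_KIND_NAMES = {"commit": "commit", "pull": "pull_request", "issues": "issue"}


def _match_github(url: str):
    """Return a regex match for a GitHub commit/PR/issue URL, else None."""
    parts = urlsplit(url)
    if parts.netloc.lower() != "github.com":
        return None
    return _GITHUB_PATH.match(parts.path)


def classify_reference(url: str) -> str:
    """Classify a reference URL as commit, pull_request, issue or other."""
    match = _match_github(url)
    return _KIND_NAMES[match.group("kind")] if match else "other"


def github_commit_patch_url(url: str) -> Optional[str]:
    """Return the .patch URL for a GitHub commit link, or None otherwise."""
    match = _match_github(url)
    if match is None or match.group("kind") != "commit":
        return None
    return "https://github.com/{owner}/{repo}/commit/{ref}.patch".format(
        **match.groupdict()
    )
```

Pull requests and issues are only classified here; following them to a merge commit would need the hosting platform's API and is left out of this sketch.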

Sources of commit links

copernico commented 4 years ago

Thanks for linking project "KB", I guess we should talk (again :-) ) soon and present a demo of our respective work.

sbs2001 commented 3 years ago

@elanzini are you sure https://github.com/google/vulncode-db has commit links other than the ones provided by NVD? I've looked at a couple of entries at vulncode-db and all seem to have the same data as provided by NVD.

FYI I am working on importing project KB. Eventually we want to tag references as you suggested.

copernico commented 3 years ago

Re: importing from project KB: the kaybee tool can be easily configured to export to whatever (textual) format, and I can assist with that. Also note that the idea of project KB is that there exist an arbitrary number of repositories that share vulnerability information, and not a central repository: instead of replicating the logic of selecting sources and aggregating them, you could consider using kaybee itself.

Heads-up: in the coming days we will release a few hundred vulnerability statements (700 or more); we are currently making a quality-assurance check on the vulnerability data we have (1600+ vulnerabilities at this time).

elanzini commented 3 years ago

@sbs2001 That only holds for a handful of cases where they were manually curated. Most importantly, the biggest problem is to find links that lead to diffs, which indicate what was changed to fix the vulnerability.

Most of those links are GitHub commits, issues and PRs, but you also have to take into account GitLab, Bugzilla, JIRA tickets, SVN and a bunch of others (e.g. mailing lists). The landscape is quite fragmented on this front. I am currently working on addressing this problem, trying to extract as many diffs and patches as possible from the links that are gathered.

Are you planning to store just the link to the patches or the diff information as well? (e.g. filename, line numbers)

sbs2001 commented 3 years ago

@copernico

Heads-up: in the coming days we will release a few hundred vulnerability statements (700 or more); we are currently making a quality-assurance check on the vulnerability data we have (1600+ vulnerabilities at this time).

That's awesome you guys rock, can't wait.

As for

the kaybee tool can be easily configured to export to whatever (textual) format, and I can assist with that. Also note that the idea of project KB is that there exist an arbitrary number of repositories that share vulnerability information, and not a central repository: instead of replicating the logic of selecting sources and aggregating them, you could consider using kaybee itself.

I never thought of it this way. Thinking about the kaybee tool, I see it as very valuable for VulnerableCode and a perfect tool to aid in https://github.com/nexB/vulnerablecode/issues/232 since we eventually (very soon) want to share a knowledge base.

I definitely need to learn more about kaybee but https://sap.github.io/project-kb/ doesn't have much there.

sbs2001 commented 3 years ago

@elanzini

Are you planning to store just the link to the patches or the diff information as well?

Atm just the links. IMHO vulncode-db does a great job at showing the diff information when it is fed GitHub commit links.

I am currently working on addressing this problem, trying to extract as many diffs and patches from the links that are gathered.

That's interesting. Is there a repo I can check? Where do you get these links from?

copernico commented 3 years ago

... but https://sap.github.io/project-kb/ doesn't have much there.

True (we have a tool that is not very useful without some data, indeed ;-) ). We were supposed to publish the first batch of statements (that's what we call the files that contain data about a vulnerability in project KB) this week; we could still make it, but I'm not sure, because we are conducting an extra round of QA to be sure we publish high-quality information (prioritizing repositories that are popular). This is taking a bit longer than planned, but worst case it will be early next week, stay tuned ;-)

elanzini commented 3 years ago

@sbs2001

Atm just the links. IMHO vulncode-db does a great job at showing the diff information when it is fed GitHub commit links.

It's not really about showing the diffs but about gathering diff information, and not only from GitHub commits, so that it can be used for research and to pinpoint vulnerabilities at a finer level of detail. So, ideally, once you gather patches (that look like this) you can show them and use them in other useful ways.
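As a rough illustration of what "diff information" (filenames and line numbers) could mean in practice, the sketch below reads a unified diff and records, per file, the pre-fix line ranges touched by each hunk. It is an assumption-laden example, not the logic of any tool mentioned in this thread: the function name `pre_fix_ranges` is hypothetical, and the parser is deliberately simple (a removed line whose content starts with `-- ` would be mistaken for a file header).

```python
import re
from typing import Dict, List, Tuple

# Hunk header, e.g. "@@ -10,3 +10,4 @@"; a missing count means 1 line.
_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def pre_fix_ranges(diff_text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each old-side file path to (start_line, line_count) per hunk."""
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("--- "):
            # Old-side file header; drop any trailing timestamp after a tab.
            path = line[4:].split("\t")[0].strip()
            if path == "/dev/null":
                current = None  # newly added file: no pre-fix lines
            else:
                current = path[2:] if path.startswith("a/") else path
        elif current is not None:
            match = _HUNK.match(line)
            if match:
                start = int(match.group(1))
                count = int(match.group(2)) if match.group(2) is not None else 1
                ranges.setdefault(current, []).append((start, count))
    return ranges
```

The old side of the diff is used because those are the lines that existed before the fix, which is where a vulnerability would be located in an unpatched version.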

Is there a repo I can check?

This is the repo but the core of the logic regarding the handling of links and extraction of diffs is done here.

Where do you get these links from?

This is a list of the sources of information I am pulling from. I am also waiting on vulnerablecode to be deployed so I can include it as a source 🚀

pombredanne commented 3 years ago

Here is the design I suggest:

  1. Issues and PRs should be treated as vulnerability references. They may lead to a commit, but the way to get there is not structured or explicit. They are still references for our purposes and are stored in VulnerabilityReference.

  2. The commit(s) that fix a vulnerability should be tracked in their own field in PackageRelatedVulnerability. Since there can be more than one commit, let's use for now a text field with one commit per line. Each commit will be encoded as a VCS URL, ordered from the oldest to the newest commit.

In plain English this means that one or more commits fix a vulnerability that is resolved in a given package version.

The field could be named patched_by (kudos to @sbs2001 for this great name) and its description could be:

Optional VCS URL(s) for the commits that patch this vulnerability. The VCS URL syntax is specified by the SPDX specification 2.1, section "package download location field". There is one URL per line, ordered from the oldest to the newest commit or revision. These commits must be included in the code of the referenced package version.
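To make the proposed field concrete, here is a minimal sketch of how a patched_by value could be read back into structured entries. It assumes the SPDX form `<vcs_tool>+<transport>://<host>[/<path>][@<revision>][#<sub_path>]` with the tools git, hg, svn and bzr. The names `VcsUrl`, `parse_vcs_url` and `parse_patched_by` are hypothetical and do not exist in VulnerableCode.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit

SUPPORTED_VCS = {"git", "hg", "svn", "bzr"}


@dataclass(frozen=True)
class VcsUrl:
    vcs: str                 # git, hg, svn or bzr
    repo_url: str            # repository location without revision or sub-path
    revision: Optional[str]  # commit, tag or branch after "@", if any
    sub_path: Optional[str]  # path inside the repository after "#", if any


def parse_vcs_url(value: str) -> VcsUrl:
    """Parse one SPDX-style VCS URL such as git+https://host/repo.git@rev."""
    vcs, sep, rest = value.strip().partition("+")
    if not sep or vcs not in SUPPORTED_VCS:
        raise ValueError(f"not a supported VCS URL: {value!r}")
    parts = urlsplit(rest)
    # Only an "@" in the path marks a revision; "user@host" lives in netloc.
    path, at, revision = parts.path.rpartition("@")
    if not at:
        path, revision = parts.path, ""
    repo_url = f"{parts.scheme}://{parts.netloc}{path}"
    return VcsUrl(vcs, repo_url, revision or None, parts.fragment or None)


def parse_patched_by(text: str) -> List[VcsUrl]:
    """Parse a patched_by value: one VCS URL per line, oldest commit first."""
    return [parse_vcs_url(line) for line in text.splitlines() if line.strip()]
```

Keeping the field as ordered text lines matches the proposal above; a parser like this would only be needed when the commits are consumed, for example to fetch the corresponding diffs.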