SAP / project-kb

Home page of project "KB"
https://sap.github.io/project-kb/
Apache License 2.0

Release of 104 statements (Exact match) #369

Closed: matteogreek closed this pull request 1 year ago

matteogreek commented 1 year ago

Background

While operating Eclipse Steady internally at SAP, the SAP Security Research team collected a dataset of approximately 1,400 vulnerability statements. A first subset of these was published in 2019 as part of project KB and described in a paper at MSR 2019.

Our goal is to disclose an additional batch of vulnerability statements from the SAP-internal dataset and make them available to the community. To do so, we used Prospector to search for the fix commits of the vulnerabilities in that internal dataset, and we compared the tool's findings with the fix commits we had previously identified through manual search.

Objective

With this PR we release 104 new statements for which the results found by Prospector (according to the criteria detailed below) exactly matched the fix commits in the corresponding statements of our private dataset.

Analysis Process

The process begins by running Prospector to automatically identify fix commits for every vulnerability listed in our private dataset, using the vulnerability identifier and the URL of the vulnerable project's GitHub repository as input parameters. Both input parameters were taken from the internal dataset.
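The per-vulnerability loop described above can be sketched as follows. This is a minimal illustration, not the script used for this PR: the `InternalStatement` record layout and the `run_prospector` callable are hypothetical stand-ins, and the real Prospector invocation is deliberately left abstract. Only the vulnerability identifier and repository URL are passed to the tool; the manually identified fix commits are kept aside as ground truth for the later comparison.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class InternalStatement:
    """Hypothetical shape of one record in the internal dataset."""
    vulnerability_id: str          # e.g. a CVE identifier
    repository_url: str            # GitHub URL of the vulnerable project
    fix_commits: FrozenSet[str]    # manually identified fix commits (ground truth)


# Hypothetical runner signature: (vulnerability_id, repository_url) -> candidate commits.
# This is NOT Prospector's actual API; plug in whatever wrapper invokes the tool.
ProspectorRunner = Callable[[str, str], List[dict]]


def run_on_dataset(statements: Iterable[InternalStatement],
                   run_prospector: ProspectorRunner) -> Dict[str, List[dict]]:
    """Run the (injected) Prospector wrapper once per vulnerability.

    Both inputs come from the internal dataset; the known fix commits are
    not passed to the tool, so they remain usable for the evaluation.
    """
    reports: Dict[str, List[dict]] = {}
    for st in statements:
        reports[st.vulnerability_id] = run_prospector(st.vulnerability_id, st.repository_url)
    return reports
```

Injecting the runner as a parameter keeps the sketch independent of how Prospector is actually installed or called (CLI, container, or library).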

Once Prospector had finished, we examined all of its results and extracted candidate fix commits according to the rules they matched. Prospector ranks each candidate fix commit by checking it against a set of predefined rules, each of which carries a relevance value. To maximize confidence that a commit is an actual patch, the statements released with this PR only contain fix commits that matched at least one high-relevance rule.
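The "at least one high-relevance rule" filter can be sketched as below. The candidate layout (`commit_id` plus a list of `matched_rules`, each with a numeric `relevance`) and the threshold value `HIGH_RELEVANCE` are hypothetical assumptions for illustration; Prospector's real report format and rule weights may differ.

```python
from typing import List, Set

# Hypothetical cut-off separating high-relevance rules from the rest.
HIGH_RELEVANCE = 30


def high_confidence_commits(candidates: List[dict],
                            threshold: int = HIGH_RELEVANCE) -> Set[str]:
    """Keep candidates that matched at least one rule with relevance >= threshold.

    Each candidate is assumed (hypothetically) to look like:
        {"commit_id": "a1", "matched_rules": [{"id": "RULE_A", "relevance": 64}]}
    """
    return {
        c["commit_id"]
        for c in candidates
        if any(rule["relevance"] >= threshold for rule in c.get("matched_rules", []))
    }


candidates = [
    {"commit_id": "a1", "matched_rules": [{"id": "RULE_A", "relevance": 64}]},
    {"commit_id": "b2", "matched_rules": [{"id": "RULE_B", "relevance": 4}]},
    {"commit_id": "c3", "matched_rules": []},
]
print(high_confidence_commits(candidates))  # {'a1'}
```

A commit that matched only low-relevance rules (like `b2`) is dropped even if it ranks highly overall, which trades recall for confidence in the released statements.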

Note that Prospector introduces the concept of a twin commit: an equivalent fix commit on a different, parallel branch. Twins arise when a change is made on one branch and then applied to other branches that maintain different versions of the project.
To understand how identifying twin commits affects the comparison between Prospector's results and the internal dataset, we carried out two distinct evaluations.

  1. In the first evaluation, the list of fix commits found by Prospector consisted of all high-confidence commits plus their corresponding twins. These commits were then compared with the internal dataset without distinguishing between twins and candidate fix commits.
  2. In the second evaluation, we repeated the same extraction, again keeping commits that matched at least one high-relevance Prospector rule, but this time excluding all twin commits.
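The two evaluations differ only in whether twins are added to the set of high-confidence commits. The sketch below assumes a hypothetical `twins` mapping from each commit to its twins on parallel branches, and interprets "excluding twin commits" as "not adding twins to the high-confidence set"; the exact handling in the actual analysis may differ.

```python
from typing import Dict, Set


def extract_fix_commits(high_confidence: Set[str],
                        twins: Dict[str, Set[str]],
                        include_twins: bool) -> Set[str]:
    """Build the set of fix commits used for the comparison.

    high_confidence: commits that matched at least one high-relevance rule.
    twins: hypothetical mapping commit -> twin commits on parallel branches.
    include_twins=True  -> first evaluation (candidates plus their twins, undistinguished)
    include_twins=False -> second evaluation (twins not added)
    """
    commits = set(high_confidence)
    if include_twins:
        for commit in high_confidence:
            commits |= twins.get(commit, set())
    return commits


high = {"a1"}
twin_map = {"a1": {"t1", "t2"}}
with_twins = extract_fix_commits(high, twin_map, include_twins=True)      # {'a1', 't1', 't2'}
without_twins = extract_fix_commits(high, twin_map, include_twins=False)  # {'a1'}
```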

After extracting all high-confidence commits from Prospector's findings and grouping them by vulnerability, we compared the results with our internal dataset. We consider a statement valid for release if it satisfies at least one of the following three validation criteria:

  1. Exact match: the fix commits found by Prospector are exactly the fix commits in the internal dataset.
  2. Exact match - twins excluded: as in the previous case, but with twin commits excluded from Prospector's set of fix commits.
  3. Strict subset: the fix commits in the internal dataset form a strict subset of Prospector's results.
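The three criteria reduce to set comparisons per vulnerability. In this sketch, `prospector_all` is the commit set from the first evaluation (twins included) and `prospector_no_twins` the one from the second; comparing the strict-subset criterion against the twins-included set is an assumption, since the text only says "Prospector results". The function and criterion names are hypothetical labels.

```python
from typing import List, Set


def matched_criteria(prospector_all: Set[str],
                     prospector_no_twins: Set[str],
                     internal: Set[str]) -> List[str]:
    """Return the validation criteria a single statement satisfies."""
    criteria = []
    if prospector_all == internal:
        criteria.append("exact_match")
    if prospector_no_twins == internal:
        criteria.append("exact_match_twins_excluded")
    if internal < prospector_all:  # strict subset: contained in, but not equal to
        criteria.append("strict_subset")
    return criteria


# Internal dataset lists only a1; Prospector also found its twin t1.
print(matched_criteria({"a1", "t1"}, {"a1"}, {"a1"}))
# ['exact_match_twins_excluded', 'strict_subset']
```

A statement is releasable when the returned list is non-empty; the 104 statements in this PR are those for which `"exact_match"` appears.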

Results

With this PR we release 104 new statements that satisfy the Exact match criterion: for each of them, the set of fix commits found by Prospector is identical to the set of fix commits in the corresponding statement of our private dataset.