Status: Open. DennisClark opened this issue 8 months ago.
Other elements could be:
element: number_of_copyrights_detected description: the number of copyright statements detected in a scan
element: number_of_authors_detected description: the number of authors (contributors) detected in a scan
element: number_of_packages_detected description: the number of packages detected in a scan that can be identified by a valid PURL
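Here is a minimal sketch of counting these three elements. It assumes a scan result shaped loosely like ScanCode JSON output: a `files` list whose entries carry `copyrights` and `authors` lists, and a top-level `packages` list whose entries carry a `purl` string. The field names and the rough `is_valid_purl` check are simplifications made for illustration, not SCIO's actual implementation.

```python
# Hypothetical sketch: count copyright, author and package elements from a
# simplified, ScanCode-like scan result (field names are assumptions).


def is_valid_purl(purl):
    """Rough PURL check for 'pkg:<type>/<name>...'; not a full spec validator."""
    if not isinstance(purl, str) or not purl.startswith("pkg:"):
        return False
    ptype, sep, rest = purl[len("pkg:"):].partition("/")
    return bool(ptype) and bool(sep) and bool(rest)


def count_detections(scan):
    files = scan.get("files", [])
    return {
        # Each entry in a file's "copyrights" list is one copyright statement.
        "number_of_copyrights_detected": sum(len(f.get("copyrights", [])) for f in files),
        "number_of_authors_detected": sum(len(f.get("authors", [])) for f in files),
        # Only packages identifiable by a (roughly) valid PURL are counted.
        "number_of_packages_detected": sum(
            1 for p in scan.get("packages", []) if is_valid_purl(p.get("purl"))
        ),
    }


example_scan = {
    "files": [
        {"copyrights": [{"copyright": "Copyright (c) Example Corp"}], "authors": []},
        {"copyrights": [], "authors": [{"author": "Jane Doe"}]},
    ],
    "packages": [{"purl": "pkg:pypi/requests@2.31.0"}, {"purl": None}],
}
print(count_detections(example_scan))
# {'number_of_copyrights_detected': 1, 'number_of_authors_detected': 1, 'number_of_packages_detected': 1}
```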
We might also add:
element: number_of_dependencies_detected description: the number of dependencies identified by inspecting the files that specify other software (usually third-party) required by the project codebase being scanned
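One open question for this element is whether a dependency declared in several manifest files counts once or once per declaration. The sketch below supports both. It assumes each dependency record has a `purl` (possibly `None` when unresolved), a `datafile_path` naming the manifest, and an `extracted_requirement` string; these field names are assumptions loosely based on ScanCode output.

```python
# Hypothetical sketch: count dependencies found by inspecting manifest files.
# Field names ("purl", "datafile_path", "extracted_requirement") are assumed.


def count_dependencies(dependencies, unique=True):
    """Count dependencies, optionally collapsing repeats of the same PURL."""
    if not unique:
        # One count per declaration, even if the same package repeats.
        return len(dependencies)
    keys = set()
    for dep in dependencies:
        # Unresolved dependencies (no PURL) fall back to manifest + raw text.
        key = dep.get("purl") or (dep.get("datafile_path"), dep.get("extracted_requirement"))
        keys.add(key)
    return len(keys)


example_dependencies = [
    {"purl": "pkg:pypi/click", "datafile_path": "requirements.txt"},
    {"purl": "pkg:pypi/click", "datafile_path": "setup.cfg"},
    {"purl": None, "datafile_path": "requirements.txt", "extracted_requirement": "-e ./local"},
]
print(count_dependencies(example_dependencies))               # 2
print(count_dependencies(example_dependencies, unique=False))  # 3
```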
Some comments:
I like "SCA Clarity". Let's use that term for this.
I think we have now identified enough elements to move ahead with some kind of SCA Clarity support in SCIO (ScanCode.io).
Should this be a standard feature that does not require setting a specific option when running a scan? I think yes, but other thoughts on that are welcome here.
We need to structure this so that the clarity of the SBOM contents (the software units) is scored separately from the clarity of the origin and license information for those software units.
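Scoring the two aspects separately could look like the sketch below: each aspect gets its own weighted average of normalized (0.0 to 1.0) element scores, and the two results are never merged into a single number. The element names (`system_package_purl_ratio`, `exact_license_ratio`, etc.) and all weights are hypothetical placeholders, not agreed values.

```python
from dataclasses import dataclass

# Hypothetical weights: placeholders until the real weighting factors are agreed.
WEIGHTS = {
    # SBOM contents (software units)
    "system_package_purl_ratio": 2,
    "application_package_purl_ratio": 1,
    # Origin and license information for those units
    "exact_license_ratio": 2,  # percentage_of_exact_licenses_detected / 100
    "copyright_ratio": 1,
}


def weighted_score(element_scores, weights):
    """Weighted average of normalized (0.0-1.0) element scores, scaled to 0-100."""
    total_weight = sum(weights[name] for name in element_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(element_scores[name] * weights[name] for name in element_scores)
    return round(100 * weighted / total_weight, 1)


@dataclass
class ScaClarity:
    inventory_score: float        # clarity of the SBOM contents
    origin_license_score: float   # clarity of origin and license information


inventory = {"system_package_purl_ratio": 0.9, "application_package_purl_ratio": 0.6}
origin_license = {"exact_license_ratio": 0.75, "copyright_ratio": 0.5}
clarity = ScaClarity(
    inventory_score=weighted_score(inventory, WEIGHTS),
    origin_license_score=weighted_score(origin_license, WEIGHTS),
)
print(clarity)  # ScaClarity(inventory_score=80.0, origin_license_score=66.7)
```

Keeping the two scores as separate fields means a scan can report a complete inventory with poor license clarity (or the reverse) instead of hiding one weakness behind the other.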
A further refinement is probably needed. My original suggestion, element: number_of_packages_detected (description: the number of packages detected in a scan that can be identified by a valid PURL),
should perhaps be broken down into two types to support container analysis:
element: number_of_system_packages_detected description: the number of packages detected in a scan that can be identified by a valid PURL and that originate from a distro or distro repo
element: number_of_application_packages_detected description: the number of packages detected in a scan that can be identified by a valid PURL and that do not originate from a distro or distro repo
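One simple way to make this split is to classify each package by its PURL type. The sketch assumes that the types `deb`, `rpm`, `apk` and `alpm` indicate distro packages; that set is an assumption and would need review (for example, a Python library installed from a distro repo is usually reported as a distro package type, while one installed with pip is reported as `pypi`).

```python
# Hypothetical sketch: split packages into system (distro) and application
# packages by PURL type. DISTRO_PURL_TYPES is an assumed, incomplete set.
DISTRO_PURL_TYPES = {"deb", "rpm", "apk", "alpm"}


def purl_type(purl):
    """Return the lowercased type of 'pkg:<type>/...', or None if malformed."""
    if not isinstance(purl, str) or not purl.startswith("pkg:"):
        return None
    ptype, sep, rest = purl[len("pkg:"):].partition("/")
    return ptype.lower() if ptype and sep and rest else None


def split_package_counts(purls):
    counts = {
        "number_of_system_packages_detected": 0,
        "number_of_application_packages_detected": 0,
    }
    for purl in purls:
        ptype = purl_type(purl)
        if ptype is None:
            continue  # not identifiable by a valid PURL: counted in neither
        if ptype in DISTRO_PURL_TYPES:
            counts["number_of_system_packages_detected"] += 1
        else:
            counts["number_of_application_packages_detected"] += 1
    return counts


example_purls = [
    "pkg:deb/debian/curl@7.74.0-1.3",
    "pkg:apk/alpine/musl@1.2.3-r4",
    "pkg:npm/lodash@4.17.21",
    "not-a-purl",
]
print(split_package_counts(example_purls))
```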
It is probably best to do this counting in a new pipeline, compute-sca-clarity.
We might also add a negative element:
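As far as I know, ScanCode.io pipelines are classes (subclassing `scanpipe.pipelines.Pipeline`) whose `steps` classmethod returns an ordered tuple of step methods. The self-contained stand-in below only imitates that shape, without importing scanpipe, to show how a compute-sca-clarity pipeline might be split into steps. The class name, the step names and the `sca_clarity` storage key are hypothetical.

```python
# Hypothetical, self-contained stand-in for a "compute-sca-clarity" pipeline.
# It mimics the "ordered tuple of steps" shape; it is not the ScanCode.io API.


class ComputeScaClarity:
    """Collect SCA Clarity element counts for one scanned project."""

    def __init__(self, scan):
        self.scan = scan
        self.results = {}

    @classmethod
    def steps(cls):
        # Steps run in this order.
        return (cls.count_copyrights, cls.count_packages, cls.store_results)

    def count_copyrights(self):
        files = self.scan.get("files", [])
        self.results["number_of_copyrights_detected"] = sum(
            len(f.get("copyrights", [])) for f in files
        )

    def count_packages(self):
        self.results["number_of_packages_detected"] = sum(
            1 for p in self.scan.get("packages", [])
            if str(p.get("purl") or "").startswith("pkg:")
        )

    def store_results(self):
        # Hypothetical storage location for the computed elements.
        self.scan["sca_clarity"] = dict(self.results)

    def execute(self):
        for step in self.steps():
            step(self)
        return self.results


example_scan = {
    "files": [{"copyrights": [{"copyright": "(c) A"}, {"copyright": "(c) B"}]}],
    "packages": [{"purl": "pkg:npm/lodash@4.17.21"}],
}
results = ComputeScaClarity(example_scan).execute()
print(results)  # {'number_of_copyrights_detected': 2, 'number_of_packages_detected': 1}
```

A possible benefit of a dedicated pipeline is that it can run after any existing scan pipeline, reading its results, without changing those pipelines.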
element: number_of_misleading_matches_reported description: the number of matches (snippet or whole file) that are not quite accurate or do not add meaningful value.
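Deciding what counts as "misleading" needs agreed criteria. The sketch below assumes two hypothetical ones: a match covers less than half of the reference text, or it spans fewer than three lines. The `coverage` and `matched_lines` fields, the thresholds and the proportional penalty are all assumptions for illustration.

```python
# Hypothetical sketch: count likely misleading matches and turn the count
# into a penalty. Field names and thresholds are assumptions.
MIN_COVERAGE = 50.0     # percent of the reference text actually matched
MIN_MATCHED_LINES = 3   # very short snippet matches add little value


def is_misleading(match):
    return (
        match.get("coverage", 0.0) < MIN_COVERAGE
        or match.get("matched_lines", 0) < MIN_MATCHED_LINES
    )


def count_misleading_matches(matches):
    return sum(1 for m in matches if is_misleading(m))


def apply_penalty(score, misleading, total, max_penalty=20.0):
    """Subtract up to max_penalty points in proportion to misleading matches."""
    if total == 0:
        return score
    return max(0.0, score - max_penalty * misleading / total)


example_matches = [
    {"coverage": 100.0, "matched_lines": 40},  # accurate whole-file match
    {"coverage": 30.0, "matched_lines": 10},   # low coverage
    {"coverage": 95.0, "matched_lines": 1},    # one-line snippet
]
misleading = count_misleading_matches(example_matches)
print(misleading)                                              # 2
print(apply_penalty(80.0, misleading, len(example_matches)))
```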
We need to define the scoring elements (criteria) and their weighting factors to evaluate the quality of scan results (working name "SCA Clarity"), roughly analogous to our license clarity scoring elements for a specific project. To get things started, I would suggest the following major elements:
element: number-of-exact-licenses-detected description: the number of licenses detected with an exact license key match.
element: number-of-unknown-licenses-detected description: the number of licenses detected with no exact license key match.
element: percentage-of-exact-licenses-detected description: the percentage of all license detections that identify specific license keys, as opposed to unknown license references where the text is not matched precisely to a known license.
More ideas and comments are welcome.
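These three license elements can be computed together, as in the sketch below. It assumes each detection is reported as a license expression string such as `mit AND (gpl-2.0 OR unknown)`, and that the keys `unknown` and `unknown-license-reference` mark text that is not matched precisely to a known license; that key set is an assumption. A detection counts as exact only if none of its keys are unknown keys.

```python
import re

# Assumed set of license keys that signal "no exact license key match".
UNKNOWN_LICENSE_KEYS = {"unknown", "unknown-license-reference"}
OPERATORS = {"and", "or", "with"}


def license_keys(expression):
    """Extract keys from an expression like 'mit AND (gpl-2.0 OR unknown)'."""
    tokens = re.split(r"[\s()]+", expression)
    return [t for t in tokens if t and t.lower() not in OPERATORS]


def license_clarity_counts(expressions):
    exact = unknown = 0
    for expression in expressions:
        keys = license_keys(expression)
        if keys and not any(k in UNKNOWN_LICENSE_KEYS for k in keys):
            exact += 1
        else:
            unknown += 1
    total = exact + unknown
    return {
        "number-of-exact-licenses-detected": exact,
        "number-of-unknown-licenses-detected": unknown,
        "percentage-of-exact-licenses-detected": round(100 * exact / total, 1) if total else 0.0,
    }


example_expressions = ["mit", "apache-2.0 AND mit", "unknown-license-reference", "gpl-2.0 OR unknown"]
print(license_clarity_counts(example_expressions))
```

Treating a mixed expression such as `gpl-2.0 OR unknown` as unknown is a conservative choice; counting individual keys instead of whole detections is an alternative worth discussing.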