CycloneDX / specification

OWASP CycloneDX is a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction. SBOM, SaaSBOM, HBOM, AI/ML-BOM, CBOM, OBOM, MBOM, VDR, and VEX
https://cyclonedx.org/
Apache License 2.0

Support for `evidence.licenses.confidence`, methods #459

Open prabhu opened 4 months ago

prabhu commented 4 months ago

The accuracy of the license IDs and expressions that tools report may be limited by the detection method used. Attributes like `confidence` and `concludedValue` could help with explainability and reasoning.

jkowalleck commented 4 months ago

Confidence for evidence? I guess not. Evidence is observed behavior; there is no confidence rating for that, or is there? :bulb: Confidence for concluded licenses, though, might be a thing ...

prabhu commented 4 months ago

We currently have `confidence` for `evidence.identity` and for its `methods`.
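For reference, the existing shape looks roughly like the following, written here as a Python dict for illustration. The field names follow my reading of the CycloneDX 1.5 schema for `evidence.identity`; the concrete values and the `strongest_method` helper are made up for this sketch.

```python
# Sketch of an existing `evidence.identity` entry (CycloneDX 1.5 style).
# All values below are invented for illustration.
identity_evidence = {
    "field": "purl",
    "confidence": 0.6,  # overall confidence, 0.0 to 1.0
    "methods": [
        {"technique": "manifest-analysis", "confidence": 0.8, "value": "package.json"},
        {"technique": "filename", "confidence": 0.4, "value": "example-1.0.0.tgz"},
    ],
}


def strongest_method(identity: dict) -> dict:
    """Hypothetical helper: return the method entry with the highest confidence."""
    return max(identity["methods"], key=lambda m: m["confidence"])
```

The request in this issue is essentially to allow a comparable `confidence` and `methods` pair on license information.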

In the same way, different license detection methods could carry different confidence. For example: identifying the license by reading just the package.json (low confidence), versus parsing the license headers and code snippets of all underlying files to build the list of licenses (medium confidence), versus a service such as ClearlyDefined that uses both humans and tools to triage and identify the list (high confidence).
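A minimal sketch of that idea, assuming each detection carries the technique that produced it. The numeric tiers, the `curated-service` technique name, and the `LicenseDetection` class are hypothetical and not part of any CycloneDX schema.

```python
from dataclasses import dataclass

# Hypothetical confidence tiers; the numbers are illustrative only.
METHOD_CONFIDENCE = {
    "manifest-analysis": 0.3,     # e.g. reading only package.json
    "source-code-analysis": 0.6,  # scanning license headers and snippets
    "curated-service": 0.9,       # human plus tool triage, e.g. ClearlyDefined
}


@dataclass(frozen=True)
class LicenseDetection:
    license_id: str  # SPDX license ID or expression
    technique: str   # how the license was detected

    @property
    def confidence(self) -> float:
        # Unknown techniques get 0.0 rather than a guessed value.
        return METHOD_CONFIDENCE.get(self.technique, 0.0)


def best_detection(detections):
    """Return the detection produced by the most trusted technique."""
    return max(detections, key=lambda d: d.confidence)
```

A consumer could then prefer the curated result when several tools disagree, while still keeping the lower-confidence detections for explainability.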

jkowalleck commented 4 months ago

Here is what I've learned from a talk a lawyer gave at ORT conference:

Reading a package manifest gives you the declared license. The declared license is the intention of the package owners. There is nothing to observe at all and nothing to have confidence about; it's a fact.

The raw license headers are evidence, because they are actually observed. Again, there is nothing to have confidence about; they are facts.

Parsing or interpreting license headers and making sense of them yields a concluded license of sorts. This value could have a "confidence" property. There could be multiple concluded values, each from a different mechanism or from different people doing the job...

In the end, the concluded license is the only thing that matters: it is based on observation (evidence) and intention (declared).

For example, lawyers do exactly that: they draw a conclusion from the other data. Let's say the declared license in the project manifest was "MIT", and in some file headers they found license headers for "Apache-2.0". They would conclude an SPDX license expression "(MIT AND Apache-2.0)" with high confidence.
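The mechanical part of that example can be sketched as below. This is a naive illustration, assuming the conclusion is simply the conjunction of every distinct license that was declared or observed; a real legal conclusion involves judgment that code like this does not capture. The `conclude_license` function is hypothetical.

```python
from __future__ import annotations


def conclude_license(declared: str | None, observed: list[str]) -> str:
    """Naively combine declared and observed SPDX IDs into one expression."""
    ids: list[str] = []
    for lic in [declared, *observed]:
        # Skip missing values and duplicates, keeping first-seen order.
        if lic and lic not in ids:
            ids.append(lic)
    if not ids:
        return "NOASSERTION"  # SPDX value for "no conclusion made"
    if len(ids) == 1:
        return ids[0]
    return "(" + " AND ".join(ids) + ")"


print(conclude_license("MIT", ["Apache-2.0"]))  # prints (MIT AND Apache-2.0)
```

The declared and observed inputs stay untouched as facts; only the returned expression is a conclusion, and it is the natural place to attach a confidence value.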