eBay / sbom-scorecard

Generate a score for your sbom to understand if it will actually be useful.
Apache License 2.0

SPDX questions/bugs #22

Closed rnjudge closed 1 year ago

rnjudge commented 1 year ago

Hello! In playing with this tool a bit for Tern reports I have a couple questions and small bugs:

Questions:

1) Does this tool ingest SPDX Tag value format or only JSON? I generated the same SBOM in both formats and only JSON format seemed to register anything in the scorecard tool.

2) Does the tool use LicenseConcluded or LicenseDeclared in its license calculation score? In my experimentation it appears to be using only Concluded. (I would argue the tool should be using "declared" rather than "concluded".)

3) Does this tool count LicenseRefs as licenses in the license score or only SPDX License Identifiers?

4) If there are no files present in the SPDX document, should the "file digest" score still be 0 (vs some sort of NA score)?

Bugs:

  1. Licenses do not seem to be properly accounted for in the license calculation score (maybe this is because it's only looking at concluded instead of declared?). In Tern's photon:3.0 container image SBOM (SPDX JSON format), there are 38 packages. 25 packages have a LicenseRef value as the license concluded, 11 have an actual SPDX License Identifier value, and 2 are NOASSERTION, yet scorecard says the report has 0% licenses found:

    Guessed: spdx
    38 total packages
    0% have licenses.
    ==
    Package Licenses: 0/20
  2. Package version score does not correspond to the % of package versions found? Am I misunderstanding what the correlation should be? If 97% have versions then shouldn't the package version score be higher? Additionally, the 97% score seems to be incorrect. Out of 38 packages, all have a version provided so I'm not sure where 97% is coming from.

    Guessed: spdx
    38 total packages
    97% have package versions.
    ==
    Package Versions: 0/20
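
For bug 1, the counts quoted above already pin down what the license percentage could plausibly be. The following is only a quick arithmetic sketch in Python (the helper name `pct` is made up for illustration and is not part of sbom-scorecard); it shows the share of licensed packages under two readings: counting only SPDX License Identifiers, and counting LicenseRefs as well. Neither reading gives 0%.

```python
# Counts reported above for Tern's photon:3.0 SBOM (licenseConcluded field).
license_ref = 25    # packages whose concluded license is a LicenseRef
spdx_id = 11        # packages with an SPDX License Identifier
no_assertion = 2    # packages with NOASSERTION
total = license_ref + spdx_id + no_assertion


def pct(count: int, total: int) -> float:
    """Share of packages as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(total)                              # 38
print(pct(spdx_id, total))                # 28.9 (SPDX identifiers only)
print(pct(spdx_id + license_ref, total))  # 94.7 (LicenseRefs counted too)
```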
rnjudge commented 1 year ago

photon.json.txt - test JSON file for reference (renamed to .txt to make GitHub happy)

photon.spdxtv.txt - tag value version of same container

jspeed-meyers commented 1 year ago

As a party interested in stamping the bugs out of this tool (but not the maintainer--credit to @justinabrahms), I investigated one small aspect of @rnjudge's questions and bug reports. (I'll investigate more later.)

On this part:

Additionally, the 97% score seems to be incorrect. Out of 38 packages, all have a version provided so I'm not sure where 97% is coming from.

I examined the photon.json.txt SBOM and saw that one "package" does seem to be missing version information. See the JSON snippet below. But this "package" is different from the other ones, since it appears to be a container layer (IIUC) and not an open source software package. I think this accounts for why the tool reports that 97 percent of packages have versions. Given this, what ought the tool to do?

"packages": [
    {
      "name": "photon",
      "SPDXID": "SPDXRef-photon-3.0",
      "versionInfo": "3.0",
      "downloadLocation": "NOASSERTION",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "copyrightText": "NOASSERTION"
    },
    {
      "name": "ad1f1c6f4fef6e6208ebc53e701bf9937f4e05dce5f601b20c35d8a0ad7fdeff",
      "SPDXID": "SPDXRef-c8a2baeeb2",
      # NO VERSION INFO HERE
      "packageFileName": "ad1f1c6f4fef6e6208ebc53e701bf9937f4e05dce5f601b20c35d8a0ad7fdeff",
      "downloadLocation": "NONE",
      "filesAnalyzed": false,
      "checksums": [
        {
          "algorithm": "SHA256",
          "checksumValue": "c8a2baeeb2639816d78c44738c72246632d712195c634ce53e80fb5cbc0a50c8"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "copyrightText": "NOASSERTION",
      "comment": "Layer 1:\n\tinfo: Layer created by commands: /bin/sh -c #(nop) ADD file:03f8ed1169e4d338a7b5f3f94b3e25379a063f3718bb062533efa2ce61a21d35 in / \n\tinfo: Found 'VMware Photon OS/Linux' in /etc/os-release.\n\tinfo: Retrieved package metadata using tdnf default method. \n\n"
    },
    {
      ...
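
A check like the one described above can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: the function name `missing_versions` is hypothetical, and the inline document is abridged to the two entries shown in the snippet (the real SBOM has 38 packages). One package without `versionInfo` out of 38 leaves 37 of 38, about 97.4%, which displays as 97% whether the value is rounded or truncated.

```python
import json

# Two entries abridged from the snippet above; the real SBOM has 38 packages.
sbom = json.loads("""
{"packages": [
  {"name": "photon", "SPDXID": "SPDXRef-photon-3.0", "versionInfo": "3.0"},
  {"name": "ad1f1c6f4fef6e6208ebc53e701bf9937f4e05dce5f601b20c35d8a0ad7fdeff",
   "SPDXID": "SPDXRef-c8a2baeeb2"}
]}
""")


def missing_versions(doc: dict) -> list:
    """Return the SPDXIDs of packages with no (or empty) versionInfo."""
    return [p["SPDXID"] for p in doc.get("packages", []) if not p.get("versionInfo")]


print(missing_versions(sbom))  # ['SPDXRef-c8a2baeeb2']
print(int(100 * 37 / 38))      # 97
```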
justinabrahms commented 1 year ago

Thanks for this issue!

To answer your questions:

  1. Tag values are intended to be supported. It is/was hard to find a real one in the wild, so I'll use the one you've attached here as a test case. Thank you.
  2. I think I'm fine with using both, with a preference for declared.
  3. I don't know if it counts LicenseRefs (probably not), but I agree that it should.
  4. If a section isn't applicable, you shouldn't get a zero for it. You should get max points.

I'll probably split your other things into bug tickets and reference them back here.
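
To make answer 4 concrete, here is a rough Python sketch of a section score that awards full marks when nothing in the section applies. The proportional rule and the names `MAX_POINTS` and `section_points` are assumptions for illustration; the thread does not say how sbom-scorecard actually converts a percentage into points.

```python
MAX_POINTS = 20  # each section in the output above is scored out of 20


def section_points(matching: int, applicable: int) -> int:
    """Hypothetical rule: proportional points, full marks when not applicable."""
    if applicable == 0:
        # e.g. no files in the SPDX document, so "file digest" cannot be judged
        return MAX_POINTS
    return round(MAX_POINTS * matching / applicable)


print(section_points(0, 0))    # 20: section not applicable
print(section_points(37, 38))  # 19: 37 of 38 packages have versions
print(section_points(0, 38))   # 0
```

Under a proportional rule like this, 97% coverage would score 19/20 rather than the 0/20 reported above.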

rnjudge commented 1 year ago

Thank you! As @jspeed-meyers pointed out, I think the package version percentage calculation is correct; it's just the final score (0/20) that is off :)

Also, my 2 cents for concluded vs declared: the declared field should be used by tools that detect licenses in software. The concluded field is intended more for human review/confirmation of what tools have found.
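
One way to combine the points discussed in this thread (use both fields, prefer declared, and count LicenseRefs) is sketched below in Python. This is an assumption about how the check could look, not the tool's implementation: `effective_license`, `NO_LICENSE`, and the three sample packages are made up for illustration.

```python
from typing import Optional

# Values that carry no license information.
NO_LICENSE = {"", "NONE", "NOASSERTION"}


def effective_license(pkg: dict) -> Optional[str]:
    """Prefer licenseDeclared, fall back to licenseConcluded; None if neither is usable."""
    for field in ("licenseDeclared", "licenseConcluded"):
        value = (pkg.get(field) or "").strip()
        if value not in NO_LICENSE:
            return value  # SPDX identifiers and LicenseRef-* values both count
    return None


# Hypothetical packages, not taken from the attached SBOM.
packages = [
    {"name": "a", "licenseDeclared": "MIT", "licenseConcluded": "NOASSERTION"},
    {"name": "b", "licenseDeclared": "NOASSERTION", "licenseConcluded": "LicenseRef-1234"},
    {"name": "c", "licenseDeclared": "NOASSERTION", "licenseConcluded": "NOASSERTION"},
]
licensed = [p["name"] for p in packages if effective_license(p)]
print(licensed)  # ['a', 'b']
```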

justinabrahms commented 1 year ago

Alright. I believe that all of these, except #25, are addressed. Please open additional bugs if you find any, and sorry for the delay in getting to these.