anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0
6.29k stars, 578 forks

Extract full license text #2724

Open mmarseu opened 8 months ago

mmarseu commented 8 months ago

What would you like to be added: SBOM formats such as CycloneDX and SPDX support including the full text of a license with a component. It would be great if syft could extract this information when scanning for licenses.

Why is this needed: OSS license compliance is one important use case for SBOMs, especially in large enterprises. SBOMs produced by syft today include components whose licenses are identified only by name (not SPDX ID), which is mostly useless without the accompanying text.

Comment https://github.com/anchore/syft/issues/2002#issuecomment-1674193547 also asked for such a feature to be implemented; however, I believe it was overlooked when the corresponding issue was closed.

Additional context: Example for curl produced by dpkg cataloger in CycloneDX (modified for conciseness):

{
    "type": "library",
    "name": "curl",
    "version": "7.81.0-1ubuntu1.15",
    "licenses": [
        // snip
        {
            "license": {
                "name": "other"
            }
        },
        {
            "license": {
                "name": "public-domain"
            }
        }
    ],
    "purl": "pkg:deb/ubuntu/curl@7.81.0-1ubuntu1.15?arch=amd64&distro=ubuntu-22.04",
    "properties": [
        // snip
        {
            "name": "syft:location:0:path",
            "value": "usr/share/doc/curl/copyright"
        },
        {
            "name": "syft:location:1:path",
            "value": "var/lib/dpkg/info/curl.md5sums"
        },
        {
            "name": "syft:location:2:path",
            "value": "var/lib/dpkg/status"
        }
    ],
    // snip
},
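
For comparison, a license entry that carries its text might be built as in the sketch below. This is only an illustration, not syft output: the `text` object with `contentType`, `encoding`, and `content` follows the CycloneDX attachment shape as I understand it, and the function name `license_with_text` and the placeholder text are hypothetical.

```python
import base64
import json


def license_with_text(name, text):
    """Build a CycloneDX-style license entry that carries the full license text.

    Assumption: the text is stored base64-encoded in an attachment-like
    "text" object. Nothing here reflects what syft emits today.
    """
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {
        "license": {
            "name": name,
            "text": {
                "contentType": "text/plain",
                "encoding": "base64",
                "content": encoded,
            },
        }
    }


if __name__ == "__main__":
    # Placeholder text; a real entry would hold the text found in the package.
    entry = license_with_text("public-domain", "Placeholder license text.\n")
    print(json.dumps(entry, indent=4))
```

Base64 is used here so that arbitrary license text survives JSON escaping unchanged; plain text without an encoding would be a possible alternative.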

italvi commented 8 months ago

Maybe a good source for licenses like public-domain could be the ScanCode LicenseDB, as unfortunately SPDX will not add such an ID to their list.

tgerla commented 8 months ago

Hi @mmarseu, thanks for the suggestion. We think it makes sense to include full license text or license snippets where available, as an opt-in configuration. We've got some more design work to do but we'll put this issue in the backlog for implementation at some point. If you're interested in working on this, let us know and we can collaborate. Thanks!

wagoodman commented 8 months ago

dev note: we could start adding full license text when a file's name or contents are detected to be a license, or partial license text when a license is found within a file. These could be persisted on the file object in the SBOM, not the package object.
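
A rough sketch of the filename-based half of that idea, assuming a scanned directory tree on disk. This is not syft code: the filename pattern, the function name `collect_license_files`, and the `licenseText` key are all hypothetical, and content-based detection is left out.

```python
import re
from pathlib import Path

# Hypothetical filename heuristic. A real scanner would likely also inspect
# file contents, which this sketch does not attempt.
LICENSE_NAME = re.compile(
    r"^(licen[sc]e|copying|copyright|notice)(\..*)?$", re.IGNORECASE
)


def collect_license_files(root):
    """Return one record per license-like file found under root.

    Each record is keyed by file path rather than by package, mirroring the
    suggestion to persist the text on the file object.
    """
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and LICENSE_NAME.match(path.name):
            records.append({
                "path": path.relative_to(root).as_posix(),
                "licenseText": path.read_text(encoding="utf-8", errors="replace"),
            })
    return records
```

With the curl example above, a tree containing usr/share/doc/curl/copyright would yield one record for that path, and packages could then refer to the file instead of duplicating the text.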

mmarseu commented 8 months ago

> Hi @mmarseu, thanks for the suggestion. We think it makes sense to include full license text or license snippets where available, as an opt-in configuration. We've got some more design work to do but we'll put this issue in the backlog for implementation at some point.

Thank you so much! Looking forward to a solution :)

> If you're interested in working on this, let us know and we can collaborate. Thanks!

Sadly, I wouldn't be able to write a hello world in Go if my life depended on it 😅

Joerki commented 7 months ago

Please let me add that providing copyright information is also a significant legal obligation when software vendors publish an attribution report for their work. If this information is not provided in the package metadata, it should be extracted, where possible, and supplied in the SBOM component data. Does it make sense to consider this aspect in this issue as well?
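
As a minimal sketch of what such extraction could look like, assuming the copyright statements appear on lines beginning with "Copyright" (as in many Debian copyright files and license headers). The pattern and the function name `extract_copyrights` are hypothetical, and this heuristic will miss statements that span several lines.

```python
import re

# Heuristic: matches lines such as "Copyright: 2020 Jane Doe" or
# "Copyright (c) 2020 Jane Doe". Multi-line statements are not handled.
COPYRIGHT_LINE = re.compile(
    r"^[ \t]*copyright\b[ \t]*(?::|\(c\)|©)?[ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_copyrights(text):
    """Return the distinct copyright statements found in text, in order."""
    seen = []
    for match in COPYRIGHT_LINE.finditer(text):
        statement = match.group(1).strip()
        if statement and statement not in seen:
            seen.append(statement)
    return seen
```

The result is a plain list of strings, so it could be attached to a component alongside the license data or kept per file.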

Shapedsundew9 commented 1 month ago

+1