anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

feat: dpkg license improvement for non SPDX licenses #3090

Open spiffcs opened 1 month ago

spiffcs commented 1 month ago

What happened: Syft sometimes encounters a dpkg copyright file whose license cannot be identified by the regular expression used to match on the file contents.

In the following example we should find things like:

NVIDIA Software License Agreement and CUDA Supplement to Software License Agreement

Reads contents of copyright:

Sends contents for parsing

Searches for license clause

What you expected to happen: If a copyright file is found, SOME license information should be created for the package. Reporting no licenses at all is a bug.
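
The expected behavior can be illustrated with a minimal sketch. This is not syft's implementation (syft is written in Go); it is a Python illustration under stated assumptions. The function name `licenses_from_copyright`, the free-text fallback heuristic, and the `LicenseRef-unknown` placeholder are all hypothetical. The only point it makes is that a non-empty copyright file should never yield an empty license list.

```python
import re

# Assumption: machine-readable (DEP-5 style) copyright files carry
# "License: <short name>" fields, one per paragraph.
LICENSE_FIELD = re.compile(r"^License:[ \t]*(\S.*)$", re.MULTILINE)

# Hypothetical fallback: the first line that mentions "license"/"licence"
# is taken as a free-text license name (e.g. a vendor agreement title).
FREE_TEXT_HINT = re.compile(r"^.*\blicen[sc]e\b.*$", re.IGNORECASE | re.MULTILINE)

# Hypothetical placeholder used when nothing recognizable is found.
UNKNOWN = "LicenseRef-unknown"


def licenses_from_copyright(contents: str) -> list[str]:
    """Return license names for a copyright file, never empty if the file has text."""
    # 1. Prefer explicit "License:" fields.
    fields = [value.strip() for value in LICENSE_FIELD.findall(contents)]
    if fields:
        return sorted(set(fields))

    # 2. Fall back to a free-text line that names a license.
    hint = FREE_TEXT_HINT.search(contents)
    if hint:
        return [hint.group(0).strip()]

    # 3. The file has content but nothing matched: still report something.
    if contents.strip():
        return [UNKNOWN]

    # An empty file carries no license information at all.
    return []
```

With this shape, a free-text file whose first matching line is "NVIDIA Software License Agreement and CUDA Supplement to Software License Agreement" produces that line as the license name instead of an empty result.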

Steps to reproduce the issue:

syft -o json nvidia/cuda:12.5.1-cudnn-runtime-ubuntu20.04 | grant list -o json | jq -r '.results[] | [.license.license_id] | @csv' | sed 's/"//g'

spiffcs commented 1 month ago

I've tracked down a couple of data sources syft could use to identify non-SPDX licenses. I'm currently looking at ways to incorporate them into license identification when generating the SBOM.