Open rkg-mm opened 3 years ago
First off, thanks for the detailed information on this issue.
This is a tricky one. Older packages don't have decent license information in the metadata. Just a URL. We could try to detect what the license is. Or use the SPDX license expression that is available in a newer package version. But that has problems too.
I don't know a good way to really resolve this. But certainly open to ideas.
I am actually having a similar problem. I have the following example entries in the BOM where only a license URL is available:
```xml
<license>
  <url>http://www.apache.org/licenses/LICENSE-2.0</url>
</license>
<license>
  <url>http://opensource.org/licenses/LGPL-2.1</url>
</license>
<license>
  <url>http://www.apache.org/licenses/LICENSE-2.0.html</url>
</license>
<license>
  <url>http://opensource.org/licenses/Apache-2.0</url>
</license>
<license>
  <url>http://www.opensource.org/licenses/bsd-license.php</url>
</license>
```
In the above cases, detection could be quite exact for the Apache license, and the Apache-2.0 id could be added beside the URL, for example when the substring "apache.org/licenses/LICENSE-2.0" or "opensource.org/licenses/Apache-2.0" is detected. However, LGPL-2.1 and BSD would be trickier and would require reading the contents behind the URL.
Anyway, it would help a little bit if Apache licenses are detected. What do you think?
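To make the suggestion concrete, here is a minimal sketch of the substring check in Python. The function name `guess_spdx_id` and the hint table are made up for illustration; this is not code from cyclonedx-dotnet, and it only covers the two Apache patterns mentioned above.

```python
from typing import Optional

# Hypothetical hint table: only the two Apache 2.0 substrings named above,
# stored lowercase so the comparison can be case-insensitive.
APACHE_2_0_HINTS = (
    "apache.org/licenses/license-2.0",
    "opensource.org/licenses/apache-2.0",
)


def guess_spdx_id(url: str) -> Optional[str]:
    """Return "Apache-2.0" if the URL contains a known Apache 2.0 substring."""
    lowered = url.lower()
    if any(hint in lowered for hint in APACHE_2_0_HINTS):
        return "Apache-2.0"
    # LGPL-2.1, BSD and everything else are deliberately left unresolved.
    return None


if __name__ == "__main__":
    for url in (
        "http://www.apache.org/licenses/LICENSE-2.0.html",
        "http://www.opensource.org/licenses/bsd-license.php",
    ):
        print(url, "->", guess_spdx_id(url))
```

Substring matching also catches the `.html` variant without a separate rule, which is why it is used here instead of exact comparison.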
Personally, I'd prefer that license information was corrected upstream. But also wouldn't be against some sort of URL license mapping to correct it either.
Perhaps, if someone is interested in implementing it, we could have that mapping done in a way that it could be re-used across implementations. Maybe initially as a file in this project. Then after the initial implementation it could be moved to a specific license mapping repo.
@coderpatros A license mapping is already being used in Core Java. It maps names to SPDX license expressions (including specific license ids). The 'names' could be anything, including URLs.
We could move that into a separate repo, rename 'names' to 'strings' or similar so it's more generic.
https://github.com/CycloneDX/cyclonedx-core-java/blob/master/src/main/resources/license-mapping.json
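As a rough illustration of how such a shared mapping could be consumed, here is a Python sketch. It assumes the file is a JSON list of objects with an `exp` field (the SPDX expression) and a `names` list; that shape is an assumption and should be checked against the linked file. The sample data and the helper names `build_index` and `resolve` are invented for the example.

```python
import json
from typing import Dict, Optional

# Inline sample in the ASSUMED shape; not copied from the real mapping file.
SAMPLE_MAPPING = """
[
  {"exp": "Apache-2.0",
   "names": ["Apache 2", "http://www.apache.org/licenses/LICENSE-2.0"]},
  {"exp": "MIT", "names": ["MIT License"]}
]
"""


def build_index(mapping_json: str) -> Dict[str, str]:
    """Flatten the mapping into a lowercase string -> SPDX expression dict."""
    index: Dict[str, str] = {}
    for entry in json.loads(mapping_json):
        for name in entry["names"]:
            index[name.strip().lower()] = entry["exp"]
    return index


def resolve(index: Dict[str, str], text: str) -> Optional[str]:
    """Look up a license name or URL, ignoring case and surrounding spaces."""
    return index.get(text.strip().lower())


if __name__ == "__main__":
    index = build_index(SAMPLE_MAPPING)
    print(resolve(index, "http://www.apache.org/licenses/LICENSE-2.0"))
    print(resolve(index, "Some Unknown License"))
```

Because names and URLs share one lookup table, renaming 'names' to 'strings' as proposed would not change the lookup logic.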
@coderpatros Are there any plans to improve this? I have so many missing licenses per project that I basically cannot use the dotnet CycloneDX for policies.
This has become more critical now since Dependency Track validates and rejects the BOM due to this issue.
This problem should actually be solved, in that a license with only a URL now gets a name like "Unknown - See URL". Are there packages you still have problems with?
The deeper problem (finding the correct SPDX ID when there is only a URL) is something I planned to fix for myself with a tool I wrote and am still writing:
https://github.com/mtsfoni/cdx-enrich
This would allow you to manually create a file with a mapping of URLs to SPDX license IDs and then automatically correct those in the created SBOMs after generation.
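The general idea of such a post-processing step can be sketched in Python as follows. This is not how cdx-enrich is implemented or configured; the function name `enrich_licenses` and the in-memory dict mapping are hypothetical, and the sketch assumes a CycloneDX JSON BOM where each entry in `components[].licenses[]` wraps a `license` object.

```python
import copy
from typing import Dict


def enrich_licenses(bom: dict, url_to_spdx: Dict[str, str]) -> dict:
    """Return a copy of the BOM with SPDX ids added to URL-only licenses."""
    result = copy.deepcopy(bom)  # leave the caller's BOM untouched
    for component in result.get("components", []):
        for entry in component.get("licenses", []):
            lic = entry.get("license")
            # Skip expression entries and licenses that already have id/name.
            if not lic or "id" in lic or "name" in lic:
                continue
            spdx_id = url_to_spdx.get(lic.get("url", ""))
            if spdx_id:
                lic["id"] = spdx_id
    return result


if __name__ == "__main__":
    bom = {"components": [{
        "name": "example-package",  # made-up component
        "licenses": [
            {"license": {"url": "http://opensource.org/licenses/Apache-2.0"}}
        ],
    }]}
    mapping = {"http://opensource.org/licenses/Apache-2.0": "Apache-2.0"}
    print(enrich_licenses(bom, mapping)["components"][0]["licenses"])
```

Entries whose URL is not in the mapping are left as they are, so the step is safe to run repeatedly as the mapping file grows.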
I believe all the packages where this causes problems for us right now are using an older dotnet version (6.0) and require an older dotnet CycloneDX version due to that.
My workaround right now is basically just
jq 'del(.components[].licenses[])' bom.json
since I mostly care about versions and security issues, not licenses.
> I believe all the packages where this causes problems for us right now are using an older dotnet version (6.0) and require an older dotnet CycloneDX version due to that.
You can also use the current CycloneDX version to generate SBOMs for projects that are pre-6.0. In fact, you can even use it for Framework projects.
It only needs dotnet 6+ installed as the runtime for CycloneDX.
I scanned some of our projects with cyclonedx-dotnet for further use with the vulnerability identification tool Dependency-Track. After importing into Dependency-Track, many package licenses are not visible. In one project with 34 public dependencies, only 4 licenses have been identified.
Looking closer at the missing licenses, I found that the BOM for many packages either contains no license info at all (even though NuGet lists some), or that the license section contains only a URL, which according to https://cyclonedx.org/docs/1.3/#type_licenseType is not valid: an ID or name must be provided.
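For anyone who wants to check their own BOM for this pattern, a small Python sketch follows. It assumes the JSON form of the BOM, with each `components[].licenses[]` entry wrapping a `license` object; the function name `url_only_licenses` is made up for illustration.

```python
from typing import List, Tuple


def url_only_licenses(bom: dict) -> List[Tuple[str, str]]:
    """List (component name, url) pairs for licenses lacking both id and name."""
    findings: List[Tuple[str, str]] = []
    for component in bom.get("components", []):
        for entry in component.get("licenses", []):
            lic = entry.get("license", {})
            if "url" in lic and "id" not in lic and "name" not in lic:
                findings.append((component.get("name", "?"), lic["url"]))
    return findings


if __name__ == "__main__":
    bom = {"components": [
        {"name": "pkg-with-id", "licenses": [{"license": {"id": "MIT"}}]},
        {"name": "pkg-url-only", "licenses": [
            {"license": {"url": "http://www.opensource.org/licenses/bsd-license.php"}}
        ]},
    ]}
    for name, url in url_only_licenses(bom):
        print(f"{name}: {url}")
```

Components with no `licenses` section at all are not reported here; they would need a separate check.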
Also, I am wondering why licenses have not been identified:
Of the 30 missing licenses:
- 17 had no info or dead URLs in the BOM file; no result is expected here.
- 1 has no license info in the BOM, but NuGet shows a license name with a dead license link. I think the license name could be accessible?
- 1 has no license info in the BOM, but NuGet shows a name and a link to a valid license file, which should be possible to extract?
- 1 has a license entry in the BOM pointing to a BSD-2-Clause license. The license should be recognizable.
- 10 have a license entry in the BOM pointing to a GitHub MIT License file. The license should be recognizable.
I will add some of the BOM entries here: