CycloneDX / cyclonedx-dotnet

Creates CycloneDX Software Bill of Materials (SBOM) from .NET Projects
https://cyclonedx.org/
Apache License 2.0

License name/id missing in BOM, which is invalid according to the spec #422

Open rkg-mm opened 3 years ago

rkg-mm commented 3 years ago

I scanned some of our projects with cyclonedx-dotnet for use with the vulnerability identification tool Dependency-Track. After importing the BOMs into Dependency-Track, the licenses of many packages are not visible. In one project with 34 public dependencies, only 4 licenses were identified.

Looking closer at the missing licenses, I found that for many packages the BOM either contains no license info at all (even though NuGet lists some), or the license section contains only a URL, which according to https://cyclonedx.org/docs/1.3/#type_licenseType is not valid. An ID or name must be provided.
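To make the "ID or name must be provided" rule concrete, here is a minimal sketch that lists the URL-only license entries in an XML BOM. It is not part of cyclonedx-dotnet; the function names and the `Example.WithId` component in the sample are hypothetical, and it only looks at top-level `<license>` children of each component's `<licenses>` element.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace such as '{http://...}license' down to 'license'."""
    return tag.rsplit("}", 1)[-1]


def url_only_licenses(bom_xml):
    """Return (component name, license url) pairs for licenses without <id> or <name>."""
    root = ET.fromstring(bom_xml)
    findings = []
    for component in root.iter():
        if local(component.tag) != "component":
            continue
        comp_name = next(
            (c.text for c in component if local(c.tag) == "name"), "?"
        )
        # Only walk component -> licenses -> license, not nested components.
        for licenses in component:
            if local(licenses.tag) != "licenses":
                continue
            for lic in licenses:
                if local(lic.tag) != "license":
                    continue
                fields = {local(c.tag): (c.text or "") for c in lic}
                if "id" not in fields and "name" not in fields:
                    findings.append((comp_name, fields.get("url", "")))
    return findings


# Small sample: the Moq entry mirrors the issue, the second one is made up.
SAMPLE = """<bom xmlns="http://cyclonedx.org/schema/bom/1.3">
  <components>
    <component type="library">
      <name>Moq</name>
      <version>4.2.1510.2205</version>
      <licenses>
        <license>
          <url>http://www.opensource.org/licenses/bsd-license.php</url>
        </license>
      </licenses>
    </component>
    <component type="library">
      <name>Example.WithId</name>
      <version>1.0.0</version>
      <licenses>
        <license>
          <id>MIT</id>
        </license>
      </licenses>
    </component>
  </components>
</bom>"""

if __name__ == "__main__":
    for name, url in url_only_licenses(SAMPLE):
        print(f"{name}: license has only a URL ({url})")
```

Running this over a generated BOM gives a quick list of the components that a validating consumer is likely to complain about.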

Also, I am wondering why licenses have not been identified:

Of the 30 missing licenses:

- 17 had no info or dead URLs in the BOM file; no result is expected here.
- 1 has no license info in the BOM, but NuGet lists a license name with a dead license link. I think the license name could be accessible?
- 1 has no license info in the BOM, but NuGet lists a name and a link to a valid license file, which should be possible to extract?
- 1 has a license entry in the BOM pointing to a BSD-2-Clause license. The license should be recognizable.
- 10 have a license entry in the BOM pointing to a GitHub MIT license file. The license should be recognizable.

I will add some of the BOM entries here:

    <component type="library">
      <publisher>Microsoft</publisher>
      <name>DocumentFormat.OpenXml</name>
      <version>2.9.1</version>
      <description>[...]</description>
      <hashes>
        <hash alg="SHA-512">FF54E72C4B9937D3E3AA0E27E2F91AC4BD27A7465FBC330AF9E74D5A7923E421E28A879308A08C9CC1BEAF79BA27475844826238A01ED07B5A2164333E955E2B</hash>
      </hashes>
      <licenses>
        <license>
          <url>https://github.com/OfficeDev/Open-XML-SDK/blob/master/LICENSE</url>
        </license>
      </licenses>
      <copyright>© Microsoft Corporation. All rights reserved.</copyright>
      <purl>pkg:nuget/DocumentFormat.OpenXml@2.9.1</purl>
      <externalReferences>
        <reference type="website">
          <url>https://github.com/OfficeDev/Open-XML-SDK</url>
        </reference>
      </externalReferences>
    </component>

    <component type="library">
      <name>FluentAssertions</name>
      <version>2.1.0.0</version>
      <purl>pkg:nuget/FluentAssertions@2.1.0.0</purl>
    </component>

    <component type="library">
      <publisher>Daniel Cazzulino, kzu</publisher>
      <name>Moq</name>
      <version>4.2.1510.2205</version>
      <description>Moq is the most popular and friendly mocking framework for .NET</description>
      <hashes>
        <hash alg="SHA-512">9EA3281D23F97F4E3F93229B4548C6833AAD21BE4B514B341FA9651CE9102794DA931E04CEBD87FB9F69908831926A5B1C070699CFCAB3245952CD5E8673D86A</hash>
      </hashes>
      <licenses>
        <license>
          <url>http://www.opensource.org/licenses/bsd-license.php</url>
        </license>
      </licenses>
      <purl>pkg:nuget/Moq@4.2.1510.2205</purl>
      <externalReferences>
        <reference type="website">
          <url>http://www.moqthis.com</url>
        </reference>
      </externalReferences>
    </component>

    <component type="library">
      <name>Prism</name>
      <version>4.1.0.0</version>
      <purl>pkg:nuget/Prism@4.1.0.0</purl>
    </component>

    <component type="library">
      <publisher>Microsoft</publisher>
      <name>EntityFramework.SqlServerCompact</name>
      <version>6.0.1</version>
      <description>Allows SQL Server Compact 4.0 to be used with Entity Framework.</description>
      <hashes>
        <hash alg="SHA-512">275E942256F2D9FC22C85F923650D233B348B61E1B8D96E7DE38623B4795E584BA6E8304C67D6709A531C01E7FD11F54E82B09AD2B93D2FB8A2200C488D72330</hash>
      </hashes>
      <licenses>
        <license>
          <url>http://go.microsoft.com/fwlink/?LinkID=320539</url>
        </license>
      </licenses>
      <purl>pkg:nuget/EntityFramework.SqlServerCompact@6.0.1</purl>
      <externalReferences>
        <reference type="website">
          <url>http://go.microsoft.com/fwlink/?LinkID=320540</url>
        </reference>
      </externalReferences>
    </component>

    <component type="library">
      <publisher>Microsoft</publisher>
      <name>Rx-Interfaces</name>
      <version>2.2.5</version>
      <description>Reactive Extensions Interfaces Library containing essential interfaces.</description>
      <hashes>
        <hash alg="SHA-512">14AB46D907CC21E51C7C1EEBDDAF3D568022B335F70D3BD81139290BA5341735A9C512BF99C1F2CD83AC9E20DF6A0F8DB031B43F593573CC8D49E7D72A9DEDD8</hash>
      </hashes>
      <licenses>
        <license>
          <url>http://go.microsoft.com/fwlink/?LinkID=261272</url>
        </license>
      </licenses>
      <copyright>Copyright (C) Microsoft Corporation</copyright>
      <purl>pkg:nuget/Rx-Interfaces@2.2.5</purl>
      <externalReferences>
        <reference type="website">
          <url>http://go.microsoft.com/fwlink/?LinkId=261273</url>
        </reference>
      </externalReferences>
    </component>

    <component type="library">
      <publisher>Tatham Oddie</publisher>
      <name>System.IO.Abstractions</name>
      <version>6.0.32</version>
      <description>A set of abstractions to help make file system interactions testable.</description>
      <hashes>
        <hash alg="SHA-512">6803DDF0148569DA4955CE08DCE061568A6DBBDBE0EB26DCCA6FA49699A3027FAEEEF2075C3BA18EE261DE8B5713198A0EBA1EE91C8F4EB756C11C8C531D37C4</hash>
      </hashes>
      <licenses>
        <license>
          <url>https://github.com/System-IO-Abstractions/System.IO.Abstractions/blob/master/LICENSE</url>
        </license>
      </licenses>
      <copyright>Copyright © Tatham Oddie 2010</copyright>
      <purl>pkg:nuget/System.IO.Abstractions@6.0.32</purl>
      <externalReferences>
        <reference type="website">
          <url>https://github.com/System-IO-Abstractions/System.IO.Abstractions</url>
        </reference>
      </externalReferences>
    </component>

    <component type="library">
      <publisher>Microsoft</publisher>
      <name>System.IO.Packaging</name>
      <version>4.5.0</version>
      <description>[...]</description>
      <hashes>
        <hash alg="SHA-512">2279CB66E845770ED1A0309E734583FFA123318AB26F1811726A4C3FC339C454166DE5B6FE984BE604578EA5468916E5D5EC7A4474CFDEBE45FDDE96C8CA94A5</hash>
      </hashes>
      <licenses>
        <license>
          <url>https://github.com/dotnet/corefx/blob/master/LICENSE.TXT</url>
        </license>
      </licenses>
      <copyright>© Microsoft Corporation.  All rights reserved.</copyright>
      <purl>pkg:nuget/System.IO.Packaging@4.5.0</purl>
      <externalReferences>
        <reference type="website">
          <url>https://dot.net/</url>
        </reference>
      </externalReferences>
    </component>

    <component type="library">
      <publisher>Microsoft</publisher>
      <name>System.Reflection.DispatchProxy</name>
      <version>4.5.1</version>
      <description>[...]</description>
      <hashes>
        <hash alg="SHA-512">0E891AFF7719EA93340D55262720114180CDF26B8B7E47DC9DE1F01AFDEFDFBE4BBA6BC2D8B7446B7BB1BC04BCF0B537C359063EF0CD60C372AC1AEF5CE82E61</hash>
      </hashes>
      <licenses>
        <license>
          <url>https://github.com/dotnet/corefx/blob/master/LICENSE.TXT</url>
        </license>
      </licenses>
      <copyright>© Microsoft Corporation.  All rights reserved.</copyright>
      <purl>pkg:nuget/System.Reflection.DispatchProxy@4.5.1</purl>
      <externalReferences>
        <reference type="website">
          <url>https://dot.net/</url>
        </reference>
      </externalReferences>
    </component>

coderpatros commented 3 years ago

First off, thanks for the detailed information on this issue.

This is a tricky one. Older packages don't have decent license information in the metadata. Just a URL. We could try to detect what the license is. Or use the SPDX license expression that is available in a newer package version. But that has problems too.

I don't know a good way to really resolve this. But certainly open to ideas.

masty1982 commented 2 years ago

I am actually having a similar problem. I have the following example entries in the BOM where only the license URL is available:

    <license>
      <url>http://www.apache.org/licenses/LICENSE-2.0</url>
    </license>

    <license>
      <url>http://opensource.org/licenses/LGPL-2.1</url>
    </license>

    <license>
      <url>http://www.apache.org/licenses/LICENSE-2.0.html</url>
    </license>

    <license>
      <url>http://opensource.org/licenses/Apache-2.0</url>
    </license>

    <license>
      <url>http://www.opensource.org/licenses/bsd-license.php</url>
    </license>


In the above cases, detection could be quite exact for the Apache license, and the Apache-2.0 id could be added beside the URL, for example when the substring "apache.org/licenses/LICENSE-2.0" or "opensource.org/licenses/Apache-2.0" is found in the URL. However, LGPL-2.1 and BSD would be trickier and would require reading the contents behind the URL.
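A minimal sketch of that substring check, assuming a small hand-maintained table. The names `URL_HINTS` and `guess_spdx_id` are hypothetical; only the two Apache patterns come from the examples above, and anything else is deliberately left unresolved.

```python
from typing import Optional

# Hypothetical table of URL fragments -> SPDX id.
# Only the two Apache patterns mentioned in this comment are included.
URL_HINTS = {
    "apache.org/licenses/LICENSE-2.0": "Apache-2.0",
    "opensource.org/licenses/Apache-2.0": "Apache-2.0",
}


def guess_spdx_id(url: str) -> Optional[str]:
    """Return an SPDX id if the URL contains a known fragment, else None."""
    lowered = url.lower()
    for fragment, spdx_id in URL_HINTS.items():
        if fragment.lower() in lowered:
            return spdx_id
    return None  # unknown: leave the license entry untouched


if __name__ == "__main__":
    for url in (
        "http://www.apache.org/licenses/LICENSE-2.0.html",
        "http://opensource.org/licenses/Apache-2.0",
        "http://www.opensource.org/licenses/bsd-license.php",
    ):
        print(url, "->", guess_spdx_id(url))
```

The comparison is case-insensitive and ignores scheme, `www.` prefix and suffixes such as `.html`, which is why a substring match is enough for the Apache URLs but says nothing about the ambiguous BSD link.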

Anyway, it would help a little bit if Apache licenses are detected. What do you think?

coderpatros commented 2 years ago

Personally, I'd prefer that license information was corrected upstream. But also wouldn't be against some sort of URL license mapping to correct it either.

Perhaps, if someone is interested in implementing it, we could have that mapping done in a way that it could be re-used across implementations. Maybe initially as a file in this project. Then after the initial implementation it could be moved to a dedicated license mapping repo.

stevespringett commented 2 years ago

@coderpatros A license mapping is already being used in Core Java. It maps names to SPDX license expressions (including specific license ids). The 'names' could be anything, including URLs.

We could move that into a separate repo, rename 'names' to 'strings' or similar so it's more generic.

https://github.com/CycloneDX/cyclonedx-core-java/blob/master/src/main/resources/license-mapping.json
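As a rough illustration of how such a shared mapping could be consumed, the sketch below builds a reverse index from strings (names or URLs) to SPDX expressions. The JSON shape used here, a list of objects with `exp` and `names` keys, is an assumption for illustration and should be checked against the linked file; the sample entries and the helper names `normalize`, `build_index` and `lookup` are hypothetical.

```python
import json

# Assumed shape: [{"exp": <SPDX expression>, "names": [<strings>...]}, ...]
# The entries below are illustrative, not copied from the real file.
MAPPING_JSON = """
[
  {"exp": "Apache-2.0",
   "names": ["Apache License, Version 2.0",
             "http://www.apache.org/licenses/LICENSE-2.0"]},
  {"exp": "MIT",
   "names": ["MIT License", "https://opensource.org/licenses/MIT"]}
]
"""


def normalize(text):
    """Lower-case and drop scheme, 'www.' and trailing slashes so variants match."""
    key = text.strip().lower()
    for prefix in ("https://", "http://"):
        if key.startswith(prefix):
            key = key[len(prefix):]
    if key.startswith("www."):
        key = key[len("www."):]
    return key.rstrip("/")


def build_index(mapping_json):
    """Build {normalized string: SPDX expression} from the assumed mapping shape."""
    index = {}
    for entry in json.loads(mapping_json):
        for name in entry["names"]:
            index[normalize(name)] = entry["exp"]
    return index


INDEX = build_index(MAPPING_JSON)


def lookup(text):
    """Return the SPDX expression for a name or URL, or None if unmapped."""
    return INDEX.get(normalize(text))


if __name__ == "__main__":
    print(lookup("https://www.apache.org/licenses/LICENSE-2.0/"))
    print(lookup("http://opensource.org/licenses/LGPL-2.1"))
```

Unlike a substring check, this is an exact match after normalization, so it only resolves strings that someone has explicitly added to the mapping.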

AfshinOnline commented 1 year ago

@coderpatros Are there any plans to improve this? I have so many missing licenses per project that I basically cannot use the dotnet CycloneDX for policies.

github-actions[bot] commented 6 months ago

This issue is stale because it has been open for 3 months with no activity.

taladar commented 3 months ago

This has become more critical now since Dependency Track validates and rejects the BOM due to this issue.

mtsfoni commented 3 months ago

> This has become more critical now since Dependency Track validates and rejects the BOM due to this issue.

This problem should actually already be solved: a license that has only a URL now gets a name like "Unknown - See URL". Are there packages you still have problems with?
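For older tool versions, the same idea can be applied as a post-processing step on a JSON BOM. This is only a sketch of the approach, not the tool's actual implementation; the function name is hypothetical and it assumes the usual CycloneDX JSON layout of `components[].licenses[].license`.

```python
import copy

PLACEHOLDER = "Unknown - See URL"


def name_url_only_licenses(bom):
    """Return a copy of the BOM where URL-only licenses get a placeholder name."""
    result = copy.deepcopy(bom)
    for component in result.get("components", []):
        for choice in component.get("licenses", []):
            lic = choice.get("license")
            # Entries using "expression" have no "license" object and are skipped.
            if lic and "url" in lic and not lic.get("id") and not lic.get("name"):
                lic["name"] = PLACEHOLDER
    return result


if __name__ == "__main__":
    bom = {
        "components": [
            {
                "name": "Moq",
                "licenses": [
                    {"license": {"url": "http://www.opensource.org/licenses/bsd-license.php"}}
                ],
            }
        ]
    }
    print(name_url_only_licenses(bom)["components"][0]["licenses"])
```

The URL is kept next to the placeholder name, so the entry becomes valid without losing the only license hint the package provides.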

mtsfoni commented 3 months ago

The deeper problem (finding the correct SPDX ID when there is only a URL) is something I plan to fix for myself with a tool I am writing:

https://github.com/mtsfoni/cdx-enrich

This would allow you to manually create a file with a mapping of URLs to SPDX license IDs and then automatically correct those in the created SBOMs after generation.

taladar commented 3 months ago

I believe all the packages where this causes problems for us right now are using an older dotnet version (6.0) and require an older dotnet CycloneDX version due to that.

My workaround right now is basically just

    jq 'del(.components[].licenses[])' bom.json

since I mostly care about versions and security issues, not licenses.
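If throwing away every license is too coarse, a more selective variant of this workaround could drop only the entries that a validator would reject. The following is a sketch under the assumption that the BOM is CycloneDX JSON with `components[].licenses[]`; the function name is hypothetical.

```python
import json
import sys


def drop_url_only_licenses(bom):
    """Remove license entries that have neither id, name nor expression (in place)."""
    for component in bom.get("components", []):
        licenses = component.get("licenses")
        if licenses is None:
            continue  # component has no licenses key, nothing to do
        kept = [
            choice
            for choice in licenses
            if "expression" in choice
            or choice.get("license", {}).get("id")
            or choice.get("license", {}).get("name")
        ]
        if kept:
            component["licenses"] = kept
        else:
            del component["licenses"]  # avoid leaving an empty array behind
    return bom


if __name__ == "__main__":
    # Reads a BOM from stdin and writes the filtered BOM to stdout.
    json.dump(drop_url_only_licenses(json.load(sys.stdin)), sys.stdout, indent=2)
```

Saved under a file name of your choice, it can be used as a filter, for example `python filter_licenses.py < bom.json > bom.filtered.json`; licenses that already carry an id, name or expression are preserved.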

mtsfoni commented 3 months ago

> I believe all the packages where this causes problems for us right now are using an older dotnet version (6.0) and require an older dotnet CycloneDX version due to that.

You can also use the current CycloneDX version to generate SBOMs for projects that are pre-6.0. In fact, you can even use it for .NET Framework projects.

It only needs dotnet 6+ installed as the runtime for CycloneDX itself.