CycloneDX / cyclonedx-dotnet

Creates CycloneDX Software Bill of Materials (SBOM) from .NET Projects
https://cyclonedx.org/
Apache License 2.0

License resolution failing to get correct license #489

Closed alitheg closed 4 weeks ago

alitheg commented 2 years ago

Please tell me if this is actually a problem with importing into DependencyTrack, but it feels like an issue with generating the BOM. I've got several dependencies (for example, Microsoft.Azure.Cosmos and System.Buffers) which don't show a license when imported into DependencyTrack. I've looked at the generated bom.xml, and in these cases the license is set to a URL. In the examples above, it's:

Both of these URLs resolve to an MIT license file if you follow them, so shouldn't the BOM be generated showing that as the license instead of just the URL?

rkg-mm commented 2 years ago

I can add some examples here: Google.Api.CommonProtos 2.2.0 and Microsoft.ApplicationInsights.Kubernetes 2.0.2. Neither resolves to a license; the BOM only contains the URL https://aka.ms/deprecateLicenseUrl. On NuGet and GitHub, a license seems to be defined properly for both.

rkg-mm commented 2 years ago

OK, the license detection of the NuGet tool seems to be way worse than that of the npm tools. But I dug a bit deeper, and this seems to be mostly because of information missing in the NuGet repository. Compare https://api.nuget.org/v3-flatcontainer/Keycloak.Net/1.0.18/Keycloak.Net.nuspec with https://api.nuget.org/v3-flatcontainer/AutoMapper/11.0.1/AutoMapper.nuspec

AutoMapper contains a license URL AND a license expression, and hence is correctly identified in the BOM. Keycloak.Net only returns a license URL and lacks the expression, so only the URL ends up in the BOM. Also, as far as I can see, GitHub is not queried in this case, even though the license is properly defined there. Probably, as soon as any license information is returned, it is used and no further lookup is done via GitHub etc.
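To make the difference between the two nuspec files concrete, here is a minimal C# sketch that reads both fields. It assumes the `<license type="expression">` and `<licenseUrl>` elements sit directly under `<metadata>`, as in the nuspec schema; the class name `NuspecLicenseReader` is hypothetical and not part of cyclonedx-dotnet.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not taken from the cyclonedx-dotnet code base.
public static class NuspecLicenseReader
{
    // Returns the SPDX license expression and the license URL found in a nuspec.
    // Either value is null when the corresponding element is absent.
    public static (string Expression, string LicenseUrl) Read(string nuspecXml)
    {
        var doc = XDocument.Parse(nuspecXml);

        // Nuspec files declare a versioned default namespace, so match on local names only.
        var metadata = doc.Descendants().FirstOrDefault(e => e.Name.LocalName == "metadata");
        if (metadata == null)
        {
            return (null, null);
        }

        // Only <license type="expression"> carries an SPDX expression;
        // <license type="file"> points at a file inside the package instead.
        var license = metadata.Elements().FirstOrDefault(e =>
            e.Name.LocalName == "license" && (string)e.Attribute("type") == "expression");
        var licenseUrl = metadata.Elements().FirstOrDefault(e => e.Name.LocalName == "licenseUrl");

        return (license?.Value.Trim(), licenseUrl?.Value.Trim());
    }
}
```

A package that only ships `<licenseUrl>` yields a null expression here, which is the situation described above where only the URL can be written to the BOM.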

And my examples with the deprecateLicenseUrl are also not a tool failure, since the NuGet API returns exactly this information: https://api.nuget.org/v3-flatcontainer/Google.Api.CommonProtos/2.2.0/Google.Api.CommonProtos.nuspec

I have now added this to 3 projects, and licenses are not identified for 50-80% of the dependencies in these projects (120-180 dependencies per project), which means most of the licenses are missing. Any ideas how to improve this?

RodneyRichardson commented 2 years ago

Would it be worth downloading and parsing the license text, such as done by licensee?

RodneyRichardson commented 2 years ago

Another license url that doesn't resolve for pkg:nuget/EPPlus@4.5.3.3: https://aka.ms/deprecateLicenseUrl

This resolves to here: https://docs.microsoft.com/en-us/nuget/consume-packages/finding-and-choosing-packages#license-url-deprecation

which gives details of how to find the actual license file (by downloading and extracting the package).


And a few more (Microsoft) projects/packages that could possibly be resolved by reading the text:

ecaisse commented 2 years ago

I also have the same issue with some NuGet packages, and I believe I have a way to fix it in most cases. There are a couple of things that need to be addressed before actually doing it.

GitHub Licenses API

> Would it be worth downloading and parsing the license text, such as done by licensee?

As per GitHub's documentation, the Licenses API uses Licensee behind the scenes. With that said, we can find this comment in this tool's code:

https://github.com/CycloneDX/cyclonedx-dotnet/blob/532bd3ac692581e087f4c2782189b64f354ffcb3/CycloneDX/Services/GithubService.cs#L132-L134

The commit that introduced this comment dates from 2019. I have not been able to find out whether this is still a bug in the API; there doesn't seem to be any documentation of the bug or the support ticket anywhere.

Repository URL in Nuspec files

Using the Keycloak.Net nuspec file as an example, we can see 3 references to its GitHub repo: licenseUrl, projectUrl, and repository/@url. The repository node is, in my opinion, the best option to select, as it also gives the commit ref.

Assuming that the Licenses API works as intended, then a fallback could be implemented from the licenseUrl to use the repository URL and the commit ref, something like:

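var repositoryUrl = "..."; // read from repository/@url in the .nuspec file
var commitRef = "..."; // read from repository/@commit in the .nuspec file

// could also use Regex, UriBuilder, etc., depending on potential URL formats in .nuspec files
// the Licenses API endpoint is /repos/{owner}/{repo}/license, so "/license" has to be appended
var apiUrl = repositoryUrl.Replace("github.com", "api.github.com/repos") + $"/license?ref={commitRef}";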

As for the GithubService class, we could either add an overload GetLicenseForRepositoryAsync(string repositoryUrl, string commitRef), or add parameters to the existing GetLicenseAsync. I personally prefer refactoring the existing method, replacing the licenseUrl parameter with a parameter object containing all the information (licenseUrl, repoUrl, commitRef), since the NugetService shouldn't really need to "know" that it has to fall back in some edge case. Additionally, we could add another fallback using the projectUrl, but this gets tricky since we don't have a commit or branch ref...
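A rough sketch of the parameter-object idea, combined with a slightly more defensive URL construction than a plain string replace. Both `LicenseLookup` and `GithubLicenseUrlBuilder` are hypothetical names that do not exist in cyclonedx-dotnet, and the handling of a trailing ".git" is an assumption about what repository URLs in nuspec files may look like.

```csharp
using System;

// Hypothetical parameter object; it does not exist in cyclonedx-dotnet.
public sealed class LicenseLookup
{
    public LicenseLookup(string licenseUrl, string repositoryUrl, string commitRef)
    {
        LicenseUrl = licenseUrl;
        RepositoryUrl = repositoryUrl;
        CommitRef = commitRef;
    }

    public string LicenseUrl { get; }
    public string RepositoryUrl { get; }
    public string CommitRef { get; }
}

// Hypothetical helper that maps a repository URL to a GitHub Licenses API URL.
public static class GithubLicenseUrlBuilder
{
    // Returns null when the repository URL is missing, malformed, or not hosted on github.com.
    public static string FromRepository(LicenseLookup lookup)
    {
        if (!Uri.TryCreate(lookup.RepositoryUrl, UriKind.Absolute, out var uri))
        {
            return null;
        }
        if (!uri.Host.Equals("github.com", StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // Expect a path of the form /{owner}/{repo}[/...].
        var segments = uri.AbsolutePath.Trim('/').Split('/');
        if (segments.Length < 2)
        {
            return null;
        }

        var owner = segments[0];
        var repo = segments[1];
        // Assumption: some nuspec files use a clone URL ending in ".git".
        if (repo.EndsWith(".git", StringComparison.OrdinalIgnoreCase))
        {
            repo = repo.Substring(0, repo.Length - 4);
        }
        if (owner.Length == 0 || repo.Length == 0)
        {
            return null;
        }

        var url = $"https://api.github.com/repos/{owner}/{repo}/license";
        // The ref is optional; without it the API uses the default branch.
        return string.IsNullOrEmpty(lookup.CommitRef)
            ? url
            : url + "?ref=" + Uri.EscapeDataString(lookup.CommitRef);
    }
}
```

With this shape, the caller would only build a `LicenseLookup` from the nuspec and hand it over; whether and when the repository URL is used instead of the licenseUrl stays inside the GitHub-facing code.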

CLI switch

This is more of an open question: if the Licenses API still has that bug (or if we cannot confirm whether it does), should there be a CLI switch to enable (or disable) the fallback to the repoUrl/commitRef anyway? Or maybe we only fall back when licenseUrl is not a valid GitHub URL? This would at least give users the option to choose depending on their use case.

RodneyRichardson commented 2 years ago

This seems like it should find the Keycloak.Net license (even with it using a "main" branch): https://api.github.com/repos/lvermeulen/Keycloak.Net/license?ref=b52d4e6f2697e88d6ff12afe280b415ef804e8cb

It doesn't, however, find the Microsoft licenses (e.g. https://api.github.com/repos/dotnet-core-setup/license returns 404)

ecaisse commented 2 years ago

> It doesn't, however, find the Microsoft licenses (e.g. https://api.github.com/repos/dotnet-core-setup/license returns 404)

The URL you're using has a typo; it should be https://api.github.com/repos/dotnet/core-setup/license (note the slash after "dotnet"). Fixing the URL still leads to a 404. However, that's somewhat expected, since that repository no longer has a license after being moved to dotnet/runtime.

Still, you have brought up an interesting case. If you look at many repositories under the dotnet organization like Roslyn, ASP.NET Core, or WPF to list a few, the "About" section on the right does show MIT license explicitly, and using the API to get the license returns MIT as expected. However, for some reason, the Runtime repository does not show MIT, and the API just returns "other" as the license type, despite having an identical MIT license.

Your example does show that the GitHub Licenses API is not 100% reliable, at least in the sense that it may fail to return the license. That is manageable, unlike the case where it returned a completely different license, but it is something we have to keep in mind.
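One way to treat both failure modes (a 404, and a response that only says "other") as "no license found" is sketched below. This assumes the Licenses API response carries the identifier in `license.spdx_id` and reports unrecognized licenses as `NOASSERTION`; `GithubLicenseResponse` is a hypothetical name, not existing code in this tool.

```csharp
using System;
using System.Net;
using System.Text.Json;

// Hypothetical helper for illustration; not part of the existing GithubService.
public static class GithubLicenseResponse
{
    // Returns the SPDX identifier from a Licenses API response,
    // or null when GitHub could not identify a license.
    public static string TryGetSpdxId(HttpStatusCode status, string body)
    {
        // A 404 (or any other non-success status) means there is nothing to read.
        if (status != HttpStatusCode.OK || string.IsNullOrWhiteSpace(body))
        {
            return null;
        }

        try
        {
            using var doc = JsonDocument.Parse(body);
            var root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object ||
                !root.TryGetProperty("license", out var license) ||
                license.ValueKind != JsonValueKind.Object ||
                !license.TryGetProperty("spdx_id", out var spdx) ||
                spdx.ValueKind != JsonValueKind.String)
            {
                return null;
            }

            var id = spdx.GetString();
            // Assumption: "NOASSERTION" is what the API reports for the "other" license type.
            return (string.IsNullOrEmpty(id) || id == "NOASSERTION") ? null : id;
        }
        catch (JsonException)
        {
            // A malformed body is treated the same as "license not identified".
            return null;
        }
    }
}
```

Returning null in all of these cases lets the caller keep the licenseUrl in the BOM, or try another source, instead of recording a wrong license.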

AfshinOnline commented 2 years ago

Hi, is there any progress here? I really need the license information, and I'm getting hundreds of missing licenses in some projects.

@ecaisse

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 3 months with no activity.

github-actions[bot] commented 4 weeks ago

This issue was closed because it has been inactive for 9 months since being marked as stale.