dotnet / source-build

A repository to track efforts to produce a source tarball of the .NET Core SDK and all its components
MIT License

Expand the scanning tools used for license detection for better coverage #4595

Open mthalman opened 1 week ago

mthalman commented 1 week ago

In https://github.com/dotnet/source-build/issues/4590, a file that was originally thought to be acceptable for inclusion in the VMR for source build was discovered to be associated with a non-free license. A description of how this was found is here: https://github.com/dotnet/source-build/issues/4590#issuecomment-2329672636.

Today, we only use scancode for detecting license references. It did not catch this case because the content of the binary file had no license reference. However, lintian can match files on checksums. We should consider expanding the set of tools used for license detection to get better coverage and catch cases like https://github.com/dotnet/source-build/issues/4590. The use of lintian may be a possibility but that requires the targeting of a DEB package, not arbitrary directories. We don't have DEB packages currently available at the time scanning takes place.

dotnet-issue-labeler[bot] commented 1 week ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

mthalman commented 1 week ago

> The use of lintian may be a possibility but that requires the targeting of a DEB package, not arbitrary directories. We don't have DEB packages currently available at the time scanning takes place.

This could possibly be helped with the use of https://github.com/dotnet/arcade/pull/15051

dviererbe commented 1 week ago

You could just use the hash lists lintian uses to detect these files if you do not want to integrate the full lintian tool. They can be found here: https://salsa.debian.org/lintian/lintian/-/tree/master/data/cruft

E.g. here is the specific entry for the sRGB.icm file: https://salsa.debian.org/lintian/lintian/-/blob/master/data/cruft/non-free-files#L39
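
A rough sketch of what that could look like in Python, in case it helps. This is only an illustration, not how lintian itself does the matching. It assumes the list is a text file with `#` comment lines and that each entry carries one or more hex digests (MD5, SHA-1 or SHA-256, told apart by length); I have not pinned the exact field layout, so the parser just collects any hex token of the right length. The function names are made up for this example.

```python
import hashlib
import re
from pathlib import Path

# Hex digest lengths mapped to hashlib algorithm names (assumed digest types).
DIGEST_ALGORITHMS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_TOKEN = re.compile(r"\b[0-9a-fA-F]{32,64}\b")


def load_known_digests(list_text):
    """Collect hex digests from non-comment lines, grouped by algorithm."""
    known = {name: set() for name in DIGEST_ALGORITHMS.values()}
    for line in list_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for token in HEX_TOKEN.findall(line):
            algorithm = DIGEST_ALGORITHMS.get(len(token))
            if algorithm:
                known[algorithm].add(token.lower())
    return known


def file_digests(path, algorithms):
    """Hash one file with each requested algorithm, reading it in chunks."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def find_flagged_files(root, known):
    """Return files under root whose digest appears in the known list."""
    # Only compute the digest types that the list actually contains.
    algorithms = [name for name, digests in known.items() if digests]
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digests = file_digests(path, algorithms)
        if any(digests[name] in known[name] for name in algorithms):
            flagged.append(path)
    return flagged
```

Usage would be something like `find_flagged_files("src", load_known_digests(Path("non-free-files").read_text()))`, where `non-free-files` is a local copy of the lintian list and `src` is the directory being scanned (both names are placeholders). Since this works on arbitrary directories, it would not need a DEB package to exist at scan time.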