aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

Unknowns in Windows manifests not relevant? #3603

Open AyanSinhaMahapatra opened 11 months ago

AyanSinhaMahapatra commented 11 months ago

There are a lot of Windows-type manifests (like type: winexe) detected in SCTK which have extracted license statements like the following:

{'LegalCopyright': 'Copyright (C) Microsoft Corporation. All rights reserved.', 'LegalTrademarks': '', 'License': None}

These are detected as "declared_license_expression": "unknown". Here's an example in our test data: https://github.com/nexB/scancode-toolkit/blob/develop/tests/packagedcode/data/plugin/com-package-expected.json#L41 (more are present). Since this is a copyright statement and no license is specified, is this really an unknown license? Should we just ignore these cases when we have 'License': None, and report None in the license fields instead?

@pombredanne @JonoYang RFC

pombredanne commented 11 months ago

One question is whether a DLL or EXE WinPE file is a package or not. Maybe we should keep these at the level of data files that are not assembled... But here the lack of a license would likely still be an unknown to me, a strong clue that this is under some proprietary license. Unless we can collect evidence that this field is mostly not used in DLLs and is therefore mostly noise, in which case we would only report a license when it is present and not report "unknown" when it is absent.

So in short, here is a suggestion:

  1. Drop assembling these as these are not true packages of their own
  2. Still detect licenses at the data file level on these structured metadata
  3. Do not consider the lack of a license to be an "unknown" license to avoid the noise
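
To make points 2 and 3 concrete, here is a minimal sketch of the proposed behavior. It is not ScanCode's actual code or API: `license_expression_for_manifest` and `toy_detect` are hypothetical names, the detector is a stand-in for real license detection, and falling back to "unknown" for license text that is present but unmatched is an assumption about what we would want.

```python
from typing import Callable, Mapping, Optional


def license_expression_for_manifest(
    extracted: Mapping[str, Optional[str]],
    detect: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a license expression, or None when no license text is present.

    Hypothetical helper: `extracted` is the mapping pulled from the WinPE
    metadata and `detect` is any callable mapping text to an expression.
    """
    license_text = (extracted.get("License") or "").strip()
    if not license_text:
        # Point 3: a missing or empty License is not reported as "unknown".
        return None
    # Point 2: still run detection on the structured metadata.
    # Assumption: text that is present but unmatched stays "unknown".
    return detect(license_text) or "unknown"


def toy_detect(text: str) -> Optional[str]:
    """Stand-in detector for illustration only; not real license detection."""
    return "mit" if "mit license" in text.lower() else None


# The extracted statement quoted at the top of this issue.
winexe = {
    "LegalCopyright": "Copyright (C) Microsoft Corporation. All rights reserved.",
    "LegalTrademarks": "",
    "License": None,
}

print(license_expression_for_manifest(winexe, toy_detect))
print(license_expression_for_manifest({"License": "MIT License"}, toy_detect))
print(license_expression_for_manifest({"License": "Custom terms"}, toy_detect))
```

With this shape, the copyright-only manifest above yields None in the license fields, while a manifest that does carry license text is still detected at the data file level.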