Open qtomlinson opened 3 months ago
@elrayle @yashkohli88
I implemented a change to resolve the license issue for the Nuget.Protocol coordinate. The code now considers a match only if its score is greater than 80%. However, it caused other components to fail in the places listed below.
1) pypi/pypi/-/platformdirs/4.2.0 - ScanCode reports LicenseRef-scancode-unknown-license-reference in the PKG-INFO file with a score of 100. This adds LicenseRef-scancode-unknown-license-reference to the list of discovered licenses.
Scancode result -
{
  "license_expression": "unknown-license-reference",
  "license_expression_spdx": "LicenseRef-scancode-unknown-license-reference",
  "from_file": "cd-aYG6pL/platformdirs-4.2.0/PKG-INFO",
  "start_line": 11,
  "end_line": 11,
  "matcher": "2-aho",
  "score": 100,
  "matched_length": 3,
  "match_coverage": 100,
  "rule_relevance": 100,
  "rule_identifier": "unknown-license-reference_see_license_at_manifest_1.RULE",
  "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/unknown-license-reference_see_license_at_manifest_1.RULE",
  "matched_text": "License-File: LICENSE",
  "matched_text_diagnostics": "License-File: LICENSE"
}
Below is the "licensed" section of the definition produced by the changed code.
"licensed": {
  "declared": "MIT",
  "toolScore": {
    "total": 45,
    "declared": 30,
    "discovered": 0,
    "consistency": 0,
    "spdx": 15,
    "texts": 0
  },
  "facets": {
    "core": {
      "attribution": {
        "unknown": 22
      },
      "discovered": {
        "unknown": 19,
        "expressions": [
          "LicenseRef-scancode-unknown-license-reference AND MIT",
          "MIT"
        ]
      },
      "files": 22
    }
  },
  "score": {
    "total": 45,
    "declared": 30,
    "discovered": 0,
    "consistency": 0,
    "spdx": 15,
    "texts": 0
  }
}
2) 'conda/conda-forge/linux-aarch64/numpy/1.16.6-py36hdc1b780_0' - The 'NOASSERTION' keyword has been replaced by 'LicenseRef-scancode-unknown-license-reference' in many instances. In some places this unknown license expression has been added to the existing license. Below is the comparison from the integration test:
expected: {"path":"info/about.json","license":"BSD-3-Clause","hashes":{"sha1":"75bee71c98128117d0a567f2ad35cd01f75750e0","sha256":"5f961516903bac3ca1dd9111c72a858f852b6112da3fda7829bf5d825cd25b37"}}
actual: {"path":"info/about.json","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","hashes":{"sha1":"75bee71c98128117d0a567f2ad35cd01f75750e0","sha256":"5f961516903bac3ca1dd9111c72a858f852b6112da3fda7829bf5d825cd25b37"}}
-------------------
expected: {"path":"info/recipe/meta.yaml","license":"BSD-3-Clause","hashes":{"sha1":"f1022538c9bd0fb683318f39954ae2a085d73a10","sha256":"3d2a25d96d805e0c5b0cab0615118d8bcb860ef92611b188534da32a301be623"}}
actual: {"path":"info/recipe/meta.yaml","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","hashes":{"sha1":"f1022538c9bd0fb683318f39954ae2a085d73a10","sha256":"3d2a25d96d805e0c5b0cab0615118d8bcb860ef92611b188534da32a301be623"}}
-------------------
expected: {"path":"info/recipe/meta.yaml.template","license":"BSD-3-Clause","hashes":{"sha1":"4312867c86b5c46e98b65ed788975a530fd3236a","sha256":"8902b1e3e0205039794cd2702848b717055a0f5dbed0697249c8a4ddffc0543f"}}
actual: {"path":"info/recipe/meta.yaml.template","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","hashes":{"sha1":"4312867c86b5c46e98b65ed788975a530fd3236a","sha256":"8902b1e3e0205039794cd2702848b717055a0f5dbed0697249c8a4ddffc0543f"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/distutils/fcompiler/absoft.py","attributions":["Copyright Absoft Corporation","Copyright Absoft Corporation 1994-2002 Absoft Pro FORTRAN","Copyright Absoft Corporation 1994-1998 mV2 Cray Research, Inc. 1994-1996 CF90"],"hashes":{"sha1":"af8d91b136b5a80ae20f9a7245809be4cc852420","sha256":"00a6e3e6e1abf1da460cbcd12096dd5275d702d17fe64e09aa7ab04d6bf2fad4"}}
actual: {"path":"lib/python3.6/site-packages/numpy/distutils/fcompiler/absoft.py","attributions":["Copyright Absoft Corporation","Copyright Absoft Corporation 1994-2002 Absoft Pro FORTRAN","Copyright Absoft Corporation 1994-1998 mV2 Cray Research, Inc."],"hashes":{"sha1":"af8d91b136b5a80ae20f9a7245809be4cc852420","sha256":"00a6e3e6e1abf1da460cbcd12096dd5275d702d17fe64e09aa7ab04d6bf2fad4"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/f2py/f2py2e.py","license":"BSD-3-Clause AND NOASSERTION","attributions":["Copyright 1999 2011 Pearu Peterson","Copyright 1999 - 2011 Pearu Peterson"],"hashes":{"sha1":"a6c6f2bbc8cd3bed85610cf122cd6264c949dae3","sha256":"c3dcd2246ded9c23323ab81926a8598845280279c8ee853ad64619cefb0b75fa"}}
actual: {"path":"lib/python3.6/site-packages/numpy/f2py/f2py2e.py","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","attributions":["Copyright 1999-2011 Pearu Peterson","Copyright 1999 - 2011 Pearu Peterson"],"hashes":{"sha1":"a6c6f2bbc8cd3bed85610cf122cd6264c949dae3","sha256":"c3dcd2246ded9c23323ab81926a8598845280279c8ee853ad64619cefb0b75fa"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/f2py/setup.py","license":"BSD-3-Clause AND NOASSERTION","attributions":["Copyright 2001-2005 Pearu Peterson"],"hashes":{"sha1":"0f3d561e9548e842b8694b5fa479ebe718245ce1","sha256":"a8d088a913dca445212418e286d11711ee088a5e170d8551008fec666ef16613"}}
actual: {"path":"lib/python3.6/site-packages/numpy/f2py/setup.py","license":"BSD-3-Clause AND LicenseRef-scancode-free-unknown","attributions":["Copyright 2001-2005 Pearu Peterson"],"hashes":{"sha1":"0f3d561e9548e842b8694b5fa479ebe718245ce1","sha256":"a8d088a913dca445212418e286d11711ee088a5e170d8551008fec666ef16613"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/f2py/__pycache__/f2py2e.cpython-36.pyc","license":"BSD-3-Clause AND NOASSERTION","attributions":["Copyright 1999 2011 Pearu Peterson","Copyright 1999 - 2011 Pearu Peterson"],"hashes":{"sha1":"13f8ab8f760195b5599f66f4be8c8381f68ecad8","sha256":"50297551bfc28e1e9d91879accc23544a05b2446f2f121ee32dc30acc87a8fa0"}}
actual: {"path":"lib/python3.6/site-packages/numpy/f2py/__pycache__/f2py2e.cpython-36.pyc","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","attributions":["Copyright 1999-2011 Pearu Peterson","Copyright 1999 - 2011 Pearu Peterson"],"hashes":{"sha1":"13f8ab8f760195b5599f66f4be8c8381f68ecad8","sha256":"50297551bfc28e1e9d91879accc23544a05b2446f2f121ee32dc30acc87a8fa0"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/f2py/__pycache__/setup.cpython-36.pyc","license":"BSD-3-Clause AND NOASSERTION","attributions":["Copyright 2001-2005 Pearu Peterson"],"hashes":{"sha1":"f5b2d8b039f675eb7b28c52b936a39c092832f61","sha256":"0c23abb7e046eb20beab087ae9d791a957fc553c191811c94c6ada2d08121a21"}}
actual: {"path":"lib/python3.6/site-packages/numpy/f2py/__pycache__/setup.cpython-36.pyc","license":"BSD-3-Clause AND LicenseRef-scancode-free-unknown","attributions":["Copyright 2001-2005 Pearu Peterson"],"hashes":{"sha1":"f5b2d8b039f675eb7b28c52b936a39c092832f61","sha256":"0c23abb7e046eb20beab087ae9d791a957fc553c191811c94c6ada2d08121a21"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy-1.16.6.dist-info/METADATA","license":"BSD-3-Clause AND NOASSERTION","hashes":{"sha1":"854d9701eb6441931a7916c8780a5e74bedd5831","sha256":"f8f6b36613e999ecc1fe61cea6ba132d66708aeb7c132c69ce587a0fd25f1b9b"}}
actual: {"path":"lib/python3.6/site-packages/numpy-1.16.6.dist-info/METADATA","license":"BSD-3-Clause AND LicenseRef-scancode-free-unknown","hashes":{"sha1":"854d9701eb6441931a7916c8780a5e74bedd5831","sha256":"f8f6b36613e999ecc1fe61cea6ba132d66708aeb7c132c69ce587a0fd25f1b9b"}}
3) pypi/pypi/-/sdbus/0.12.0 - Raising a ticket with ScanCode about the license findings for this coordinate is under discussion.
4) pod/cocoapods/-/SoftButton/0.1.0 - The license in the Readme.MD file is detected by the new code; it was not detected in the earlier version.
5) crate/cratesio/-/ratatui/0.26.0 - The test case fails due to a change in the repo namespace. Everything else works as before.
6) npm/npmjs/-/redis/0.1.0 - The declared license is now populated, a notice is generated, and the scores improved.
7) Nuget.Protocol/6.7.1 - NOASSERTION and ECL have been taken care of. The test case fails due to the change in the score.
8) deb/debian/-/mini-httpd/1.30-0.2_arm64 - Passed
9) debsrc/debian/-/mini-httpd/1.30-0.2 - Passed
10) pod/cocoapods/-/xcbeautify/0.9.1 - Passed
11) maven/mavencentral/org.apache.httpcomponents/httpcore/4.4.16 - Passed
12) maven/mavengoogle/android.arch.lifecycle/common/1.0.1 - Passed
13) go/golang/rsc.io/quote/v1.3.0 - Passed
14) composer/packagist/symfony/polyfill-mbstring/v1.28.0 - Passed
15) gem/rubygems/-/sorbet/0.5.11226 - Passed
16) git/github/ratatui-org/ratatui/bcf43688ec4a13825307aef88f3cdcd007b32641 - Passed
Here are the code changes related to this - https://github.com/yashkohli88/service/pull/5
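For readers who have not opened the PR, here is a minimal sketch of the kind of score filtering described at the top of this comment. It is not the code from the PR: the function names and the threshold constant are hypothetical, and the second match in the sample data is invented for illustration. The first match reuses the values from the platformdirs PKG-INFO result quoted earlier.

```python
# Hypothetical sketch: keep only ScanCode license matches above a score threshold.
# SCORE_THRESHOLD and both function names are assumptions, not the service's API.
SCORE_THRESHOLD = 80  # stands in for the "greater than 80%" rule


def filter_matches(matches, threshold=SCORE_THRESHOLD):
    """Return the matches whose score is strictly greater than the threshold."""
    return [m for m in matches if m.get("score", 0) > threshold]


def discovered_licenses(matches, threshold=SCORE_THRESHOLD):
    """Collect the unique SPDX expressions of the matches that survive filtering."""
    seen = []
    for match in filter_matches(matches, threshold):
        spdx = match["license_expression_spdx"]
        if spdx not in seen:
            seen.append(spdx)
    return seen


matches = [
    # Values taken from the platformdirs PKG-INFO match quoted above.
    {
        "license_expression_spdx": "LicenseRef-scancode-unknown-license-reference",
        "score": 100,
        "matched_length": 3,
    },
    # Invented low-score match, for illustration only.
    {"license_expression_spdx": "ECL-2.0", "score": 50, "matched_length": 5},
]

print(discovered_licenses(matches))
```

Under these assumptions the low-score match is dropped, but the unknown-license-reference match survives because its score is 100 even though only three tokens matched. A score threshold alone therefore would not remove it from the discovered licenses.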
In my opinion, in the 'LicenseRef-scancode-unknown-license-reference' cases the license match is triggered specifically by the 'License' keyword present in those files.
Most of the differences have occurred due to the presence of the 'License' keyword in a file. The new ScanCode reports 'LicenseRef-scancode-unknown-license-reference' whenever a license keyword is found in a file. I have observed this behavior in both of the failed scenarios above. The attached screenshot shows the 'matched_text' field from the ScanCode results, which contains the text where the match was found.
'pypi/pypi/-/platformdirs/4.2.0' - 'LicenseRef-scancode-unknown-license-reference' is reported in the discovered licenses.
'conda/conda-forge/linux-aarch64/numpy/1.16.6-py36hdc1b780_0' -
Difference 1 - "path":"info/about.json"
Expected - "license":"BSD-3-Clause"
Actual - "license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference"
LicenseRef-scancode-unknown-license-reference is detected because of the keyword 'License.txt'. This can be verified from the screenshot below.
Difference 2 - "path":"info/recipe/meta.yaml"
Expected - "license":"BSD-3-Clause"
Actual - "license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference"
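The expected/actual comparisons above can also be checked mechanically. The sketch below is illustrative only: `license_terms` and `diff_license_terms` are hypothetical helpers, not part of the integration tests, and they assume the expressions use only the AND operator, as the examples in this thread do. The sample records are abbreviated copies of the entries above with the hashes and attributions omitted.

```python
import json


def license_terms(expression):
    """Split a simple AND-only license expression into its identifiers."""
    if not expression:
        return set()
    return {term.strip() for term in expression.split(" AND ")}


def diff_license_terms(expected_json, actual_json):
    """Return (added, removed) license identifiers between two file records."""
    expected = license_terms(json.loads(expected_json).get("license"))
    actual = license_terms(json.loads(actual_json).get("license"))
    return sorted(actual - expected), sorted(expected - actual)


# Abbreviated records from the comparison above.
about_expected = '{"path":"info/about.json","license":"BSD-3-Clause"}'
about_actual = (
    '{"path":"info/about.json",'
    '"license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference"}'
)
f2py_expected = (
    '{"path":"lib/python3.6/site-packages/numpy/f2py/f2py2e.py",'
    '"license":"BSD-3-Clause AND NOASSERTION"}'
)
f2py_actual = (
    '{"path":"lib/python3.6/site-packages/numpy/f2py/f2py2e.py",'
    '"license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference"}'
)

print(diff_license_terms(about_expected, about_actual))
print(diff_license_terms(f2py_expected, f2py_actual))
```

For info/about.json the unknown reference is purely an addition, while for f2py2e.py it takes the place of NOASSERTION. Those are the two patterns described for the numpy coordinate.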
@yashkohli88 Thanks for the detailed explanation! I have summarized the findings of adding filtering below:
Pros:
Cons:
As per our discussion, we need to update the fixtures and track the coordinates with regressions in documentation in the operation repo.
This comes from the discussion on the PR to integrate the new ScanCode, specifically on the license differences in the integration tests before and after integrating ScanCode v32.