vargenau closed this issue 7 months ago
@vargenau this could be because the licenses are detected multiple times. Note that you can use the YAML or pretty-printed JSON output with extra diagnostics and matched text details to see what the issue may be. Below is what the YAML looks like when scanning the file at https://sourceforge.net/p/phpwiki/code/HEAD/tree/trunk/configurator.php.
We have a first detection with two GPL matches at https://github.com/pombredanne/svn.code.sf.net-p-phpwiki-code/blob/master/configurator.php#L11-L25
Then we have a single GPL match at https://github.com/pombredanne/svn.code.sf.net-p-phpwiki-code/blob/master/configurator.php#L1378
And then some license comments at https://github.com/pombredanne/svn.code.sf.net-p-phpwiki-code/blob/master/configurator.php#L1388L1395 which are not detected correctly (and are, in all honesty, not exactly clear either: for instance, at https://github.com/pombredanne/svn.code.sf.net-p-phpwiki-code/blob/master/configurator.php#L1394, "Creative Commons License 2.0" does not mean much of anything).
These comments and suggestions about licenses could be considered false positives.
headers:
  - tool_name: scancode-toolkit
    tool_version: v31.2.3-379-g6358a4b81d
    options:
      input:
        - configurator.php
      --license: yes
      --license-text: yes
      --license-text-diagnostics: yes
      --yaml: '-'
    notice: |
      Generated with ScanCode and provided on an "AS IS" BASIS, WITHOUT WARRANTIES
      OR CONDITIONS OF ANY KIND, either express or implied. No content created from
      ScanCode should be considered or used as legal advice. Consult an Attorney
      for any legal advice.
      ScanCode is a free software code scanning tool from nexB Inc. and others.
      Visit https://github.com/nexB/scancode-toolkit/ for support and download.
    start_timestamp: '2023-02-25T190554.029923'
    end_timestamp: '2023-02-25T190600.136639'
    output_format_version: 3.0.0
    duration: '6.106726169586182'
    message:
    errors: []
    warnings: []
    extra_data:
      system_environment:
        operating_system: linux
        cpu_architecture: 64
        platform: Linux-4.15.0-202-generic-x86_64-with-glibc2.23
        platform_version: '#213~16.04.1-Ubuntu SMP Wed Jan 11 10:59:04 UTC 2023'
        python_version: "3.9.10 (main, Jan 29 2022, 10:01:49) \n[GCC 5.4.0 20160609]"
      spdx_license_list_version: '3.19'
      files_count: 1
license_detections:
  - identifier: gpl_2_0_plus-09165bba-7b1b-0ff0-bdda-dbdcb89da5e8
    license_expression: gpl-2.0-plus
    count: 1
    detection_log:
      - not-combined
    matches:
      - score: '98.17'
        start_line: 11
        end_line: 23
        matched_length: 107
        match_coverage: '100.0'
        matcher: 2-aho
        license_expression: gpl-2.0-plus
        rule_identifier: gpl-2.0-plus_1078.RULE
        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0-plus_1078.RULE
      - score: '100.0'
        start_line: 25
        end_line: 25
        matched_length: 8
        match_coverage: '100.0'
        matcher: 1-spdx-id
        license_expression: gpl-2.0-plus
        rule_identifier: spdx-license-identifier-gpl-2.0-plus-a72d250698ecf7ac942b919f4caaaef61adb1ead
        rule_url:
  - identifier: gpl_1_0_plus-06400413-49a2-669d-9d2d-6c6d3f5aa266
    license_expression: gpl-1.0-plus
    count: 1
    detection_log:
      - not-combined
    matches:
      - score: '100.0'
        start_line: 1378
        end_line: 1378
        matched_length: 4
        match_coverage: '100.0'
        matcher: 2-aho
        license_expression: gpl-1.0-plus
        rule_identifier: gpl_63.RULE
        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl_63.RULE
  - identifier: gpl_2_0_plus_and_gfdl_1_1_plus_and_unknown_license_reference_and_cc_by_2_0-62618cb9-6dea-9376-b51c-b7353678d45a
    license_expression: gpl-2.0-plus AND gfdl-1.1-plus AND unknown-license-reference AND
      cc-by-2.0
    count: 1
    detection_log:
      - possible-false-positive
      - not-license-clues-as-more-detections-present
    matches:
      - score: '50.0'
        start_line: 1388
        end_line: 1393
        matched_length: 10
        match_coverage: '50.0'
        matcher: 3-seq
        license_expression: gpl-2.0-plus
        rule_identifier: gpl-2.0-plus_650.RULE
        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0-plus_650.RULE
      - score: '100.0'
        start_line: 1392
        end_line: 1392
        matched_length: 4
        match_coverage: '100.0'
        matcher: 2-aho
        license_expression: gfdl-1.1-plus
        rule_identifier: gfdl-1.1-plus_10.RULE
        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gfdl-1.1-plus_10.RULE
      - score: '100.0'
        start_line: 1393
        end_line: 1393
        matched_length: 7
        match_coverage: '100.0'
        matcher: 2-aho
        license_expression: gfdl-1.1-plus
        rule_identifier: gfdl-1.1-plus_24.RULE
        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gfdl-1.1-plus_24.RULE
      - score: '80.0'
        start_line: 1394
        end_line: 1394
        matched_length: 3
        match_coverage: '100.0'
        matcher: 2-aho
        license_expression: unknown-license-reference
        rule_identifier: unknown-license-reference_333.RULE
        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/unknown-license-reference_333.RULE
      - score: '100.0'
        start_line: 1395
        end_line: 1395
        matched_length: 7
        match_coverage: '100.0'
        matcher: 2-aho
        license_expression: cc-by-2.0
        rule_identifier: cc-by-2.0_url_glc_55.RULE
        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/cc-by-2.0_url_glc_55.RULE
  - identifier: apache_2_0-d66ab77d-a5cc-7104-e702-dc7df61fe9e8
    license_expression: apache-2.0
    count: 1
    detection_log:
      - possible-false-positive
      - not-license-clues-as-more-detections-present
    matches:
      - score: '100.0'
        start_line: 1468
        end_line: 1468
        matched_length: 3
        match_coverage: '100.0'
        matcher: 2-aho
        license_expression: apache-2.0
        rule_identifier: spdx_license_id_apache-2.0_for_apache-2.0.RULE
        rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/spdx_license_id_apache-2.0_for_apache-2.0.RULE
files:
  - path: configurator.php
    type: file
    detected_license_expression: gpl-2.0-plus AND gpl-1.0-plus AND (gpl-2.0-plus AND gfdl-1.1-plus
      AND unknown-license-reference AND cc-by-2.0) AND apache-2.0
    detected_license_expression_spdx: GPL-2.0-or-later AND GPL-1.0-or-later AND (GPL-2.0-or-later
      AND GFDL-1.1-or-later AND LicenseRef-scancode-unknown-license-reference AND CC-BY-2.0)
      AND Apache-2.0
    license_detections:
      - license_expression: gpl-2.0-plus
        detection_log:
          - not-combined
        matches:
          - score: '98.17'
            start_line: 11
            end_line: 23
            matched_length: 107
            match_coverage: '100.0'
            matcher: 2-aho
            license_expression: gpl-2.0-plus
            rule_identifier: gpl-2.0-plus_1078.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0-plus_1078.RULE
            matched_text: |
              is free software; you can redistribute it and/or modify
              * it under the terms of the GNU General Public License as published by
              * the Free Software Foundation; either version 2 of the License, or
              * (at your option) any later version.
              *
              * [PhpWiki] is distributed in the hope that it will be useful,
              * but WITHOUT ANY WARRANTY; without even the implied warranty of
              * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
              * GNU General Public License for more details.
              *
              * You should have received a copy of the GNU General Public License along
              * with [PhpWiki]; if not, write to the Free Software Foundation, Inc.,
              * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
          - score: '100.0'
            start_line: 25
            end_line: 25
            matched_length: 8
            match_coverage: '100.0'
            matcher: 1-spdx-id
            license_expression: gpl-2.0-plus
            rule_identifier: spdx-license-identifier-gpl-2.0-plus-3f844e1a237b3ca425edf1127a3c075a0a0c1de6
            rule_url:
            matched_text: 'SPDX-License-Identifier: GPL-2.0-or-later'
      - license_expression: gpl-1.0-plus
        detection_log:
          - not-combined
        matches:
          - score: '100.0'
            start_line: 1378
            end_line: 1378
            matched_length: 4
            match_coverage: '100.0'
            matcher: 2-aho
            license_expression: gpl-1.0-plus
            rule_identifier: gpl_63.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl_63.RULE
            matched_text: GNU General Public License", "
      - license_expression: gpl-2.0-plus AND gfdl-1.1-plus AND unknown-license-reference
          AND cc-by-2.0
        detection_log:
          - possible-false-positive
          - not-license-clues-as-more-detections-present
        matches:
          - score: '50.0'
            start_line: 1388
            end_line: 1393
            matched_length: 10
            match_coverage: '50.0'
            matcher: 3-seq
            license_expression: gpl-2.0-plus
            rule_identifier: gpl-2.0-plus_650.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0-plus_650.RULE
            matched_text: |
              https://www.gnu.org/copyleft/gpl.html#[SEC1]", "
              [Other] [useful] [alternatives] [to] [consider]:
              <pre>
              [COPYRIGHTPAGE]_[TITLE] = \"GNU [Free] [Documentation] [License]\"
              [COPYRIGHTPAGE]_[URL] = \"[https]://[www].[gnu].org/copyleft/
          - score: '100.0'
            start_line: 1392
            end_line: 1392
            matched_length: 4
            match_coverage: '100.0'
            matcher: 2-aho
            license_expression: gfdl-1.1-plus
            rule_identifier: gfdl-1.1-plus_10.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gfdl-1.1-plus_10.RULE
            matched_text: GNU Free Documentation License\"
          - score: '100.0'
            start_line: 1393
            end_line: 1393
            matched_length: 7
            match_coverage: '100.0'
            matcher: 2-aho
            license_expression: gfdl-1.1-plus
            rule_identifier: gfdl-1.1-plus_24.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gfdl-1.1-plus_24.RULE
            matched_text: https://www.gnu.org/copyleft/fdl.html\"
          - score: '80.0'
            start_line: 1394
            end_line: 1394
            matched_length: 3
            match_coverage: '100.0'
            matcher: 2-aho
            license_expression: unknown-license-reference
            rule_identifier: unknown-license-reference_333.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/unknown-license-reference_333.RULE
            matched_text: License 2.0\"
          - score: '100.0'
            start_line: 1395
            end_line: 1395
            matched_length: 7
            match_coverage: '100.0'
            matcher: 2-aho
            license_expression: cc-by-2.0
            rule_identifier: cc-by-2.0_url_glc_55.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/cc-by-2.0_url_glc_55.RULE
            matched_text: https://creativecommons.org/licenses/by/2.0/\"</
      - license_expression: apache-2.0
        detection_log:
          - possible-false-positive
          - not-license-clues-as-more-detections-present
        matches:
          - score: '100.0'
            start_line: 1468
            end_line: 1468
            matched_length: 3
            match_coverage: '100.0'
            matcher: 2-aho
            license_expression: apache-2.0
            rule_identifier: spdx_license_id_apache-2.0_for_apache-2.0.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/spdx_license_id_apache-2.0_for_apache-2.0.RULE
            matched_text: Apache >= 2.0.
    license_clues: []
    percentage_of_license_text: '1.45'
    for_license_detections:
      - gpl_2_0_plus-09165bba-7b1b-0ff0-bdda-dbdcb89da5e8
      - gpl_1_0_plus-06400413-49a2-669d-9d2d-6c6d3f5aa266
      - gpl_2_0_plus_and_gfdl_1_1_plus_and_unknown_license_reference_and_cc_by_2_0-62618cb9-6dea-9376-b51c-b7353678d45a
      - apache_2_0-d66ab77d-a5cc-7104-e702-dc7df61fe9e8
    scan_errors: []
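To see where the repeated entries come from, the matches in the scan above can be tallied by license expression. This is only a rough sketch: the (start_line, license_expression) pairs are copied by hand from the YAML output, and the variable names are made up for illustration.

```python
from collections import Counter

# (start_line, license_expression) for each match reported for configurator.php,
# copied by hand from the YAML scan output above.
matches = [
    (11, "gpl-2.0-plus"),
    (25, "gpl-2.0-plus"),
    (1378, "gpl-1.0-plus"),
    (1388, "gpl-2.0-plus"),
    (1392, "gfdl-1.1-plus"),
    (1393, "gfdl-1.1-plus"),
    (1394, "unknown-license-reference"),
    (1395, "cc-by-2.0"),
    (1468, "apache-2.0"),
]

# Count how many separate matches carry each license expression.
counts = Counter(expression for _, expression in matches)

# Keep only the expressions that were matched more than once.
repeated = {expression: n for expression, n in counts.items() if n > 1}
print(repeated)
```

Only gpl-2.0-plus (three matches) and gfdl-1.1-plus (two matches) come out as repeated, which lines up with the repeated GPL-2.0-or-later and GFDL-1.1-or-later entries reported in the SPDX file.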
Hi Philippe,
I agree "Creative Commons License 2.0" means nothing, I will replace it.
I understand that the license appears multiple times in the SPDX file because it was detected multiple times.
But there is no added value in the SPDX file to have several identical lines.
I would expect some postprocessing to remove the duplicates. This is already done for the top-level package
PackageName: phpwiki
where you have a list of PackageLicenseInfoFromFiles
in alphabetical order without duplicates.
You could do the same for each file.
As a side note, converting the SPDX file from tag:value to e.g. JSON and then back to tag:value with the online converter will remove the duplicates:
LicenseInfoInFile: Apache-2.0
LicenseInfoInFile: CC-BY-2.0
LicenseInfoInFile: GFDL-1.1-or-later
LicenseInfoInFile: GPL-1.0-or-later
LicenseInfoInFile: GPL-2.0-or-later
LicenseInfoInFile: LicenseRef-scancode-unknown-license-reference
But as already said, this is not a real bug, just a possible improvement.
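Such postprocessing could look roughly like the sketch below. It is not the ScanCode or tools-python implementation: the function name `dedupe_license_info` and the sample lines are hypothetical, and it assumes the LicenseInfoInFile lines of a file appear consecutively in the tag:value document.

```python
def dedupe_license_info(lines):
    """Collapse repeated LicenseInfoInFile lines within each consecutive run
    and sort the survivors, leaving every other line untouched."""
    result = []
    run = []  # LicenseInfoInFile lines collected for the current file

    def flush():
        # Emit the current run sorted and without duplicates, then reset it.
        result.extend(sorted(set(run)))
        run.clear()

    for line in lines:
        if line.startswith("LicenseInfoInFile:"):
            run.append(line)
        else:
            flush()
            result.append(line)
    flush()  # handle a run that ends the document
    return result


# Hypothetical excerpt of a tag:value file section with duplicated entries.
sample = [
    "FileName: ./configurator.php",
    "LicenseInfoInFile: GPL-2.0-or-later",
    "LicenseInfoInFile: GPL-2.0-or-later",
    "LicenseInfoInFile: GPL-1.0-or-later",
    "LicenseInfoInFile: GPL-2.0-or-later",
    "LicenseInfoInFile: GFDL-1.1-or-later",
    "LicenseInfoInFile: GFDL-1.1-or-later",
    "LicenseInfoInFile: LicenseRef-scancode-unknown-license-reference",
    "LicenseInfoInFile: CC-BY-2.0",
    "LicenseInfoInFile: Apache-2.0",
]

for line in dedupe_license_info(sample):
    print(line)
```

On this sample the function keeps the FileName line and reduces the nine LicenseInfoInFile lines to the six unique ones listed above, in the same order.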
I checked the code; this should be fixed in tools-python.
https://github.com/nexB/scancode-toolkit/issues/3289 will solve this issue.
This is fixed in scancode-toolkit 32.1.0.
Description
Not a real bug, but why is LicenseInfoInFile duplicated for the same file (GFDL-1.1-or-later twice, GPL-2.0-or-later 3 times)?
How To Reproduce
Resulting SPDX file:
phpwiki.spdx.txt
System configuration
Ubuntu 22.10