bennati opened this issue 1 year ago.
As a recap:
In detail:
When scanning https://raw.githubusercontent.com/xmlunit/xmlunit/04aa11879d86135c37f5af8fd5694bf08d08972d/xmlunit-legacy/pom.xml as a plain text file, we get this:
headers:
  - tool_name: scancode-toolkit
    tool_version: v31.2.3-372-g18a842e769
    options:
      input:
        - /home/pombreda/tmp/xmlunit-04aa11/xmlunit-legacy/pom.xml
      --license: yes
      --license-text: yes
      --license-text-diagnostics: yes
      --yaml: '-'
    [..........]
files:
  - path: pom.xml
    type: file
    detected_license_expression: apache-2.0 AND (apache-2.0 AND bsd-new)
    detected_license_expression_spdx: Apache-2.0 AND (Apache-2.0 AND BSD-3-Clause)
    license_detections:
      - license_expression: apache-2.0
        detection_log:
          - not-combined
        matches:
          - score: '97.7'
            start_line: 3
            end_line: 13
            matched_length: 85
            match_coverage: '100.0'
            matcher: 3-seq
            license_expression: apache-2.0
            rule_identifier: apache-2.0_7.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/apache-2.0_7.RULE
            matched_text: |
              licensed [to] [You] under the Apache License, Version 2.0
              (the "License"); you may not use this file except in compliance with
              the License. You may obtain a copy of the License at
              http://www.apache.org/licenses/LICENSE-2.0
              Unless required by applicable law or agreed to in writing, software
              distributed under the License is distributed on an "AS IS" BASIS,
              WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
              See the License for the specific language governing permissions and
              limitations under the License.
      - license_expression: apache-2.0 AND bsd-new
        detection_log:
          - not-combined
        matches:
          - score: '50.0'
            start_line: 35
            end_line: 41
            matched_length: 11
            match_coverage: '50.0'
            matcher: 3-seq
            license_expression: apache-2.0
            rule_identifier: apache-2.0_839.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/apache-2.0_839.RULE
            matched_text: |
              licenses>
              <license>
              <name>[The] [BSD] [3]-[Clause] License</[name]>
              <[url]>[https]://[github].[com]/[xmlunit]/[xmlunit]/[blob]/[main]/[xmlunit]-[legacy]/[LICENSE].txt</url>
              <distribution>repo</distribution>
              </license>
              </licenses>
          - score: '100.0'
            start_line: 37
            end_line: 37
            matched_length: 5
            match_coverage: '100.0'
            matcher: 2-aho
            license_expression: bsd-new
            rule_identifier: bsd-new_364.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/bsd-new_364.RULE
            matched_text: The BSD 3-Clause License</
    license_clues: []
    percentage_of_license_text: '30.12'
    for_license_detections:
      - apache_2_0-fc261552-78a1-c631-caec-25ea90dee31f
      - apache_2_0_and_bsd_new-5fa1f720-156d-ec9a-965f-f4b1a5afdf86
    scan_errors: []
And this is indeed a bug: the second match is incorrect. Apache-2.0 is wrongly detected in this text (the words in brackets are the parts that were not matched):
matched_text: |
  licenses>
  <license>
  <name>[The] [BSD] [3]-[Clause] License</[name]>
  <[url]>[https]://[github].[com]/[xmlunit]/[xmlunit]/[blob]/[main]/[xmlunit]-[legacy]/[LICENSE].txt</url>
  <distribution>repo</distribution>
  </license>
  </licenses>
But even so, the detection ends up correctly reported at the file level:
detected_license_expression: apache-2.0 AND (apache-2.0 AND bsd-new)
detected_license_expression_spdx: Apache-2.0 AND (Apache-2.0 AND BSD-3-Clause)
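The file-level value looks like the per-detection expressions joined with AND, with the compound one wrapped in parentheses and no simplification (hence the repeated apache-2.0). A minimal sketch of that joining, assuming exactly that behavior; combine_expressions is a hypothetical helper for illustration, not ScanCode's own code, which may differ:

```python
from typing import List


def combine_expressions(expressions: List[str], relation: str = "AND") -> str:
    """Join license expressions with `relation`, parenthesizing compound ones.

    Hypothetical helper: it mirrors the shape of the file-level
    detected_license_expression above but does not deduplicate or simplify.
    """
    parts = []
    for expr in expressions:
        expr = expr.strip()
        if not expr:
            continue
        # Keep operator precedence explicit: any expression that already
        # contains AND/OR becomes a parenthesized sub-expression.
        tokens = expr.upper().split()
        if "AND" in tokens or "OR" in tokens:
            parts.append(f"({expr})")
        else:
            parts.append(expr)
    return f" {relation} ".join(parts)


per_detection = ["apache-2.0", "apache-2.0 AND bsd-new"]
print(combine_expressions(per_detection))
# apache-2.0 AND (apache-2.0 AND bsd-new)
```

Because nothing is deduplicated, a wrong partial match such as apache-2.0_839.RULE still leaks into the combined expression even when the overall result happens to look right.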
When scanning it as a package with --package, I get this:
headers:
  - tool_name: scancode-toolkit
    tool_version: v31.2.3-372-g18a842e769
    options:
      input:
        - /home/pombreda/tmp/xmlunit-04aa11/xmlunit-legacy/pom.xml
      --package: yes
      --yaml: '-'
    notice: |
      Generated with ScanCode and provided on an "AS IS" BASIS, WITHOUT WARRANTIES
      OR CONDITIONS OF ANY KIND, either express or implied. No content created from
      ScanCode should be considered or used as legal advice. Consult an Attorney
      for any legal advice.
      ScanCode is a free software code scanning tool from nexB Inc. and others.
      Visit https://github.com/nexB/scancode-toolkit/ for support and download.
    start_timestamp: '2023-01-20T101831.060085'
    end_timestamp: '2023-01-20T101834.719757'
    output_format_version: 3.0.0
    duration: '3.659682273864746'
    message:
    errors: []
    warnings: []
    extra_data:
      system_environment:
        operating_system: linux
        cpu_architecture: 64
        platform: Linux-4.15.0-200-generic-x86_64-with-glibc2.23
        platform_version: '#211~16.04.2-Ubuntu SMP Fri Nov 25 09:18:48 UTC 2022'
        python_version: "3.9.10 (main, Jan 29 2022, 10:01:49) \n[GCC 5.4.0 20160609]"
      spdx_license_list_version: '3.19'
      files_count: 1
packages:
  - type: maven
    namespace: org.xmlunit
    name: xmlunit-legacy
    version: 2.9.2-SANPSHOT
    qualifiers: {}
    subpath:
    primary_language: Java
    description: |
      org.xmlunit:xmlunit-legacy
      XMLUnit 1.x Compatibility Layer
    release_date:
    parties: []
    keywords: []
    homepage_url: https://www.xmlunit.org/
    download_url:
    size:
    sha1:
    md5:
    sha256:
    sha512:
    bug_tracking_url:
    code_view_url:
    vcs_url:
    copyright:
    declared_license_expression: bsd-new
    declared_license_expression_spdx: BSD-3-Clause
    license_detections:
      - license_expression: bsd-new
        detection_log:
          - not-combined
        matches:
          - score: '100.0'
            start_line: 1
            end_line: 1
            matched_length: 5
            match_coverage: '100.0'
            matcher: 1-hash
            license_expression: bsd-new
            rule_identifier: bsd-new_364.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/bsd-new_364.RULE
            matched_text: The BSD 3-Clause License
    other_license_expression:
    other_license_expression_spdx:
    other_license_detections: []
    extracted_license_statement: '[{''name'': ''The BSD 3-Clause License'', ''url'': ''https://github.com/xmlunit/xmlunit/blob/main/xmlunit-legacy/LICENSE.txt'',
      ''comments'': None, ''distribution'': ''repo''}]'
    notice_text:
    source_packages:
      - pkg:maven/org.xmlunit/xmlunit-legacy@2.9.2-SANPSHOT?classifier=sources
    extra_data: {}
    repository_homepage_url: https://repo1.maven.org/maven2/org/xmlunit/xmlunit-legacy/2.9.2-SANPSHOT/
    repository_download_url: https://repo1.maven.org/maven2/org/xmlunit/xmlunit-legacy/2.9.2-SANPSHOT/xmlunit-legacy-2.9.2-SANPSHOT.jar
    api_data_url: https://repo1.maven.org/maven2/org/xmlunit/xmlunit-legacy/2.9.2-SANPSHOT/xmlunit-legacy-2.9.2-SANPSHOT.pom
    package_uid: pkg:maven/org.xmlunit/xmlunit-legacy@2.9.2-SANPSHOT?uuid=e54b35fe-76ae-4597-8b69-875dd8a9afd9
    datafile_paths:
      - pom.xml
    datasource_ids:
      - maven_pom
    purl: pkg:maven/org.xmlunit/xmlunit-legacy@2.9.2-SANPSHOT
dependencies:
  - purl: pkg:maven/org.xmlunit/xmlunit-core
    extracted_requirement:
    scope: compile
    is_runtime: no
    is_optional: yes
    is_resolved: no
    resolved_package: {}
    extra_data: {}
    dependency_uid: pkg:maven/org.xmlunit/xmlunit-core?uuid=8682618d-c265-4d7c-8cd2-1feb5488f859
    for_package_uid: pkg:maven/org.xmlunit/xmlunit-legacy@2.9.2-SANPSHOT?uuid=e54b35fe-76ae-4597-8b69-875dd8a9afd9
    datafile_path: pom.xml
    datasource_id: maven_pom
  - purl: pkg:maven/junit/junit@3.8.1
    extracted_requirement: 3.8.1
    scope: compile
    is_runtime: no
    is_optional: yes
    is_resolved: yes
    resolved_package: {}
    extra_data: {}
    dependency_uid: pkg:maven/junit/junit@3.8.1?uuid=3cae6762-e0a4-451a-b1d5-365f8f511d60
    for_package_uid: pkg:maven/org.xmlunit/xmlunit-legacy@2.9.2-SANPSHOT?uuid=e54b35fe-76ae-4597-8b69-875dd8a9afd9
    datafile_path: pom.xml
    datasource_id: maven_pom
  - purl: pkg:maven/org.mockito/mockito-core
    extracted_requirement:
    scope: test
    is_runtime: no
    is_optional: yes
    is_resolved: no
    resolved_package: {}
    extra_data: {}
    dependency_uid: pkg:maven/org.mockito/mockito-core?uuid=10211b7c-b7f6-4621-845e-11111c16e38c
    for_package_uid: pkg:maven/org.xmlunit/xmlunit-legacy@2.9.2-SANPSHOT?uuid=e54b35fe-76ae-4597-8b69-875dd8a9afd9
    datafile_path: pom.xml
    datasource_id: maven_pom
license_references:
  - key: bsd-new
    language: en
    short_name: BSD-3-Clause
    name: BSD-3-Clause
    category: Permissive
    owner: Regents of the University of California
    homepage_url: http://www.opensource.org/licenses/BSD-3-Clause
    notes: Per SPDX.org, this license is OSI certified.
    is_builtin: yes
    is_exception: no
    is_unknown: no
    is_generic: no
    spdx_license_key: BSD-3-Clause
    other_spdx_license_keys:
      - LicenseRef-scancode-libzip
    osi_license_key: BSD-3-Clause
    text_urls:
      - http://www.opensource.org/licenses/BSD-3-Clause
    osi_url: http://www.opensource.org/licenses/BSD-3-Clause
    faq_url:
    other_urls:
      - http://framework.zend.com/license/new-bsd
      - https://opensource.org/licenses/BSD-3-Clause
      - https://www.eclipse.org/org/documents/edl-v10.php
    key_aliases: []
    minimum_coverage: '0'
    standard_notice:
    ignorable_copyrights: []
    ignorable_holders: []
    ignorable_authors: []
    ignorable_urls: []
    ignorable_emails: []
    text: |
      Redistribution and use in source and binary forms, with or without modification,
      are permitted provided that the following conditions are met:
      Redistributions of source code must retain the above copyright notice, this list
      of conditions and the following disclaimer.
      Redistributions in binary form must reproduce the above copyright notice, this
      list of conditions and the following disclaimer in the documentation and/or
      other materials provided with the distribution.
      Neither the name of the ORGANIZATION nor the names of its contributors may be
      used to endorse or promote products derived from this software without specific
      prior written permission.
      THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
      "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
      THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
      ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
      BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
      CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
      GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
      HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
      LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
      THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    scancode_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/bsd-new.LICENSE
    licensedb_url: https://scancode-licensedb.aboutcode.org/bsd-new
    spdx_url: https://spdx.org/licenses/BSD-3-Clause
license_rule_references:
  - license_expression: bsd-new
    identifier: bsd-new_364.RULE
    language: en
    rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/bsd-new_364.RULE
    is_license_text: no
    is_license_notice: no
    is_license_reference: yes
    is_license_tag: no
    is_license_intro: no
    is_continuous: no
    is_builtin: yes
    is_from_license: no
    is_synthetic: no
    length: 5
    relevance: 100
    minimum_coverage: 80
    referenced_filenames: []
    notes:
    ignorable_copyrights: []
    ignorable_holders: []
    ignorable_authors: []
    ignorable_urls: []
    ignorable_emails: []
    text: The BSD 3-Clause License
files:
  - path: pom.xml
    type: file
    package_data:
      - type: maven
        namespace: org.xmlunit
        name: xmlunit-legacy
        version: 2.9.2-SANPSHOT
        qualifiers: {}
        subpath:
        primary_language: Java
        description: |
          org.xmlunit:xmlunit-legacy
          XMLUnit 1.x Compatibility Layer
        release_date:
        parties: []
        keywords: []
        homepage_url: https://www.xmlunit.org/
        download_url:
        size:
        sha1:
        md5:
        sha256:
        sha512:
        bug_tracking_url:
        code_view_url:
        vcs_url:
        copyright:
        declared_license_expression: bsd-new
        declared_license_expression_spdx: BSD-3-Clause
        license_detections:
          - license_expression: bsd-new
            detection_log:
              - not-combined
            matches:
              - score: '100.0'
                start_line: 1
                end_line: 1
                matched_length: 5
                match_coverage: '100.0'
                matcher: 1-hash
                license_expression: bsd-new
                rule_identifier: bsd-new_364.RULE
                rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/bsd-new_364.RULE
                matched_text: The BSD 3-Clause License
        other_license_expression:
        other_license_expression_spdx:
        other_license_detections: []
        extracted_license_statement: '[{''name'': ''The BSD 3-Clause License'', ''url'':
          ''https://github.com/xmlunit/xmlunit/blob/main/xmlunit-legacy/LICENSE.txt'',
          ''comments'': None, ''distribution'': ''repo''}]'
        notice_text:
        source_packages:
          - pkg:maven/org.xmlunit/xmlunit-legacy@2.9.2-SANPSHOT?classifier=sources
        file_references: []
        extra_data: {}
        dependencies:
          - purl: pkg:maven/org.xmlunit/xmlunit-core
            extracted_requirement:
            scope: compile
            is_runtime: no
            is_optional: yes
            is_resolved: no
            resolved_package: {}
            extra_data: {}
          - purl: pkg:maven/junit/junit@3.8.1
            extracted_requirement: 3.8.1
            scope: compile
            is_runtime: no
            is_optional: yes
            is_resolved: yes
            resolved_package: {}
            extra_data: {}
          - purl: pkg:maven/org.mockito/mockito-core
            extracted_requirement:
            scope: test
            is_runtime: no
            is_optional: yes
            is_resolved: no
            resolved_package: {}
            extra_data: {}
        repository_homepage_url: https://repo1.maven.org/maven2/org/xmlunit/xmlunit-legacy/2.9.2-SANPSHOT/
        repository_download_url: https://repo1.maven.org/maven2/org/xmlunit/xmlunit-legacy/2.9.2-SANPSHOT/xmlunit-legacy-2.9.2-SANPSHOT.jar
        api_data_url: https://repo1.maven.org/maven2/org/xmlunit/xmlunit-legacy/2.9.2-SANPSHOT/xmlunit-legacy-2.9.2-SANPSHOT.pom
        datasource_id: maven_pom
        purl: pkg:maven/org.xmlunit/xmlunit-legacy@2.9.2-SANPSHOT
    for_packages: []
    scan_errors: []
This is better, as only BSD is reported, and worse, as the Apache license in the POM is missed because we are not looking into comments (yet).
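Looking into comments would mean feeding the text of the POM's XML comments (where the Apache header lives, lines 3 to 13 in the first scan) to license detection alongside the parsed manifest fields. A rough sketch of the extraction step only, under that assumption. It uses a regular expression rather than an XML parser because Python's ElementTree only keeps comments found inside the root element, and a license header typically sits before <project>. The function names and the abridged sample POM are hypothetical:

```python
import re
from typing import List

# Non-greedy match across lines for every <!-- ... --> block.
XML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def xml_comments(text: str) -> List[str]:
    """Return the stripped text of every XML comment in `text`."""
    return [m.group(1).strip() for m in XML_COMMENT.finditer(text)]


def comments_mention(text: str, needle: str) -> bool:
    """Hypothetical check: does any comment mention `needle` (case-insensitive)?"""
    needle = needle.lower()
    return any(needle in comment.lower() for comment in xml_comments(text))


# Abridged, illustrative POM: the license header is a comment before <project>.
pom = """<!--
  This file is licensed to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.
-->
<project>
  <licenses>
    <license><name>The BSD 3-Clause License</name></license>
  </licenses>
</project>"""

print(comments_mention(pom, "Apache License"))  # True
```

A real implementation would pass the comment text to the license matcher rather than doing a substring check; the substring test here only shows that the header text is recoverable.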
In general, the license of a package manifest is best collected with --package, which knows about the manifest structure. See https://github.com/nexB/scancode-toolkit/issues/707 for the longer story behind this, and also https://github.com/nexB/scancode-toolkit/issues/3024.
See also https://github.com/nexB/scancode-toolkit/issues/2294 by @sschuberth and https://github.com/nexB/scancode-toolkit/issues/2552 by @hanna-modica.
In contrast, the plain --license scan does not know that a POM is a POM; it just sees text.
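What knowing the manifest structure buys is that the license name and URL come out of the <licenses> element as fields, instead of as loose text in which a rule like apache-2.0_839.RULE can partially match the URL tokens. A minimal sketch of that extraction with Python's standard xml.etree.ElementTree, assuming a POM using the usual Maven namespace; pom_licenses is a hypothetical helper, not ScanCode's Maven parser, though its output has the same shape as the extracted_license_statement shown above:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Fields of a Maven <license> element, in the order ScanCode reports them above.
FIELDS = ("name", "url", "comments", "distribution")


def local_name(tag: str) -> str:
    """Drop the XML namespace: '{http://maven.apache.org/POM/4.0.0}name' -> 'name'."""
    return tag.rsplit("}", 1)[-1]


def pom_licenses(pom_xml: str) -> List[Dict[str, Optional[str]]]:
    """Collect project-level <licenses>/<license> entries from a POM string."""
    root = ET.fromstring(pom_xml)  # the <project> element
    entries = []
    for section in root:
        if local_name(section.tag) != "licenses":
            continue
        for lic in section:
            if local_name(lic.tag) != "license":
                continue
            # Start with every field set to None, like 'comments': None above.
            entry: Dict[str, Optional[str]] = dict.fromkeys(FIELDS)
            for child in lic:
                key = local_name(child.tag)
                if key in entry:
                    entry[key] = (child.text or "").strip() or None
            entries.append(entry)
    return entries


pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <licenses>
    <license>
      <name>The BSD 3-Clause License</name>
      <url>https://github.com/xmlunit/xmlunit/blob/main/xmlunit-legacy/LICENSE.txt</url>
      <distribution>repo</distribution>
    </license>
  </licenses>
</project>"""

print(pom_licenses(pom))
```

Only the name field then needs license matching, which is why the --package result above is a single clean 1-hash match on "The BSD 3-Clause License".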
Description
Scanning https://github.com/xmlunit/xmlunit leads to false positives. These wrong detections are also influenced by minimal changes in the files, which should not have an impact.
How To Reproduce
./scancode -l --json-pp ./result .../xmlunit
In result, search for xmlunit-legacy/pom.xml: it is detected as apache-2.0
git checkout v2.6.4
./scancode -l --json-pp ./result .../xmlunit
In result, search for xmlunit-legacy/pom.xml: it is now detected as jsr-107-jcache-spec-2013
xmlunit-legacy/pom.xml
is in the URL: git diff main xmlunit-legacy/LICENSE.txt
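Instead of searching each result by hand, the two runs can be compared with a few lines of Python. This sketch assumes the newer JSON layout with a per-file detected_license_expression field, as in the YAML output above; older ScanCode releases may report licenses under different keys. load_scan and license_for are hypothetical helpers:

```python
import json
from typing import Any, Dict, Optional


def load_scan(path: str) -> Dict[str, Any]:
    """Load a ScanCode JSON result written with --json-pp."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def license_for(scan: Dict[str, Any], suffix: str = "xmlunit-legacy/pom.xml") -> Optional[str]:
    """Return detected_license_expression of the first file whose path ends with `suffix`."""
    for entry in scan.get("files", []):
        # Paths are relative to the scanned root, e.g. "xmlunit/xmlunit-legacy/pom.xml".
        if entry.get("path", "").endswith(suffix):
            return entry.get("detected_license_expression")
    return None
```

For example, running license_for(load_scan("./result")) once on main and once after git checkout v2.6.4 prints the two detections side by side, which makes the sensitivity to small POM changes easy to spot.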
System configuration