aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

dbus-1.12.12 gpl-2.0-plus misidentified as gpl-3.0-plus #2663

Open johnmhoran opened 3 years ago

johnmhoran commented 3 years ago

Reviewing a non-public set of dbus-1.12.12 files, I see that SCTK has identified a number of files containing a gpl-2.0-plus notice (as well as afl-2.1) as gpl-3.0-plus with score=96.61. (afl-2.1 is correctly identified.) A publicly available example is

dbus-1.12.12.tar.gz/dbus-1.12.12.tar/dbus-1.12.12/bus/audit.c

which can be retrieved here: https://dbus.freedesktop.org/releases/dbus/dbus-1.12.12.tar.gz

pombredanne commented 3 years ago

Thank you ++ these are great catches! With https://raw.githubusercontent.com/freedesktop/dbus/dbus-1.12.12/tools/dbus-print-message.c I get:

headers:
    -   tool_name: scancode-toolkit
        tool_version: 21.8.4
        options:
            input:
                - db
            --json-pp: '-'
            --license: yes
            --license-text: yes
            --license-text-diagnostics: yes
            --yaml: '-'
        notice: |
            Generated with ScanCode and provided on an "AS IS" BASIS, WITHOUT WARRANTIES
            OR CONDITIONS OF ANY KIND, either express or implied. No content created from
            ScanCode should be considered or used as legal advice. Consult an Attorney
            for any legal advice.
            ScanCode is a free software code scanning tool from nexB Inc. and others.
            Visit https://github.com/nexB/scancode-toolkit/ for support and download.
        start_timestamp: '2021-08-24T210435.477681'
        end_timestamp: '2021-08-24T210441.224373'
        duration: '5.746704816818237'
        message:
        errors: []
        extra_data:
            files_count: 1
files:
    -   path: db
        type: file
        licenses:
            -   key: gpl-3.0-plus
                score: '95.76'
                name: GNU General Public License 3.0 or later
                short_name: GPL 3.0 or later
                category: Copyleft
                is_exception: no
                is_unknown: no
                owner: Free Software Foundation (FSF)
                homepage_url: http://www.gnu.org/licenses/gpl-3.0-standalone.html
                text_url: http://www.gnu.org/licenses/gpl-3.0-standalone.html
                reference_url: https://scancode-licensedb.aboutcode.org/gpl-3.0-plus
                scancode_text_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-3.0-plus.LICENSE
                scancode_data_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-3.0-plus.yml
                spdx_license_key: GPL-3.0-or-later
                spdx_url: https://spdx.org/licenses/GPL-3.0-or-later
                start_line: 1
                end_line: '19'
                matched_rule:
                    identifier: gpl-3.0-plus_24.RULE
                    license_expression: gpl-3.0-plus
                    licenses:
                        - gpl-3.0-plus
                    is_license_text: no
                    is_license_notice: yes
                    is_license_reference: no
                    is_license_tag: no
                    is_license_intro: no
                    has_unknown: no
                    matcher: 3-seq
                    rule_length: 118
                    matched_length: 113
                    match_coverage: '95.76'
                    rule_relevance: 100
                matched_text: |
                    gnu"; [indent]-[tabs]-[mode]: [nil]; -*- */
                    /* [dbus]-[print]-[message].[h]  [Utility] [function] [to] [print] [out] a [message]
                     *
                     * [Copyright] ([C]) [2003] [Philip] [Blundell] <[philb]@[gnu].[org]>
                     * [Copyright] ([C]) [2003] [Red] [Hat], [Inc].
                     *
                     * This program is free software; you can redistribute it and/or modify
                     * it under the terms of the GNU General Public License as published by
                     * the Free Software Foundation; either version [2] of the License, or
                     * (at your option) any later version.
                     *
                     * This program is distributed in the hope that it will be useful,
                     * but WITHOUT ANY WARRANTY; without even the implied warranty of
                     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
                     * GNU General Public License for more details.
                     *
                     * You should have received a copy of the GNU General Public License
                     * along with this program; if not, write to the Free Software
                     * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
        license_expressions:
            - gpl-3.0-plus
        percentage_of_license_text: '76.87'
        scan_errors: []

and with https://raw.githubusercontent.com/freedesktop/dbus/dbus-1.12.12/bus/audit.c I get:

headers:
    -   tool_name: scancode-toolkit
        tool_version: 21.8.4
        options:
            input:
                - audit.c
            --license: yes
            --license-text: yes
            --license-text-diagnostics: yes
            --yaml: '-'
        notice: |
            Generated with ScanCode and provided on an "AS IS" BASIS, WITHOUT WARRANTIES
            OR CONDITIONS OF ANY KIND, either express or implied. No content created from
            ScanCode should be considered or used as legal advice. Consult an Attorney
            for any legal advice.
            ScanCode is a free software code scanning tool from nexB Inc. and others.
            Visit https://github.com/nexB/scancode-toolkit/ for support and download.
        start_timestamp: '2021-08-24T210647.105303'
        end_timestamp: '2021-08-24T210653.431360'
        duration: '6.326068639755249'
        message:
        errors: []
        extra_data:
            files_count: 1
files:
    -   path: audit.c
        type: file
        licenses:
            -   key: gpl-3.0-plus
                score: '96.61'
                name: GNU General Public License 3.0 or later
                short_name: GPL 3.0 or later
                category: Copyleft
                is_exception: no
                is_unknown: no
                owner: Free Software Foundation (FSF)
                homepage_url: http://www.gnu.org/licenses/gpl-3.0-standalone.html
                text_url: http://www.gnu.org/licenses/gpl-3.0-standalone.html
                reference_url: https://scancode-licensedb.aboutcode.org/gpl-3.0-plus
                scancode_text_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-3.0-plus.LICENSE
                scancode_data_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-3.0-plus.yml
                spdx_license_key: GPL-3.0-or-later
                spdx_url: https://spdx.org/licenses/GPL-3.0-or-later
                start_line: 1
                end_line: 23
                matched_rule:
                    identifier: gpl-3.0-plus_24.RULE
                    license_expression: gpl-3.0-plus
                    licenses:
                        - gpl-3.0-plus
                    is_license_text: no
                    is_license_notice: yes
                    is_license_reference: no
                    is_license_tag: no
                    is_license_intro: no
                    has_unknown: no
                    matcher: 3-seq
                    rule_length: 118
                    matched_length: 114
                    match_coverage: '96.61'
                    rule_relevance: 100
                matched_text: |
                    gnu"; [indent]-[tabs]-[mode]: [nil]; -*-
                     * [audit].[c] - [libaudit] [integration] [for] [SELinux] [and] [AppArmor]
                     *
                     * [Based] [on] [apparmor].[c], [selinux].[c]
                     *
                     * [Copyright] © [2014]-[2015] [Canonical], [Ltd].
                     * [Copyright] © [2015] [Collabora] [Ltd].
                     *
                     * [Licensed] [under] [the] [Academic] [Free] License [version] [2].[1]
                     *
                     * This program is free software; you can redistribute it and/or modify
                     * it under the terms of the GNU General Public License as published by
                     * the Free Software Foundation; either version [2] of the License, or
                     * (at your option) any later version.
                     *
                     * This program is distributed in the hope that it will be useful,
                     * but WITHOUT ANY WARRANTY; without even the implied warranty of
                     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
                     * GNU General Public License for more details.
                     *
                     * You should have received a copy of the GNU General Public License
                     * along with this program; if not, write to the Free Software
                     * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
            -   key: afl-2.1
                score: '100.0'
                name: Academic Free License 2.1
                short_name: AFL 2.1
                category: Permissive
                is_exception: no
                is_unknown: no
                owner: Lawrence Rosen
                homepage_url: http://www.rosenlaw.com/afl21.htm
                text_url: http://opensource.linux-mirror.org/licenses/afl-2.1.txt
                reference_url: https://scancode-licensedb.aboutcode.org/afl-2.1
                scancode_text_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/afl-2.1.LICENSE
                scancode_data_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/afl-2.1.yml
                spdx_license_key: AFL-2.1
                spdx_url: https://spdx.org/licenses/AFL-2.1
                start_line: 9
                end_line: 9
                matched_rule:
                    identifier: afl-2.1_1.RULE
                    license_expression: afl-2.1
                    licenses:
                        - afl-2.1
                    is_license_text: no
                    is_license_notice: no
                    is_license_reference: yes
                    is_license_tag: no
                    is_license_intro: no
                    has_unknown: no
                    matcher: 2-aho
                    rule_length: 9
                    matched_length: 9
                    match_coverage: '100.0'
                    rule_relevance: 100
                matched_text: Licensed under the Academic Free License version 2.1
        license_expressions:
            - gpl-3.0-plus
            - afl-2.1
        percentage_of_license_text: '18.29'
        scan_errors: []
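
For readers checking the numbers in the two outputs above: the reported match_coverage values look like matched_length divided by rule_length, expressed as a percentage. The sketch below only reproduces that arithmetic. The function names are hypothetical, and the assumption that score equals coverage scaled by rule_relevance is inferred from these two matches, not taken from ScanCode's code.

```python
def match_coverage(matched_length: int, rule_length: int) -> float:
    """Percentage of the rule's tokens that were matched, rounded to 2 places."""
    return round(matched_length / rule_length * 100, 2)


def approx_score(coverage: float, rule_relevance: int) -> float:
    """ASSUMPTION: score = coverage scaled by relevance (fits the data above)."""
    return round(coverage * rule_relevance / 100, 2)


# Values copied from the two gpl-3.0-plus_24.RULE matches (rule_length: 118).
for name, matched in (("dbus-print-message.c", 113), ("audit.c", 114)):
    cov = match_coverage(matched, 118)
    print(name, cov, approx_score(cov, 100))
```

So only four or five of the rule's 118 tokens went unmatched in each file, which is why a 3-seq (approximate) match can score this high even when the version number differs.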

pombredanne commented 3 years ago

@johnmhoran With these two commits, the detection should now work fine in this PR/branch https://github.com/nexB/scancode-toolkit/pull/2667
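
A rough way to spot this symptom in other scan results is to look at the diagnostics text itself. With --license-text-diagnostics, words that the rule did not match are wrapped in square brackets, so a bracketed version number in the "either version N of the License" clause that disagrees with the detected key is suspicious. The sketch below assumes that reading of the brackets; the helper names and the key-parsing regex are hypothetical and not part of ScanCode.

```python
import re
from typing import List

# Captures a version number that the rule did NOT match (it is in brackets).
VERSION_CLAUSE = re.compile(
    r"either version \[(\d+(?:\.\d+)*)\] of the License", re.IGNORECASE
)


def unmatched_versions(matched_text: str) -> List[str]:
    """Return bracketed (unmatched) version numbers from the GPL notice clause."""
    return VERSION_CLAUSE.findall(matched_text)


def looks_misdetected(license_key: str, matched_text: str) -> bool:
    """True if an unmatched notice version disagrees with the key's major version."""
    key_version = re.search(r"-(\d+)", license_key)  # "gpl-3.0-plus" -> "3"
    if key_version is None:
        return False
    key_major = key_version.group(1)
    return any(v.split(".")[0] != key_major for v in unmatched_versions(matched_text))


# Excerpt copied from the matched_text reported for audit.c.
sample = (
    " * the Free Software Foundation; either version [2] of the License, or\n"
    " * (at your option) any later version."
)
print(looks_misdetected("gpl-3.0-plus", sample))
```

This is a heuristic for triage only: it flags a match for manual review and does not decide which license actually applies.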