aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

Imprecise license detection #2772

Open pombredanne opened 2 years ago

pombredanne commented 2 years ago

@chinyeungli reported to me that in pkg:alpine/cfitsio@3.48-r0?arch=x86_64, whose upstream is at http://lcgpackages.web.cern.ch/lcgpackages/tarFiles/sources/cfitsio-3.48.tar.gz, we get this detection: public-domain AND nist-pd. However, us-govt-public-domain AND nist-pd may be a better detection.

The text in the fitsio.h is:

/*  The FITSIO software was written by William Pence at the High Energy    */
/*  Astrophysic Science Archive Research Center (HEASARC) at the NASA      */
/*  Goddard Space Flight Center.                                           */
/*

Copyright (Unpublished--all rights reserved under the copyright laws of
the United States), U.S. Government as represented by the Administrator
of the National Aeronautics and Space Administration.  No copyright is
claimed in the United States under Title 17, U.S. Code.

Permission to freely use, copy, modify, and distribute this software
and its documentation without fee is hereby granted, provided that this
copyright notice and disclaimer of warranty appears in all copies.

DISCLAIMER:

THE SOFTWARE IS PROVIDED 'AS IS' WITHOUT ANY WARRANTY OF ANY KIND,
EITHER EXPRESSED, IMPLIED, OR STATUTORY, INCLUDING, BUT NOT LIMITED TO,
ANY WARRANTY THAT THE SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY
IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, AND FREEDOM FROM INFRINGEMENT, AND ANY WARRANTY THAT THE
DOCUMENTATION WILL CONFORM TO THE SOFTWARE, OR ANY WARRANTY THAT THE
SOFTWARE WILL BE ERROR FREE.  IN NO EVENT SHALL NASA BE LIABLE FOR ANY
DAMAGES, INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, SPECIAL OR
CONSEQUENTIAL DAMAGES, ARISING OUT OF, RESULTING FROM, OR IN ANY WAY
CONNECTED WITH THIS SOFTWARE, WHETHER OR NOT BASED UPON WARRANTY,
CONTRACT, TORT , OR OTHERWISE, WHETHER OR NOT INJURY WAS SUSTAINED BY
PERSONS OR PROPERTY OR OTHERWISE, AND WHETHER OR NOT LOSS WAS SUSTAINED
FROM, OR AROSE OUT OF THE RESULTS OF, OR USE OF, THE SOFTWARE OR
SERVICES PROVIDED HEREUNDER."

These results could be improved:

headers:
    -   tool_name: scancode-toolkit
        tool_version: 30.0.0
        options:
            input:
                - npd
            --license: yes
            --license-text: yes
            --license-text-diagnostics: yes
            --yaml: '-'
        notice: |
            Generated with ScanCode and provided on an "AS IS" BASIS, WITHOUT WARRANTIES
            OR CONDITIONS OF ANY KIND, either express or implied. No content created from
            ScanCode should be considered or used as legal advice. Consult an Attorney
            for any legal advice.
            ScanCode is a free software code scanning tool from nexB Inc. and others.
            Visit https://github.com/nexB/scancode-toolkit/ for support and download.
        start_timestamp: '2021-11-30T100708.447605'
        end_timestamp: '2021-11-30T100710.682020'
        output_format_version: 2.0.0
        duration: '2.2344396114349365'
        message:
        errors: []
        extra_data:
            spdx_license_list_version: '3.15'
            files_count: 1
files:
    -   path: npd
        type: file
        licenses:
            -   key: nist-pd
                score: '76.52'
                name: NIST Public Domain Notice
                short_name: NIST Public Domain Notice
                category: Public Domain
                is_exception: no
                is_unknown: no
                owner: NIST
                homepage_url: https://github.com/usnistgov/jsip/blob/master/README#L122
                text_url: https://github.com/tcheneau/Routing/blob/f09f46fcfe636107f22f2c98348188a65a135d98/README.md#conditions-of-use
                reference_url: https://scancode-licensedb.aboutcode.org/nist-pd
                scancode_text_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/nist-pd.LICENSE
                scancode_data_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/nist-pd.yml
                spdx_license_key: NIST-PD
                spdx_url: https://spdx.org/licenses/NIST-PD
                start_line: 7
                end_line: 30
                matched_rule:
                    identifier: nist-pd_6.RULE
                    license_expression: nist-pd
                    licenses:
                        - nist-pd
                    referenced_filenames: []
                    is_license_text: yes
                    is_license_notice: no
                    is_license_reference: no
                    is_license_tag: no
                    is_license_intro: no
                    has_unknown: no
                    matcher: 3-seq
                    rule_length: 230
                    matched_length: 176
                    match_coverage: '76.52'
                    rule_relevance: 100
                matched_text: |
                    Government [as] [represented] [by] [the] [Administrator]
                    [of] [the] [National] [Aeronautics] [and] [Space] [Administration].  [No] [copyright] [is]
                    [claimed] [in] [the] [United] [States] [under] [Title] [17], [U].S. [Code].

                    Permission to freely use, copy, modify, and distribute this software
                    and its documentation without fee is hereby granted, provided that this
                    [copyright] notice and disclaimer of warranty appears in all copies.

                    [DISCLAIMER]:

                    THE SOFTWARE IS PROVIDED 'AS IS' WITHOUT ANY WARRANTY OF ANY KIND,
                    EITHER EXPRESSED, IMPLIED, OR STATUTORY, INCLUDING, BUT NOT LIMITED TO,
                    ANY WARRANTY THAT THE SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY
                    IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
                    PURPOSE, AND FREEDOM FROM INFRINGEMENT, AND ANY WARRANTY THAT THE
                    DOCUMENTATION WILL CONFORM TO THE SOFTWARE, OR ANY WARRANTY THAT THE
                    SOFTWARE WILL BE ERROR FREE.  IN NO EVENT SHALL NASA BE LIABLE FOR ANY
                    DAMAGES, INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, SPECIAL OR
                    CONSEQUENTIAL DAMAGES, ARISING OUT OF, RESULTING FROM, OR IN ANY WAY
                    CONNECTED WITH THIS SOFTWARE, WHETHER OR NOT BASED UPON WARRANTY,
                    CONTRACT, TORT , OR OTHERWISE, WHETHER OR NOT INJURY WAS SUSTAINED BY
                    PERSONS OR PROPERTY OR OTHERWISE, AND WHETHER OR NOT LOSS WAS SUSTAINED
                    FROM, OR AROSE OUT OF THE RESULTS OF, OR USE OF, THE SOFTWARE OR
                    SERVICES PROVIDED HEREUNDER."
            -   key: public-domain
                score: '100.0'
                name: Public Domain
                short_name: Public Domain
                category: Public Domain
                is_exception: no
                is_unknown: no
                owner: Unspecified
                homepage_url: http://www.linfo.org/publicdomain.html
                text_url:
                reference_url: https://scancode-licensedb.aboutcode.org/public-domain
                scancode_text_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/public-domain.LICENSE
                scancode_data_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/public-domain.yml
                spdx_license_key: LicenseRef-scancode-public-domain
                spdx_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/public-domain.LICENSE
                start_line: 8
                end_line: 9
                matched_rule:
                    identifier: public-domain_29.RULE
                    license_expression: public-domain
                    licenses:
                        - public-domain
                    referenced_filenames: []
                    is_license_text: yes
                    is_license_notice: no
                    is_license_reference: no
                    is_license_tag: no
                    is_license_intro: no
                    has_unknown: no
                    matcher: 2-aho
                    rule_length: 4
                    matched_length: 4
                    match_coverage: '100.0'
                    rule_relevance: 100
                matched_text: |
                    No copyright is
                    claimed
        license_expressions:
            - nist-pd
            - public-domain
        percentage_of_license_text: '74.38'
        scan_errors: []
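
To make the numbers in the scan output above easier to read, here is a minimal Python sketch. It assumes that match_coverage is simply matched_length divided by rule_length, expressed as a percentage; that is consistent with the values reported above, but it is my reading of the output and not a statement about ScanCode internals. The helper names are hypothetical. It also checks that the short public-domain match (lines 8-9) sits entirely inside the line range of the nist-pd match (lines 7-30).

```python
def match_coverage(matched_length, rule_length):
    """Percentage of the rule's tokens that were matched (assumed formula)."""
    return round(100 * matched_length / rule_length, 2)


def is_contained(inner, outer):
    """True if the inner (start_line, end_line) span lies within the outer span."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Values copied from the scan output above.
nist_pd = {"matched_length": 176, "rule_length": 230, "lines": (7, 30)}
public_domain = {"matched_length": 4, "rule_length": 4, "lines": (8, 9)}

for name, match in (("nist-pd", nist_pd), ("public-domain", public_domain)):
    coverage = match_coverage(match["matched_length"], match["rule_length"])
    print(name, coverage)

# The 4-token public-domain match falls inside the nist-pd line range.
print(is_contained(public_domain["lines"], nist_pd["lines"]))
```

The 4-token public-domain rule matches "No copyright is claimed", which are words the nist-pd match shows in square brackets in its matched_text.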
adityasangave commented 2 years ago

I am trying to reproduce the results, but I am not sure about the input file. I extracted the package, but there is no file called npd as specified in the scan results above. Please tell me how to reproduce this result.

pombredanne commented 2 years ago

The file is named fitsio.h, as explained in the issue body above: "The text in the fitsio.h is: ...".
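
For reproducing the scan, here is a minimal Python sketch that builds a command line from the options listed in the headers of the scan output above (--license, --license-text, --license-text-diagnostics, --yaml -). It assumes the scancode command is installed and on PATH and that fitsio.h has been extracted from the upstream tarball into the current directory; the path and the helper name are hypothetical.

```python
import os
import shutil
import subprocess


def build_scancode_command(input_path):
    """Build a scancode command using the options shown in the scan headers."""
    return [
        "scancode",
        "--license",
        "--license-text",
        "--license-text-diagnostics",
        "--yaml", "-",  # write YAML to stdout, as in the reported scan
        input_path,
    ]


if __name__ == "__main__":
    # Hypothetical location of the extracted header file.
    path = "fitsio.h"
    cmd = build_scancode_command(path)
    if shutil.which("scancode") and os.path.exists(path):
        subprocess.run(cmd, check=True)
    else:
        # Nothing to scan here; just show the command that would be run.
        print("Would run:", " ".join(cmd))
```

The reported scan used an input named npd, so the path in the output will differ when scanning fitsio.h directly.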