aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, and dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/aboutcode-org/scancode-toolkit/releases/

Treat as Debian copyright some extra files #2885

Open pombredanne opened 2 years ago

pombredanne commented 2 years ago

See https://github.com/rapid7/metasploit-framework/blob/master/LICENSE

Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Source: https://www.metasploit.com/

Files: *
Copyright: 2006-2020, Rapid7, Inc.
License: BSD-3-clause

....

This file is not named "copyright", but it is a Debian copyright file.

IMHO the mere fact that the first line is either Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ or Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ could be enough to treat it as one?
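
A minimal sketch of such a first-line check in Python. The function name `looks_like_debian_copyright` and the regular expression are illustrative assumptions, not ScanCode's actual code; the sketch accepts both the http and https forms of the Format URL quoted above.

```python
import re

# Hypothetical pattern: a "Format:" field whose value is the copyright-format
# 1.0 URL, with either scheme and an optional trailing slash.
FORMAT_LINE = re.compile(
    r"^Format:\s*https?://www\.debian\.org/doc/packaging-manuals/"
    r"copyright-format/1\.0/?\s*$",
    re.IGNORECASE,
)


def looks_like_debian_copyright(text):
    """Return True if the first line of text is a copyright-format 1.0 header."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    return bool(FORMAT_LINE.match(first_line))
```

With this check, a file such as the Metasploit LICENSE would be recognized from its content alone, whatever its name.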

Blackclaws commented 2 years ago

This is also the format that REUSE (https://github.com/fsfe/reuse-tool) uses for annotating copyright on files that can't be annotated directly, or for bulk-annotating a directory, so being able to parse it would make a lot of sense.

pombredanne commented 2 years ago

@Blackclaws we support the Debian copyright file format extensively in https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/debian_copyright.py

The issue here is about the file names we support. Basically, we only attempt to parse files with these path patterns as copyright files: '*/debian/copyright' and '*usr/share/doc/*/copyright', because "copyright" is otherwise a common name.

Since Metasploit uses "/LICENSE" as a filename, the idea would be to check the first line of the file to recognize it as a likely copyright file.
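
To illustrate why a file named LICENSE is skipped, here is a rough sketch of glob-style path matching in Python. It uses the standard library `fnmatch` module as a stand-in; the names `COPYRIGHT_PATH_PATTERNS` and `is_copyright_path` are hypothetical, and ScanCode's own matching logic may differ.

```python
from fnmatch import fnmatchcase

# The two path patterns quoted above.
COPYRIGHT_PATH_PATTERNS = (
    "*/debian/copyright",
    "*usr/share/doc/*/copyright",
)


def is_copyright_path(path):
    """Return True if path matches one of the copyright path patterns."""
    return any(
        fnmatchcase(path, pattern) for pattern in COPYRIGHT_PATH_PATTERNS
    )
```

Under these assumptions, `is_copyright_path("example/debian/copyright")` is true, while `is_copyright_path("metasploit-framework/LICENSE")` is false, so the latter would never reach the copyright parser.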

That said, REUSE uses .reuse/dep5 as the path for its copyright files, and we should support this too. So thank you ++ for chiming in! For reference, if I scan https://github.com/fsfe/reuse-tool/blob/master/.reuse/dep5 (FYI @mxmehl @carmenbianca) with these commands:

$ mkdir -p example/debian
$ wget -O example/debian/copyright https://raw.githubusercontent.com/fsfe/reuse-tool/master/.reuse/dep5
$ scancode --system-package --yaml dep5.yaml.txt example/

I get this:

dep5.yaml.txt

...
    -   path: example/debian/copyright
        type: file
        package_data:
            -   type: deb
                namespace:
                name:
                version:
                qualifiers: {}
                subpath:
                primary_language:
                description:
                release_date:
                parties: []
                keywords: []
                homepage_url:
                download_url:
                size:
                sha1:
                md5:
                sha256:
                sha512:
                bug_tracking_url:
                code_view_url:
                vcs_url:
                copyright: |
                    2017 Free Software Foundation Europe e.V. <https://fsfe.org>
                    2017 Free Software Foundation Europe e.V. <https://fsfe.org>
                    2017 Free Software Foundation Europe e.V. <https://fsfe.org>
                    2017 Free Software Foundation Europe e.V. <https://fsfe.org>
                    2017 Free Software Foundation Europe e.V. <https://fsfe.org>
                    2017 Free Software Foundation Europe e.V. <https://fsfe.org>
                    2017 Free Software Foundation Europe e.V. <https://fsfe.org>
                license_expression: gpl-3.0-plus AND cc-by-sa-4.0 AND cc-by-sa-4.0 AND gpl-3.0-plus
                    AND gpl-3.0-plus AND cc0-1.0 AND cc0-1.0
                declared_license:
                    - GPL-3.0-or-later
                    - CC-BY-SA-4.0
                    - CC-BY-SA-4.0
                    - GPL-3.0-or-later
                    - GPL-3.0-or-later
                    - CC0-1.0
                    - CC0-1.0
                notice_text:
                source_packages: []
                file_references: []
                extra_data: {}
                dependencies: []
                repository_homepage_url:
                repository_download_url:
                api_data_url:
                datasource_id: debian_copyright_in_source
                purl:
...

carmenbianca commented 2 years ago

@pombredanne A small notice: .reuse/dep5 will be deprecated at some point in the future.

https://github.com/fsfe/reuse-docs/issues/81

I'm not sure about the timeline yet, and it's very likely that it will remain as an optional deprecated feature for quite a while.

pombredanne commented 2 years ago

@carmenbianca thank you for the heads up! That's OK for us: the change to support the "dep5" name is minor, and files named this way will likely linger for a long while.