Open pombredanne opened 2 years ago
Since this is also the format that REUSE (https://github.com/fsfe/reuse-tool) uses to annotate copyright for files that can't be annotated directly, or to bulk-annotate a directory, being able to parse it would make a lot of sense.
@Blackclaws we support the Debian copyright file format extensively in https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/debian_copyright.py
The issue here is about the file names we support. Basically, we only attempt to parse files matching these path patterns as copyright files: '*/debian/copyright' and '*usr/share/doc/*/copyright', since "copyright" is otherwise a common file name.
Since Metasploit uses "/LICENSE" as the filename, the idea would be to check the first line of the file to recognize it as a likely copyright file.
That said, REUSE uses .reuse/dep5 as the path for its copyright files, and we should support this too. So thank you ++ for chiming in!
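To make the path matching concrete, here is a minimal sketch using Python's standard `fnmatch` module. The two default patterns are the ones quoted above; the function name `is_copyright_path` and the extra `*/.reuse/dep5` pattern are hypothetical and only for illustration. This is not how scancode-toolkit actually implements its matching.

```python
from fnmatch import fnmatchcase

# Path patterns quoted in this thread. The real matching code in
# scancode-toolkit may differ; this is only an illustration.
COPYRIGHT_PATH_PATTERNS = (
    "*/debian/copyright",
    "*usr/share/doc/*/copyright",
)


def is_copyright_path(path, patterns=COPYRIGHT_PATH_PATTERNS):
    """Return True if `path` matches at least one glob pattern."""
    # fnmatchcase is case-sensitive on every platform, and "*" also
    # matches "/" characters, so "*" can span several path segments.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Hypothetical extension: also accept the REUSE location.
WITH_REUSE = COPYRIGHT_PATH_PATTERNS + ("*/.reuse/dep5",)

print(is_copyright_path("example/debian/copyright"))
print(is_copyright_path("metasploit-framework/LICENSE"))
print(is_copyright_path("reuse-tool/.reuse/dep5", WITH_REUSE))
```

With only the two default patterns, neither a top-level LICENSE file nor .reuse/dep5 is picked up, which is why the scan below first copies dep5 to debian/copyright.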
For reference, if I scan https://github.com/fsfe/reuse-tool/blob/master/.reuse/dep5 (FYI @mxmehl @carmenbianca) with:
$ mkdir -p example/debian
$ wget -O example/debian/copyright https://raw.githubusercontent.com/fsfe/reuse-tool/master/.reuse/dep5
$ scancode --system-package --yaml dep5.yaml.txt example/
I get this:
...
- path: example/debian/copyright
  type: file
  package_data:
    - type: deb
      namespace:
      name:
      version:
      qualifiers: {}
      subpath:
      primary_language:
      description:
      release_date:
      parties: []
      keywords: []
      homepage_url:
      download_url:
      size:
      sha1:
      md5:
      sha256:
      sha512:
      bug_tracking_url:
      code_view_url:
      vcs_url:
      copyright: |
        2017 Free Software Foundation Europe e.V. <https://fsfe.org>
        2017 Free Software Foundation Europe e.V. <https://fsfe.org>
        2017 Free Software Foundation Europe e.V. <https://fsfe.org>
        2017 Free Software Foundation Europe e.V. <https://fsfe.org>
        2017 Free Software Foundation Europe e.V. <https://fsfe.org>
        2017 Free Software Foundation Europe e.V. <https://fsfe.org>
        2017 Free Software Foundation Europe e.V. <https://fsfe.org>
      license_expression: gpl-3.0-plus AND cc-by-sa-4.0 AND cc-by-sa-4.0 AND gpl-3.0-plus
        AND gpl-3.0-plus AND cc0-1.0 AND cc0-1.0
      declared_license:
        - GPL-3.0-or-later
        - CC-BY-SA-4.0
        - CC-BY-SA-4.0
        - GPL-3.0-or-later
        - GPL-3.0-or-later
        - CC0-1.0
        - CC0-1.0
      notice_text:
      source_packages: []
      file_references: []
      extra_data: {}
      dependencies: []
      repository_homepage_url:
      repository_download_url:
      api_data_url:
      datasource_id: debian_copyright_in_source
      purl:
...
@pombredanne Small notice that .reuse/dep5 will be deprecated at some point in the future: https://github.com/fsfe/reuse-docs/issues/81
Not sure yet on the timeline, and it's very likely that it will remain as an optional deprecated feature for quite a while.
@carmenbianca thank you for the heads-up! That's OK for us: the change to support the "dep5" name is minor, and files named this way will likely linger for a long while.
See https://github.com/rapid7/metasploit-framework/blob/master/LICENSE: this file is not named copyright BUT is a copyright file.
IMHO the mere fact that we have either
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
or
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
as the first line could be enough?
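A rough sketch of what such a first-line check could look like, in plain Python. The function name `looks_like_dep5` is hypothetical and not part of scancode-toolkit. Skipping leading blank lines and tolerating a missing trailing slash are assumptions on my side, not something the format requires.

```python
import re

# Accept both the http and https forms of the copyright-format 1.0 URL.
# Assumption: the trailing slash is optional and the field name is
# matched case-insensitively.
FORMAT_LINE = re.compile(
    r"^Format:\s*https?://www\.debian\.org/doc/packaging-manuals/"
    r"copyright-format/1\.0/?$",
    re.IGNORECASE,
)


def looks_like_dep5(text):
    """Return True if the first non-empty line is a DEP-5 Format header."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Only the first non-empty line is considered.
            return bool(FORMAT_LINE.match(stripped))
    # Empty or whitespace-only content.
    return False


sample = (
    "Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/\n"
    "Upstream-Name: example\n"
)
print(looks_like_dep5(sample))
print(looks_like_dep5("MIT License\n\nCopyright (c) someone\n"))
```

A check like this would only need to read the start of a file, so it could be applied to candidates such as LICENSE or .reuse/dep5 without treating every file named "copyright" as a Debian copyright file.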