Open pombredanne opened 3 years ago
From a chat with @chinyeungli
btw, just a note that these massive license_expression values may contain irrelevant info: some of the gpl-2.0 was detected because the copyright file states that the Debian packaging is under gpl-2.0, while the primary component may not contain any GPL code (for instance, https://changelogs.ubuntu.com/changelogs/pool/universe/s/signon/signon_8.59+17.10.20170606-0ubuntu1/copyright )
From a chat with @JonoYang based on scanning an Ubuntu-based Docker image in https://github.com/nexB/scancode.io/ that contained https://packages.ubuntu.com/bionic-updates/gcc-7
the package gcc-7-base@7.5.0-3ubuntu1~18.04 has the license expression of:
agpl-3.0 AND amd-historical AND artistic-2.0 AND bsd-new AND bsd-no-disclaimer AND bsd-no-disclaimer-unmodified AND bsd-original AND bsd-original-uc AND bsd-original-uc-1986 AND bsd-simplified AND bsla AND d-zlib AND delorie-historical AND flex-2.5 AND gfdl-1.2 AND gpl-1.0-plus AND gpl-2.0 AND gpl-2.0-plus AND gpl-3.0 AND gpl-3.0-plus AND gpl-3.0-plus WITH gcc-exception-3.1 AND hs-regexp AND intel-osl-1989 AND intel-osl-1993 AND lgpl-2.0 AND lgpl-2.0-plus AND lgpl-2.1 AND lgpl-2.1-plus AND lgpl-3.0-plus WITH cygwin-exception-lgpl-3.0-plus AND mit AND newlib-historical AND nilsson-historical AND osf-1990 AND other-copyleft AND other-permissive AND public-domain AND sunpro AND tex-exception AND uoi-ncsa AND viewflow-agpl-3.0-exception AND warranty-disclaimer AND wide-license AND wtfpl-1.0 AND x11-hanson AND x11-lucent AND zlib AND zlib-acknowledgement AND (commercial-license OR proprietary-license)
I'm not sure how the agpl-3.0 detection happened. I looked in scanpipe/scancode.io results for the Resources associated to the package gcc-7-base and I did not find any Resources attached to this package. I downloaded the copyright file for this package from ubuntu (http://changelogs.ubuntu.com/changelogs/pool/main/g/gcc-7/gcc-7_7.5.0-3ubuntu1~18.04/copyright), scanned it, and agpl-3.0 is not detected as a license.
From a chat with @mjherzog based on scanning an Ubuntu-based Docker image in https://github.com/nexB/scancode.io/
We have a problem of license "proliferation" for some packages that we need to fix, especially for Debian system packages found in a Docker scan. One example is where we have the license expression:
agpl-3.0 AND agpl-3.0-plus AND bloomberg-blpapi AND gpl-1.0-plus AND gpl-2.0 AND gpl-2.0-plus AND lgpl-2.0-plus AND lgpl-2.1 AND lgpl-2.1-plus AND lgpl-3.0 AND mit AND other-permissive AND sun-rpc AND warranty-disclaimer
... for six files from pulseaudio (www.pulseaudio.org in Homepage URL).
I researched the Debian Copyright file from https://metadata.ftp-master.debian.org/changelogs//main/p/pulseaudio/pulseaudio_5.0-13_copyright and found:
- Overall license is lgpl-2.1-plus (also what we have in DejaCode) and most Copyright entries say: "License: LGPL-2.1+"
- I also see bloomberg-blpapi, mit, sun-rpc and warranty-disclaimer plus one file under gpl-2.0-plus
My guess is that there may be some sort of license detection bug for the agpl and the other gpl and lgpl versions
See https://github.com/nexB/scancode.io/issues/103#issuecomment-815665295 for a detailed description of the problems
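To make the gap concrete, here is a minimal sketch that compares the detected expression quoted above with the licenses found by reading the pulseaudio copyright file by hand. The helper name flat_license_keys is hypothetical, and the naive split on " AND " only works because this particular expression is flat (no WITH, OR, or parentheses); it is not how ScanCode parses expressions.

```python
# Detected expression for the six pulseaudio files, copied from this issue.
detected_expression = (
    "agpl-3.0 AND agpl-3.0-plus AND bloomberg-blpapi AND gpl-1.0-plus AND "
    "gpl-2.0 AND gpl-2.0-plus AND lgpl-2.0-plus AND lgpl-2.1 AND "
    "lgpl-2.1-plus AND lgpl-3.0 AND mit AND other-permissive AND sun-rpc AND "
    "warranty-disclaimer"
)

# Licenses seen when reading the Debian copyright file manually (see above).
declared_keys = {
    "lgpl-2.1-plus",
    "bloomberg-blpapi",
    "mit",
    "sun-rpc",
    "warranty-disclaimer",
    "gpl-2.0-plus",
}


def flat_license_keys(expression):
    """Hypothetical helper: split a flat, AND-only expression into keys."""
    return {key.strip() for key in expression.split(" AND ") if key.strip()}


# Keys that were detected but are not backed by the manual review.
unexpected = sorted(flat_license_keys(detected_expression) - declared_keys)
print(unexpected)
```

The leftover keys are the agpl, gpl and lgpl variants plus other-permissive, which is where a detection bug would have to be looked for.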
To improve the tracing, I think we could do this in a simple way: a --debian-copyright option and some argument such as as_debian_copyright that would treat *copyright files as if these were Debian copyright files. This way we can get regular license detection results from just copyright files, irrespective of being in the context of a package or not.
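As a rough sketch of the file selection this implies (not existing ScanCode code): the function name is hypothetical, the as_debian_copyright argument is the one floated above, and the default branch, which only accepts a file named exactly "copyright", is an assumption made for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_debian_copyright_candidate(path, as_debian_copyright=False):
    """Hypothetical check: should this path be parsed as a Debian copyright file?"""
    name = PurePosixPath(path).name
    if as_debian_copyright:
        # Proposed behavior: any file matching *copyright qualifies,
        # whether or not it sits inside an installed package.
        return fnmatchcase(name, "*copyright")
    # Assumed default for this sketch: only a file named exactly "copyright".
    return name == "copyright"


print(is_debian_copyright_candidate("pulseaudio_5.0-13_copyright"))
print(is_debian_copyright_candidate("pulseaudio_5.0-13_copyright", as_debian_copyright=True))
```

With the argument set, a standalone download such as pulseaudio_5.0-13_copyright would be handled the same way as a copyright file found inside a package.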
@AyanSinhaMahapatra FYI ^
- we should be able to recover from mostly OK but not correct copyright files such as this one: https://metadata.ftp-master.debian.org/changelogs//main/p/pulseaudio/pulseaudio_14.2-1_copyright (this may be a ticket for the debian-inspector debut library though). See https://github.com/nexB/debut/issues/6 ("Recover parsing from almost machine-readable copyright files")
- we should have the ability to trace the intermediate detection results (see also #2389) for each paragraph of a copyright file
- we could establish a mapping of declared License "ids"
- there is an implicit notion of primary vs. secondary licenses in a copyright file and we should leverage this: a paragraph with "Files: *" applies to the package as a whole. This may mean a system-wide model change to track primary vs. secondary license or have the ability to track that in a license expression. See https://github.com/nexB/debut/issues/8 ("Determine the primary license from a copyright file")
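The last two points can be sketched together as follows. This is an illustration under stated assumptions, not the debut or ScanCode implementation: all function names are hypothetical, the sample copyright text is made up, and in the mapping only LGPL-2.1+ to lgpl-2.1-plus comes from this issue (the GPL-2+ entry is an assumed example).

```python
# Hypothetical mapping of declared License "ids" to license keys.
# Only the LGPL-2.1+ entry is taken from this issue; GPL-2+ is assumed.
DECLARED_TO_KEY = {
    "LGPL-2.1+": "lgpl-2.1-plus",
    "GPL-2+": "gpl-2.0-plus",
}

# Made-up sample in the machine-readable copyright format.
SAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: example

Files: *
Copyright: 2004 Example Author
License: LGPL-2.1+

Files: src/tool.c
Copyright: 2008 Another Author
License: GPL-2+
"""


def split_paragraphs(text):
    """Split the file into paragraphs separated by blank lines."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


def parse_fields(lines):
    """Parse 'Name: value' fields; indented lines continue the previous field."""
    fields, name = {}, None
    for line in lines:
        if line[0] in " \t" and name:
            fields[name] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            name = name.strip().lower()
            fields[name] = value.strip()
    return fields


def primary_and_secondary(text):
    """Return (primary, secondary): 'Files: *' gives the primary license."""
    primary, secondary = None, []
    for lines in split_paragraphs(text):
        fields = parse_fields(lines)
        if "files" not in fields or "license" not in fields:
            continue  # header or standalone License paragraph
        declared = fields["license"].partition("\n")[0].strip()
        key = DECLARED_TO_KEY.get(declared, declared)
        if fields["files"].split() == ["*"]:
            primary = key
        elif key not in secondary:
            secondary.append(key)
    return primary, secondary


print(primary_and_secondary(SAMPLE))
```

Keeping the result as a separate primary value and secondary list is only one option; the alternative raised above is to carry that distinction inside the license expression itself.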