NixOS / nixpkgs

Nix Packages collection & NixOS

mass-replace `license = "foo"` with licenses.… #2999

Closed Fuuzetsu closed 9 years ago

Fuuzetsu commented 10 years ago

If we currently grep through the package tree, many of the license fields are using strings. Perhaps it'd be worthwhile to run sed over the tree to substitute the obvious fields with licenses.… equivalents.
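For illustration only, here is a minimal Python sketch of what such a substitution pass could look like (Python rather than sed, so the "obvious" cases can be kept in an explicit table). The `OBVIOUS` mapping and the attribute names in it are assumptions made for the example, not a list taken from lib/licenses.nix, and anything not in the table is deliberately left as a string.

```python
import re

# Hypothetical table of unambiguous spellings -> attribute names.
# These attribute names are assumptions for the example; check
# lib/licenses.nix before relying on any of them.
OBVIOUS = {
    "ISC": "isc",
    "WTFPL": "wtfpl",
    "publicDomain": "publicDomain",
}

# Matches `license = "something";` with flexible whitespace.
LICENSE_RE = re.compile(r'license\s*=\s*"([^"]*)"\s*;')


def replace_obvious(nix_source):
    """Rewrite `license = "X";` to `license = licenses.<attr>;` for known X."""

    def substitute(match):
        attr = OBVIOUS.get(match.group(1))
        if attr is None:
            # Ambiguous or unknown string (e.g. "BSD"): keep it unchanged.
            return match.group(0)
        return "license = licenses.{};".format(attr)

    return LICENSE_RE.sub(substitute, nix_source)
```

Calling `replace_obvious` on the text of a package file would leave strings such as "BSD" or "free" alone, so they can still be reviewed by hand.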

peti commented 10 years ago

Don't let anyone stop you from doing that!

domenkozar commented 10 years ago

+1

cillianderoiste commented 10 years ago

I often wondered if there was a reason why the nixpkgs manual uses plain strings: http://nixos.org/nixpkgs/manual/#chap-meta

I'll have a shot at changing that.

Fuuzetsu commented 10 years ago

…and just two subsections later it introduces licenses.…

I will have a look at doing the replace in the near future if no one beats me to it; I wanted to gauge interest first.

Fuuzetsu commented 10 years ago

Just found #739 which is relevant. I'll be doing this now.

Fuuzetsu commented 10 years ago

I have now replaced a lot of the strings with a meta expression, see my branch here.

Now the problems are as follows:

In summary, what needs to happen is: come up with a scheme for multi-licensing, come up with a scheme to accommodate custom or one-off licenses, and go through the remaining ‘ambiguous’ strings (such as BSD), look up the license and make it more specific.

My ideas:

I will squash my branch, remove any whitespace changes and make a PR soon.

FTR here are the remaining unique license strings (I now notice a couple instances I missed when manually going through but the rest still applies):

license = "";
license = "AFL-2.1";
license = "AFL-2.1 or GPL-2";
license = "apache";
license = "artistic";
license = "Artistic-2";
license = "ASF";
license = "as-is";
license = "as-is"; # gentoo is calling it this way..
license = "based on the PHP license - as is";
license = "boost";
license = "Boost 1.0";
license = "boost-license";
license = "bsd";
license = "BSD";
license = "BSD-derived (http://www.repoze.org/LICENSE.txt)";
license = "BSD"; # http://anonscm.debian.org/viewvc/muscleapps/trunk/muscleTool/COPYING?view=markup
license = "BSD License";
license = "BSD-like";
license = "BSD-like (http://repoze.org/license.html)";
license = "bsd"; # multi BSD GPL-2
license = "BSD"; # New BSD license
license = "BSD-Original";
license = "BSD"; # Parallax license, like BSD I think
license = "bsd"; # SGI-B-2.0, which seems BSD-like
license = "BSD"; # Simplified BSD License
license = "BSD-style";
license = "BSD-style, see `license.txt'";
license = "BSD"; # they don't specify which BSD variant
license = "BSL1.0"; # Boost Software License,
license = "CC-PD";
license = "CDDL"; # Common Development and Distribution License
license = "CeCILL-A";
license = "CeCILL B FREE SOFTWARE LICENSE or CeCILL FREE SOFTWARE LICENSE";
license = "CeCILL-B_V1";
license = "CeCILL FREE SOFTWARE LICENSE AGREEMENT";
license = "Click"; # MIT with extra clause, https://github.com/kohler/t1utils/blob/master/LICENSE
license = "CPL-1.0 GPL-2 LGPL-2.1"; # one of those
license = "Eclipse Public License 1.0";
license = "EPL";
license = "EPLv1.0";
license = "FastCGI see LICENSE.TERMS";
license = "free";
license = "free"; # !?
license = "free"; # Combination of LGPL/X11/GPL ?
license = "free"; # https://github.com/clvv/fasd/blob/master/LICENSE
license = "free"; # http://www.info-zip.org/license.html
license = "free"; # LaTeX Project Public License
license = "free"; # many parts under different free licenses
license = "free"; # mix of packages under different licenses
license = "free"; # more free licenses combined
license = "free non-commercial"; #Kermit http://www.columbia.edu/kermit/ckfaq.html#license
license = "free-non-copyleft";
license = "free-noncopyleft";
license = "free, non-copyleft";
license = "free-noncopyleft"; # Apache License fork, actually
license = "free-noncopyleft"; # giftware
license = "free-non-copyleft"; # http://www.libpng.org/pub/png/src/libpng-LICENSE.txt
license = "free-non-copyleft"; # some custom as-is in file headers
license = "free-non-copyleft"; #TODO W3C
license = "free"; /* OSL, see http://www.opensource.org */
license = "free";       # public domain
license = "free, see http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=license";
license = "free"; # seems BSD-like
license = "Free software ?";
license = "free"; # The libs are of LGPLv2.1+, some other pieces are GPL.
license = "free"; #TODO BSD on Gentoo, looks like MIT
license = "freeware";
license = "freeware"; # as an aggregate - data files have different licenses
license = "GNU LGPL";
license = "GNU Library General Public License version 2, with the special exception on linking described in file LICENSE";
license = "gpl";
license = "GPL";
license = "gpl_3";
license = "GPL,free";
license = "GPL/LGPL";
license = "GPLv2+ and BUILD license";
license = "GPLv2 + exception";
license = "GPLv2+ + exception";
license = "GPL-v2 / LGPL-v2.1";
license = "GPLv2/ZPL";
license = "GPL (various)"; # Mix of public domain, Artistic+GPL, GPL1+, GPL2+, GPL3+, and GPL2-only... TODO
license = "GPL with exceptions or ZPL";
license = "Hewlett-Packard BSD-like license";
license = "http://www.hpl.hp.com/personal/Hans_Boehm/gc/license.txt";
license = "http://www.isc.org/sw/dhcp/dhcp-copyright.php";
license = "http://www.pythonware.com/products/pil/license.htm";
license = "http://www.teamspeak.com/?page=downloads&type=ts3_linux_client_latest";
license = "iasl"; # FIXME: is this a free software license?
license = "IBM Public License";
license = "ISC";
license = "lgpl";
license = "LGPL";
license = "LGPL-2.1 Apache-2.0";
license = "LGPL+link exception";
license = "LGPL+linking exceptions";
license = "liberal";  # a non-copyleft license, see `Copyright' file
license = "LPPL-1.2";       # LaTeX Project Public License
license = "mBSD";
license = "MIT-like";
license = "MIT / LPL";
license = "MonetDB Public License"; # very similar to Mozilla public license (MPL) Version see 1.1 http://monetdb.cwi.nl/Legal/MonetDBLicense-1.1.html
license = "Most Ocamlnet modules are released under the zlib/png license. The HTTP server module Nethttpd is, however, under the GPL.";
license = "MPL";
license = "MPL1.1";
license = "New BSD";
license = "ngrep";  # Some custom BSD-style, see LICENSE.txt
license = "non-commercial";
license = "non-free";
license = "nonfree";
license = "non-free"; # Basically "not for commercial profit"
license = "nonfree"; #MicroChip-PK2
license = "null";
license = "OFL";
license = "OpenSceneGraph Public License - free LGPL-based license";
license = "Open Software License v1.1";
license = "open_source";
license = "open source, see included files";
license = "PayPal SDK License";
license = "permissive";
license = "PHP+";
license = "PHP-3";
license = "PSF License";
license = "PSF or ZPL";
license = "public domain";
license = "publicDomain";
license = "Public Domain";
license = "public domain, Python, 2-Clause BSD, GPL 3 (see COPYING.txt)";
license = "Python 2.1.1";
license = "Python+LLNL";
license = "QPL";
license = "QPL, LGPL2 (library part)";
license = "Qwt License, Version 1.0";
license = "revised BSD";
license = "revised-BSD";
license = "Ruby";
license = "samsung";  # Binary-only
license = "SciLab";
license = "SIL";
license = "SOME OPEN SOURCE LICENSE"; # TODO which exactly is this?
license = "SSLeay";
license = "Standard PIL License";
license = "?"; # the .py file is GPLv2
license = "TrueCrypt License Version 2.6";
license = "ttf2pt1";
license = "unfree-redistributable";
license = "unfree-redistributable"; #Amazon
license = "unfree-redistributable"; # Amazon http://aws.amazon.com/asl/
license = "unfree-redistributable"; # Amazon || (Ruby GPL-2)
license = "unfree-redistributable"; #TODO freedist, libs under BSD-3
license = "Unicode Fonts for Ancient Scripts";
#license = "unknown";
license = "unknown";
license = "UNKNOWN";
license = "unrestricted";
license = "unspecified"; # !
license = "verbatim-redistribution";
license = "Vovida 1.0"; # See any header file.
license = "VXL License";
license = "w3c"; # http://www.w3.org/Consortium/Legal/
license = "WTFPL"; # http://sam.zoy.org/wtfpl/
license = "zlib/libpng";
license = "ZLIB/LIBPNG"; # see README.
license = "ZPL";

vcunat commented 10 years ago

Fuuzetsu commented 10 years ago

Problem with using an external link is that it's often not possible to do easily: sometimes licenses are only in source file headers (do we link to some random file?) or inside tarballs (do we link to the tarball? Do we update with each version? We don't really want to download the whole thing to read the license). It's also not great for tools &c. It's not a big problem though, it just seems less convenient for the user to have to chase up the license themselves.

cillianderoiste commented 10 years ago

I wonder if http://www.monkey.org/~scottij/oss-license-extract.html (or something similar) can help us clean up the licenses.

vcunat commented 10 years ago

@Fuuzetsu: what about linking the debian copyright file? Example: http://metadata.ftp-master.debian.org/changelogs/main/z/zlib/zlib_1.2.8.dfsg-1_copyright

(Only in those cases where upstream provides no good license link. Debian seems to take licensing very seriously.)

Fuuzetsu commented 10 years ago

@cillianderoiste Interesting although I don't know how well it works in practice, I've never heard about it before.

@vcunat Ah, it does seem like they have a lot of licensing information. I'm unsure about linking to it (they might move it or they might not have the version we do) but it should definitely be useful if we want to look up what license something has (perhaps automatically). The problem of classification remains (free? redistributable?) but that can be done by a human if need be.

vcunat commented 10 years ago

In most cases the classification is trivial (no need to set anything extra, as it's free, which implies redistributable). If we link to a specific version, the link will disappear when they update it. I'd link to a generic version like [zlib]. That might go wrong if they update to a different version than we do and the project relicenses in between, but that's such an improbable thing to happen...

[zlib] http://metadata.ftp-master.debian.org/changelogs/main/z/zlib/testing_copyright

aszlig commented 10 years ago

Also worth having a look at the devscripts package from Debian:

- licensecheck: attempt to determine the license of source files

A licensecheck -r <dir> should spit out all the licenses of the source files. So it's similar to the oss-license-extract mentioned by @cillianderoiste, but I haven't tried/reviewed the latter yet.

cillianderoiste commented 10 years ago

Interesting. I wonder how tricky it would be to write our own script to check for licenses and spit out the correct attribute from lib/licenses.nix. IMO checking the headers of each file to audit the license is overkill for the sake of setting the metadata attribute, but perhaps we could just check for standard license files and go with that.
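As a rough sketch of the "check for standard license files" idea, the Python below looks for a few conventional file names and matches marker phrases in their text. The file names, marker phrases and the returned family labels (`"lgpl"`, `"gpl"`, `"asl"`, `"mit"`) are all assumptions made for the example; they are not attribute names from lib/licenses.nix, and a human would still need to confirm the exact version.

```python
import os

# Conventional license file names to try, in order (an assumed list).
STANDARD_NAMES = ("LICENSE", "LICENSE.txt", "COPYING", "COPYING.txt")

# Hypothetical marker phrases -> coarse license family. The LGPL entry
# comes before the GPL entry so the more specific phrase is tried first.
MARKERS = [
    ("GNU LESSER GENERAL PUBLIC LICENSE", "lgpl"),
    ("GNU GENERAL PUBLIC LICENSE", "gpl"),
    ("APACHE LICENSE", "asl"),
    ("PERMISSION IS HEREBY GRANTED, FREE OF CHARGE", "mit"),
]


def guess_from_text(text):
    """Return a license family guess for the given text, or None."""
    upper = text.upper()
    for phrase, family in MARKERS:
        if phrase in upper:
            return family
    return None


def guess_license(source_dir):
    """Try the standard license files in source_dir; None if nothing matches."""
    for name in STANDARD_NAMES:
        path = os.path.join(source_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as handle:
            guess = guess_from_text(handle.read())
        if guess is not None:
            return guess
    return None
```

A `None` result means the script found no standard file it recognised, which is where a fallback such as reading file headers would come in.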

Fuuzetsu commented 10 years ago

Well, it's important to get the license right whether it seems like overkill or not. Many projects only use headers to indicate copyright, so checking headers would at least have to be a fallback.

cillianderoiste commented 10 years ago

silver_hook mentioned SPDX on IRC, which looks exhaustive: http://spdx.org/licenses/. Perhaps we should adopt these identifiers? These are also used in the appstream tag for metadata_license: http://www.freedesktop.org/software/appstream/docs/chap-Metadata.html

bjornfor commented 10 years ago

+1 for adopting those identifiers.

Fuuzetsu commented 10 years ago

Looks reasonable, but I notice it separates X11 and MIT whereas we don't, so that's something to look out for.

bjornfor commented 10 years ago

Given that there is a standard for short identifiers (http://spdx.org/licenses/), I got an idea.

I think the point of using lib.licenses.* attributes (over free-form text) was 1) to prevent typos and 2) to attach metadata to licenses. With a standard identifier set, there is no need for attaching metadata in nixpkgs; that metadata can be managed externally. The remaining issue is preventing license typos. This can be done by adding validLicenses = [ <all_spdx.org_short_identifiers> ] to lib/licenses.nix and checking that meta.license appears in validLicenses.
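To make the membership check concrete, here is a small Python sketch of the same logic (the proposal itself would live in Nix). `VALID_LICENSES` is a tiny illustrative subset of SPDX short identifiers, and `check_license` is a hypothetical helper name; neither comes from nixpkgs.

```python
# A tiny illustrative subset of SPDX short identifiers; the real list
# would be generated from http://spdx.org/licenses/.
VALID_LICENSES = frozenset({
    "MIT",
    "ISC",
    "Zlib",
    "BSD-2-Clause",
    "BSD-3-Clause",
    "Apache-2.0",
})


def check_license(package_name, license_id):
    """Return license_id if it is a known identifier, else raise ValueError."""
    if license_id not in VALID_LICENSES:
        raise ValueError(
            "{}: unknown license identifier {!r}".format(package_name, license_id)
        )
    return license_id
```

With this shape, a typo or a vague string such as "BSD" fails loudly, while "BSD-3-Clause" passes through unchanged.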

Thoughts?

vcunat commented 10 years ago

Well, I don't see any advantage of dropping the current style in favor of the plain strings (again). Also, allowing attrsets is more flexible w.r.t. non-standard licenses.

Fuuzetsu commented 10 years ago

Plain strings have the disadvantage of everyone writing them down in their own way which is why we now have about 15 ways in which BSD is specified. This makes it very hard to ask for what packages are under BSD.
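A quick Python sketch of why querying free-form strings is awkward, using a handful of the spellings from the list earlier in this thread. The helper name `spellings_matching` is made up for the example.

```python
def spellings_matching(family, license_strings):
    """Return the distinct strings that mention `family`, ignoring case."""
    needle = family.lower()
    return sorted({s for s in license_strings if needle in s.lower()})


# A few of the spellings found in the tree, plus one unrelated license.
SAMPLE = [
    "bsd",
    "BSD",
    "BSD License",
    "BSD-like",
    "New BSD",
    "revised-BSD",
    "mBSD",
    "ISC",
]

bsd_variants = spellings_matching("bsd", SAMPLE)
```

Even this substring match is only a guess: it lumps "BSD-like" custom licenses together with actual BSD variants, which an attribute per license would avoid.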

silverhook commented 10 years ago

I’d very much suggest sticking to SPDX identifiers (if we can’t supply the full RDF), since these are basically the de facto standard.

I don’t know how useful it is to this specific issue, but here’s a list of FS license detection tools:

vcunat commented 10 years ago

A spdx naming PR: #3408.

vcunat commented 9 years ago

I think this has been resolved.

Fuuzetsu commented 9 years ago

Well, there is still more work to do but the ‘mass-replace’ part has been finished (for the most part).