Feature request: Add options to show packages with only-FSF, only-OSI or only-DFSG compatible licenses

mahlzahn commented 11 months ago

Idea (see also #5) by @im397:

-f, --list-fsf List only packages with FSF compatible licenses
-o, --list-osi List only packages with OSI compatible licenses
-d, --list-DFSG List only packages with DFSG compatible licenses
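
For discussion, here is a minimal sketch of how these three flags could be declared with Python's standard argparse module. The program name `vrms`, the `build_parser` helper, and the `dest` names are assumptions made for illustration, not the project's existing code.

```python
import argparse


def build_parser():
    """Build a parser with the three proposed filter flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="vrms")  # program name is an assumption
    parser.add_argument("-f", "--list-fsf", dest="list_fsf", action="store_true",
                        help="List only packages with FSF compatible licenses")
    parser.add_argument("-o", "--list-osi", dest="list_osi", action="store_true",
                        help="List only packages with OSI compatible licenses")
    # The long option keeps the spelling from the proposal; dest is lower-cased.
    parser.add_argument("-d", "--list-DFSG", dest="list_dfsg", action="store_true",
                        help="List only packages with DFSG compatible licenses")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the example does not depend on sys.argv.
    args = build_parser().parse_args(["-o"])
    print(args.list_fsf, args.list_osi, args.list_dfsg)
```

Whether the flags should be mutually exclusive or combinable is left open here; the sketch allows any combination.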

I created this issue to discuss the implementation; I’d be happy to implement it myself.

Based on the information from spdx/license-list-data, I suggest creating a simple licenses table file which we then load in the main program to read the licenses from. E.g., here are some licenses:

name	id	FSF	OSI	DFSG	vrms	alternatives
BSD Zero Clause License	0BSD	-	Y	Y	Y	[`Zero-Clause BSD`]
Beerware License	Beerware	-	-	Y	Y	[]
Creative Commons Attribution	CC-BY	-	-	-	Y	[`CCPL:by`]
Creative Commons Attribution 4.0 International	CC-BY-4.0	Y	-	Y	Y	[`CCPL:by-4.0`]
GNU General Public License v2.0 or later	GPL-2.0-or-later	Y	Y	Y	Y	[`GPL2+`, `GPL2-or-later`, `GPL2 or any later version`]

Technically, I suggest using the JSON format provided by SPDX and adding our own variables for DFSG and alternative IDs in a separate JSON file of our own, with content such as:

{
  "licenseListVersion": "2.01",
  "releaseDate": "2023-10-15",
  "licenses": [
    {
      "licenseId": "0BSD",
      "alternativeIds": [
        "Zero-Clause BSD"
      ],
      "isDfsgFree": true,
      "isVrmsFree": true
    },
    {
      "licenseId": "Beerware",
      "isDfsgFree": true,
      "isVrmsFree": true
    },
    {
      "licenseId": "CC-BY",
      "alternativeIds": [
        "CCPL:by"
      ],
      "isVrmsFree": true
    },
    {
      "licenseId": "CC-BY-4.0",
      "alternativeIds": [
        "CCPL:by-4.0"
      ],
      "isDfsgFree": true,
      "isVrmsFree": true
    },
    {
      "licenseId": "GPL-2.0-or-later",
      "alternativeIds": [
        "GPL2+",
        "GPL2-or-later",
        "GPL2 or any later version"
      ],
      "isDfsgFree": true,
      "isVrmsFree": true
    }
  ]
}
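
To make the idea concrete, here is a rough sketch of how such an overlay file could be merged with the SPDX list, keyed on `licenseId`. It assumes both inputs are already parsed into dicts with a top-level `licenses` array; the SPDX field names `name`, `isOsiApproved` and `isFsfLibre` follow the published licenses.json, while `merge_license_data` and the sample data are made up for illustration.

```python
def merge_license_data(spdx_data, overlay_data):
    """Return {licenseId: record}; fields from the overlay win over SPDX fields."""
    merged = {entry["licenseId"]: dict(entry) for entry in spdx_data["licenses"]}
    for entry in overlay_data["licenses"]:
        # Overlay entries unknown to SPDX (e.g. the ambiguous CC-BY) are kept as-is.
        merged.setdefault(entry["licenseId"], {}).update(entry)
    return merged


# Tiny in-memory stand-ins for the two JSON files (illustrative data only).
spdx_sample = {"licenses": [
    {"licenseId": "0BSD", "name": "BSD Zero Clause License", "isOsiApproved": True},
    {"licenseId": "GPL-2.0-or-later",
     "name": "GNU General Public License v2.0 or later",
     "isOsiApproved": True, "isFsfLibre": True},
]}
overlay_sample = {"licenses": [
    {"licenseId": "0BSD", "alternativeIds": ["Zero-Clause BSD"],
     "isDfsgFree": True, "isVrmsFree": True},
    {"licenseId": "CC-BY", "alternativeIds": ["CCPL:by"], "isVrmsFree": True},
]}

merged = merge_license_data(spdx_sample, overlay_sample)

if __name__ == "__main__":
    for license_id, record in merged.items():
        print(license_id, record)
```

In a real program the two dicts would come from `json.load` on the respective files.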

If you agree on this approach, I’d be happy to start implementing it.

Edit: DSFG -> DFSG

gardenappl commented 11 months ago

So we have two JSON files: one from SPDX and one of our own, and we merge them at runtime? That sounds good to me. I was thinking of doing something similar but never got around to it. In theory we could pull the SPDX one dynamically and cache it.

gardenappl commented 11 months ago

Personally though, I'd prefer to keep our data in a simpler format like CSV or TSV, only because then it's easier to sort the license list in the source code. And in general, since the data is pretty much supplied manually, I think tab-separated values will make for less typing.

gardenappl commented 11 months ago

License ID  DFSG?   vrms?   Aliases...
0BSD    true    true    Zero-Clause BSD
GPL-2.0-or-later    true    true    GPL2+   GPL2-or-later   GPL2 or any later version

is this too hacky or no? I know a bit of jq so if we need to convert this to JSON at some point, that shouldn't be a huge issue.
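
As a rough illustration of that conversion in Python rather than jq, the sketch below turns the tab-separated layout above into the JSON shape proposed earlier in the thread. It assumes the first non-empty line is a header row, that the flags are spelled `true`, and that all remaining fields are aliases; `parse_license_tsv` is a hypothetical helper.

```python
import json


def parse_license_tsv(text):
    """Convert 'ID <tab> DFSG? <tab> vrms? <tab> aliases...' rows to a JSON-ready dict."""
    lines = [line for line in text.splitlines() if line.strip()]
    records = []
    for line in lines[1:]:  # assumption: the first non-empty line is the header
        fields = line.split("\t")
        fields += [""] * (3 - len(fields))  # pad rows that omit trailing columns
        license_id, dfsg, vrms, *aliases = fields
        records.append({
            "licenseId": license_id,
            "isDfsgFree": dfsg == "true",
            "isVrmsFree": vrms == "true",
            "alternativeIds": [alias for alias in aliases if alias],
        })
    return {"licenses": records}


# The sample rows from the comment above, with the tabs written out explicitly.
sample = "\n".join([
    "License ID\tDFSG?\tvrms?\tAliases...",
    "0BSD\ttrue\ttrue\tZero-Clause BSD",
    "GPL-2.0-or-later\ttrue\ttrue\tGPL2+\tGPL2-or-later\tGPL2 or any later version",
])

converted = parse_license_tsv(sample)

if __name__ == "__main__":
    print(json.dumps(converted, indent=2))
```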

mahlzahn commented 11 months ago

I also thought of CSV or TSV as a simpler format, but then I thought that JSON should be the preferred format to read with Python (without extra packages). Nevertheless, I tried the following sample free_licenses.tsv file:

#ID DFSG?   Aliases
0BSD    True    Zero-Clause BSD
Beerware    True
# CC-BY is ambiguous for versions 1.0, 2.0, etc.
CC-BY       CCPL:by
CC-BY-4.0   True    CCPL:by-4.0
GPL-2.0-or-later    True    GPL2+   GPL2-or-later   GPL2 or any later version

and implemented this small snippet to read the file:

import spdx_license_list as spdx


class License:
    """One row of free_licenses.tsv, enriched with data from the SPDX list."""

    def __init__(self, license_id, dfsg_free=False, *aliases, osi_approved=None, fsf_libre=None):
        self.license_id = license_id
        # The TSV supplies strings, so accept common spellings of "true".
        if isinstance(dfsg_free, str):
            self.dfsg_free = dfsg_free.lower() in ('true', 'yes', 'y', '1')
        else:
            self.dfsg_free = bool(dfsg_free)
        # Drop empty alias fields left by consecutive tabs.
        self.aliases = [alias for alias in aliases if alias]
        self.osi_approved = osi_approved
        self.fsf_libre = fsf_libre
        # Take the full name and the OSI/FSF flags from SPDX unless given explicitly.
        if license_id in spdx.LICENSES:
            spdx_license = spdx.LICENSES[license_id]
            if spdx_license.name not in self.aliases:
                self.aliases.append(spdx_license.name)
            if osi_approved is None:
                self.osi_approved = spdx_license.osi_approved
            if fsf_libre is None:
                self.fsf_libre = spdx_license.fsf_libre


with open('free_licenses.tsv') as f:
    for line in f.read().splitlines():
        # Skip blank lines and '#' comment lines.
        if line and line[0] != '#':
            print(License(*line.split('\t')).__dict__)

which yields

{'license_id': '0BSD', 'dfsg_free': True, 'aliases': ['Zero-Clause BSD', 'BSD Zero Clause License'], 'osi_approved': True, 'fsf_libre': False}
{'license_id': 'Beerware', 'dfsg_free': True, 'aliases': ['Beerware License'], 'osi_approved': False, 'fsf_libre': False}
{'license_id': 'CC-BY', 'dfsg_free': False, 'aliases': ['CCPL:by'], 'osi_approved': None, 'fsf_libre': None}
{'license_id': 'CC-BY-4.0', 'dfsg_free': True, 'aliases': ['CCPL:by-4.0', 'Creative Commons Attribution 4.0 International'], 'osi_approved': False, 'fsf_libre': True}
{'license_id': 'GPL-2.0-or-later', 'dfsg_free': True, 'aliases': ['GPL2+', 'GPL2-or-later', 'GPL2 or any later version', 'GNU General Public License v2.0 or later'], 'osi_approved': True, 'fsf_libre': True}

Also, I realized that we probably don’t need the field/parameter is_vrms_free, because all licenses we add to the TSV file can be considered free for vrms. If needed, we can later add other licenses in separate files.

Edit: I found a very nice and always up-to-date SPDX database for Python: https://github.com/JJMC89/spdx-license-list. A bot automatically pushes the latest SPDX release, and it has all the information we need. I incorporated its information in the source code above.
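
To show how records like the ones printed above could back the proposed filter options, here is a small sketch that indexes licenses by ID and alias and then filters packages by one flag. The package names, the `build_alias_index` and `packages_matching` helpers, and the rule that every license of a package must be known and pass the flag are all assumptions for illustration. Plain dicts stand in for the License objects so the example runs without the spdx-license-list package.

```python
# Records shaped like the output above (a subset, copied from the printed dicts).
records = [
    {"license_id": "0BSD", "dfsg_free": True,
     "aliases": ["Zero-Clause BSD", "BSD Zero Clause License"],
     "osi_approved": True, "fsf_libre": False},
    {"license_id": "CC-BY-4.0", "dfsg_free": True,
     "aliases": ["CCPL:by-4.0", "Creative Commons Attribution 4.0 International"],
     "osi_approved": False, "fsf_libre": True},
    {"license_id": "GPL-2.0-or-later", "dfsg_free": True,
     "aliases": ["GPL2+", "GPL2-or-later", "GPL2 or any later version",
                 "GNU General Public License v2.0 or later"],
     "osi_approved": True, "fsf_libre": True},
]


def build_alias_index(records):
    """Map every license ID and alias to its record."""
    index = {}
    for record in records:
        for name in [record["license_id"], *record["aliases"]]:
            index[name] = record
    return index


def packages_matching(packages, index, flag):
    """Return packages whose licenses are all known and all have record[flag] truthy."""
    return [
        name for name, licenses in packages.items()
        if licenses and all(lic in index and index[lic][flag] for lic in licenses)
    ]


# Hypothetical installed packages with the license strings from their metadata.
packages = {
    "pkg-a": ["GPL2+"],
    "pkg-b": ["CCPL:by-4.0"],
    "pkg-c": ["Zero-Clause BSD"],
    "pkg-d": ["custom"],
}

index = build_alias_index(records)

if __name__ == "__main__":
    for flag in ("fsf_libre", "osi_approved", "dfsg_free"):
        print(flag, packages_matching(packages, index, flag))
```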

gardenappl commented 11 months ago

Should we do anything special with the "ethical" licenses? They could just be an extra field in the TSV.

I know that that's a niche feature and the movement behind it is... well, it was never tremendously popular. But I do have at least one AUR package on my system which uses the Hippocratic license: https://aur.archlinux.org/packages/zsh-abbr

So I'd rather not exclude them.

gardenappl commented 5 months ago

> Should we do anything special with the "ethical" licenses?

I'll just remove the "ethical source" licenses. After looking through the AUR metadata archives, literally nobody in the AUR uses any of those licenses, except for the one aforementioned package using the Hippocratic license, and that license is actually in the SPDX database already.

gardenappl commented 5 months ago

I have some other suggestions regarding this, but I will implement them in a re-write. I'm not sure when exactly I'll end up rewriting vrms, but I need to do that sooner rather than later because of Arch's adoption of SPDX expressions.

gardenappl / vrms-arch

Feature request: Add options to show packages with only-FSF, only-OSI or only-DFSG compatible licenses #8