Open mahlzahn opened 1 year ago
So we have two JSON files: one from SPDX and one of our own, and we merge them at runtime? That sounds good to me; I was thinking of doing something similar but never got around to it. In theory we could pull the SPDX one dynamically and cache it.
Personally though, I'd prefer to keep our data in a simpler format like CSV or TSV, only because then it's easier to sort the license list in the source code, and in general, since the data is pretty much supplied manually, I think tab-separated values will make for less typing.
License ID	DFSG?	vrms?	Aliases...
0BSD	true	true	Zero-Clause BSD
GPL-2.0-or-later	true	true	GPL2+	GPL2-or-later	GPL2 or any later version
Is this too hacky or no? I know a bit of jq, so if we need to convert this to JSON at some point, that shouldn't be a huge issue.
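For what it's worth, the same TSV-to-JSON conversion could also be done with Python's standard library instead of jq. This is only a rough sketch under assumptions: the sample data mirrors the layout proposed above, and the JSON key names (`id`, `dfsg`, `vrms`, `aliases`) are made up for illustration, not an agreed schema.

```python
import csv
import io
import json

# Sample data in the layout proposed above; tabs separate the columns.
SAMPLE_TSV = (
    "License ID\tDFSG?\tvrms?\tAliases...\n"
    "0BSD\ttrue\ttrue\tZero-Clause BSD\n"
    "GPL-2.0-or-later\ttrue\ttrue\tGPL2+\tGPL2-or-later\tGPL2 or any later version\n"
)


def tsv_to_records(text):
    """Turn the TSV text into a list of dicts, one per license."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the header row
    records = []
    for row in rows:
        if not row:
            continue  # ignore blank lines
        # The first three columns are fixed; everything after them is an alias.
        license_id, dfsg, vrms, *aliases = row
        records.append({
            "id": license_id,          # hypothetical key name
            "dfsg": dfsg.lower() == "true",
            "vrms": vrms.lower() == "true",
            "aliases": [a for a in aliases if a],
        })
    return records


records = tsv_to_records(SAMPLE_TSV)
print(json.dumps(records, indent=2))
```

A row with fewer than three columns would raise a `ValueError` here, so a real version would probably want to validate the input first.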
I also thought of CSV or TSV as a simpler format, but then I thought that JSON should be the preferred format to be read with Python (without extra packages). Nevertheless, I tried with the following sample free_licenses.tsv file:
#ID	DFSG?	Aliases
0BSD	True	Zero-Clause BSD
Beerware	True
# CC-BY is ambiguous for versions 1.0, 2.0, etc.
CC-BY		CCPL:by
CC-BY-4.0	True	CCPL:by-4.0
GPL-2.0-or-later	True	GPL2+	GPL2-or-later	GPL2 or any later version
and implemented this little piece of code to read the file:
import spdx_license_list as spdx


class License:
    def __init__(self, license_id, dfsg_free=False, *aliases,
                 osi_approved=None, fsf_libre=None):
        self.license_id = license_id
        # Values read from the TSV arrive as strings; accept common spellings of "true".
        if isinstance(dfsg_free, str):
            self.dfsg_free = dfsg_free.lower() in ['true', 'yes', 'y', '1']
        else:
            self.dfsg_free = bool(dfsg_free)
        # Drop empty alias fields (e.g. from trailing tabs).
        self.aliases = list(filter(bool, aliases))
        self.osi_approved = osi_approved
        self.fsf_libre = fsf_libre
        # Fill in the name and approval flags from SPDX when the ID is known there.
        if license_id in spdx.LICENSES:
            spdx_license = spdx.LICENSES[license_id]
            if spdx_license.name not in self.aliases:
                self.aliases.append(spdx_license.name)
            if osi_approved is None:
                self.osi_approved = spdx_license.osi_approved
            if fsf_libre is None:
                self.fsf_libre = spdx_license.fsf_libre


with open('free_licenses.tsv') as f:
    for line in f.read().splitlines():
        # Skip blank lines and comment lines starting with '#'.
        if line and line[0] != '#':
            print(License(*line.split('\t')).__dict__)
which yields
{'license_id': '0BSD', 'dfsg_free': True, 'aliases': ['Zero-Clause BSD', 'BSD Zero Clause License'], 'osi_approved': True, 'fsf_libre': False}
{'license_id': 'Beerware', 'dfsg_free': True, 'aliases': ['Beerware License'], 'osi_approved': False, 'fsf_libre': False}
{'license_id': 'CC-BY', 'dfsg_free': False, 'aliases': ['CCPL:by'], 'osi_approved': None, 'fsf_libre': None}
{'license_id': 'CC-BY-4.0', 'dfsg_free': True, 'aliases': ['CCPL:by-4.0', 'Creative Commons Attribution 4.0 International'], 'osi_approved': False, 'fsf_libre': True}
{'license_id': 'GPL-2.0-or-later', 'dfsg_free': True, 'aliases': ['GPL2+', 'GPL2-or-later', 'GPL2 or any later version', 'GNU General Public License v2.0 or later'], 'osi_approved': True, 'fsf_libre': True}
Also, I realized that we probably don't need the field/parameter is_vrms_free, because all licenses we add in the TSV file can be considered free for vrms. And if needed, we can later add other licenses in separate files.
Edit: I found the very nice and always up-to-date SPDX database for Python: https://github.com/JJMC89/spdx-license-list. A bot automatically pushes the latest SPDX release, and it has all the information we need. I incorporated its information into the source code above.
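Since the aliases exist mainly so that a package's license string can be mapped back to an SPDX ID, here is a small sketch of how records like the ones printed above might be indexed for lookup. The helper names `build_alias_index` and `resolve` are hypothetical, and matching case-insensitively is my own assumption, not something decided in this thread.

```python
# Records shaped like the output above (trimmed to the fields used here).
records = [
    {"license_id": "0BSD",
     "aliases": ["Zero-Clause BSD", "BSD Zero Clause License"]},
    {"license_id": "CC-BY-4.0",
     "aliases": ["CCPL:by-4.0", "Creative Commons Attribution 4.0 International"]},
    {"license_id": "GPL-2.0-or-later",
     "aliases": ["GPL2+", "GPL2-or-later", "GPL2 or any later version"]},
]


def build_alias_index(records):
    """Map every ID and alias (lowercased) to its license ID."""
    index = {}
    for record in records:
        for name in [record["license_id"], *record["aliases"]]:
            # setdefault keeps the first owner if two licenses share a name.
            index.setdefault(name.lower(), record["license_id"])
    return index


def resolve(index, name):
    """Return the license ID for a name, or None if it is unknown."""
    return index.get(name.strip().lower())


index = build_alias_index(records)
print(resolve(index, "GPL2+"))
print(resolve(index, "zero-clause bsd"))
print(resolve(index, "Some EULA"))
```

Unknown names come back as `None`, which the caller could then treat as non-free or flag for manual review.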
Should we do anything special with the "ethical" licenses? They could just be an extra field in the TSV.
I know that's a niche feature, and the movement behind it is... well, it was never tremendously popular. But I do have at least one AUR package on my system which uses the Hippocratic license: https://aur.archlinux.org/packages/zsh-abbr
So I'd rather not exclude them.
Should we do anything special with the "ethical" licenses?
I'll just remove the "ethical source" licenses. After looking through the AUR metadata archives, literally nobody in the AUR uses any of those licenses, except for the one aforementioned package using the Hippocratic license, but that license is actually in the SPDX database already.
I have some other suggestions regarding this, but I will implement them in a re-write. I'm not sure when exactly I'll end up rewriting vrms, but I need to do that sooner rather than later because of Arch's adoption of SPDX expressions.
Idea (see also #5) by @im397:
I created this issue to discuss the implementation; I'd be happy to implement it myself.
Based on the information from spdx/license-list-data, I suggest creating a simple licenses table file which we then load in the main program to read the licenses from. E.g., here are some licenses:
Zero-Clause BSD
CCPL:by
CCPL:by-4.0
GPL2+, GPL2-or-later, GPL2 or any later version
Technically, I suggest using the JSON format provided by SPDX and adding our own variables for DFSG and alternative IDs, by adding our own JSON file with content such as:
If you agree on this approach, I’d be happy to start implementing it.
Edit: DSFG -> DFSG
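To make the two-file idea above a bit more concrete, here is a rough sketch of merging the SPDX data with a file of our own at load time. The SPDX sample is trimmed to the `licenseId`, `name`, and `isOsiApproved` fields of its `licenses.json`; the layout of our own file (keyed by license ID, with `dfsg` and `aliases`) and the function name `merge_licenses` are purely hypothetical and are not the content originally proposed here.

```python
import json

# Trimmed sample in the shape of SPDX's licenses.json.
SPDX_JSON = """
{"licenses": [
  {"licenseId": "0BSD", "name": "BSD Zero Clause License", "isOsiApproved": true},
  {"licenseId": "GPL-2.0-or-later",
   "name": "GNU General Public License v2.0 or later", "isOsiApproved": true}
]}
"""

# Hypothetical file of our own; the key names "dfsg" and "aliases" are made up.
OWN_JSON = """
{
  "0BSD": {"dfsg": true, "aliases": ["Zero-Clause BSD"]},
  "GPL-2.0-or-later": {"dfsg": true,
    "aliases": ["GPL2+", "GPL2-or-later", "GPL2 or any later version"]},
  "CC-BY": {"dfsg": false, "aliases": ["CCPL:by"]}
}
"""


def merge_licenses(spdx_text, own_text):
    """Combine SPDX metadata with our own per-license fields."""
    spdx = {entry["licenseId"]: entry
            for entry in json.loads(spdx_text)["licenses"]}
    merged = {}
    for license_id, extra in json.loads(own_text).items():
        # IDs that SPDX does not know (e.g. the unversioned CC-BY) get no SPDX data.
        entry = spdx.get(license_id, {})
        merged[license_id] = {
            "name": entry.get("name"),
            "osi_approved": entry.get("isOsiApproved"),
            "dfsg": extra.get("dfsg", False),
            "aliases": list(extra.get("aliases", [])),
        }
    return merged


merged = merge_licenses(SPDX_JSON, OWN_JSON)
print(json.dumps(merged, indent=2))
```

Only licenses listed in our own file end up in the result, so the SPDX file acts purely as a source of extra metadata.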