aboutcode-org / license-expression

Utility library to parse, normalize and compare License expressions for Python using a boolean logic engine. For expressions using SPDX or any other license id scheme.
http://aboutcode.org

Provide built-in support for SPDX and scancode license expression validation #56

Open pombredanne opened 3 years ago

pombredanne commented 3 years ago

I would like to have a function that takes an expression string as an argument and validates it. It could be built on Licensing.parse(), but I would prefer that it return an object that tells me everything about the validity of the expression.

This function should take as input either the ScanCode license DB ( https://scancode-licensedb.aboutcode.org ) or an explicit list of license symbols. It should bundle an up-to-date license list from ScanCode and SPDX for easy bootstrapping. For this we need https://github.com/nexB/scancode-licensedb/issues/7 to be resolved. In SPDX mode, it should also accept arbitrary LicenseRef- (and possibly DocumentRef-) symbols.
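To make the request concrete, here is a minimal stdlib-only sketch of the idea: report on the symbols in an expression instead of raising. Everything in it is an assumption for illustration: the names ValidationResult, validate_keys and spdx_mode are hypothetical and are not part of license_expression, and the boolean syntax itself is not parsed, only the license keys are checked.

```python
import re
from dataclasses import dataclass, field

# Hypothetical names throughout: this is NOT the license_expression API.
KEYWORDS = {"and", "or", "with"}
TOKEN_RE = re.compile(r"[()]|[^\s()]+")


@dataclass
class ValidationResult:
    """Everything the caller may want to know about one expression."""
    original_expression: str
    known_keys: list = field(default_factory=list)
    unknown_keys: list = field(default_factory=list)

    @property
    def is_valid(self):
        return not self.unknown_keys


def validate_keys(expression, known_keys, spdx_mode=False):
    """Check every license key in `expression` against `known_keys`.

    Only symbols are checked; operators and parentheses are skipped
    and the expression structure is not verified.
    """
    known = {key.lower() for key in known_keys}
    result = ValidationResult(original_expression=expression)
    for token in TOKEN_RE.findall(expression):
        if token in ("(", ")") or token.lower() in KEYWORDS:
            continue
        # Assumption: in SPDX mode any LicenseRef-/DocumentRef- symbol is accepted.
        is_ref = spdx_mode and token.startswith(("LicenseRef-", "DocumentRef-"))
        if is_ref or token.lower() in known:
            result.known_keys.append(token)
        else:
            result.unknown_keys.append(token)
    return result
```

With this sketch, validate_keys('foo AND mit', ['mit']) returns a result whose unknown_keys is ['foo'] and whose is_valid is False, so the caller can inspect the problem rather than catch an exception.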

pombredanne commented 3 years ago

@thatch @JonoYang ping ^

pombredanne commented 3 years ago

An example:

$ wget https://scancode-licensedb.aboutcode.org/index.json
$ python
>>> import json
>>> lics = json.load(open('index.json'))
>>> lics[0]
{'license_key': '389-exception', 'json': '389-exception.json', 'yml': '389-exception.yml', 'html': '389-exception.html', 'text': '389-exception.LICENSE'}
>>> from license_expression import LicenseSymbol, Licensing
>>> syms = [LicenseSymbol(l['license_key']) for l in lics]
>>> ling = Licensing(symbols=syms)
>>> ling.parse('foo AND mit')
AND(LicenseSymbol('foo', is_exception=False), LicenseSymbol('mit', is_exception=False))
>>> ling.parse('foo AND mit', validate=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/licexp/tmp/lib/python3.6/site-packages/license_expression/__init__.py", line 453, in parse
    raise ExpressionError(msg)
license_expression.ExpressionError: Unknown license key(s): foo
>>> e = ling.parse('foo AND mit')
>>> e.symbols
{LicenseSymbol('foo', is_exception=False), LicenseSymbol('mit', is_exception=False)}

JonoYang commented 3 years ago

@pombredanne When we parse a license expression with Licensing().parse(), should the .parse() method determine automatically whether the expression is an SPDX or a ScanCode license expression? Or should there be a flag that tells .parse() which kind of license expression to expect?

pombredanne commented 3 years ago

@JonoYang I think the new validation feature should be explicit about which license list is used as a base. There should be no guessing about whether an expression comes from ScanCode or from SPDX.

thatch commented 3 years ago

In addition to validation, could you also provide a normalized (whitespace, case, parens) version of the string passed in?
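For illustration, a rough stdlib-only sketch of what such normalization could look like. The function normalize_expression and its lowercase_keys flag are hypothetical names, not part of license_expression. It only collapses whitespace, upper-cases the AND/OR/WITH operators and tidies spacing around parentheses; it does not remove redundant parentheses, which would need a real parse.

```python
import re

# Hypothetical helper: this is NOT the license_expression API.
TOKEN_RE = re.compile(r"[()]|[^\s()]+")
OPERATORS = {"and", "or", "with"}


def normalize_expression(expression, lowercase_keys=False):
    """Return `expression` with uniform whitespace and upper-case operators."""
    out = []
    for token in TOKEN_RE.findall(expression):
        if token.lower() in OPERATORS:
            token = token.upper()
        elif lowercase_keys and token not in ("(", ")"):
            # Assumption: keys are case-insensitive in the chosen license list.
            token = token.lower()
        out.append(token)
    text = " ".join(out)
    # Drop the padding just inside parentheses: "( a )" becomes "(a)".
    return text.replace("( ", "(").replace(" )", ")")
```

For example, normalize_expression("  mit   and( gpl-2.0  with  classpath-exception-2.0 )") gives "mit AND (gpl-2.0 WITH classpath-exception-2.0)".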