Open pombredanne opened 4 years ago
Primarized is not a word. Just Primary would work or perhaps PRIMARY to make it prominent.
Let me explain further: Say I start from these expressions for a package:
bsd-new AND mit
bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0
I would like a way to end with this, combining the two expressions above with AND,
but keeping track of the fact that one expression is the primary one. For instance, the primary license could be the one provided in an RPM spec file or an npm package.json and may not cover all the file-level details with less important but still important secondary licenses.
bsd-new AND mit AND bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0
but this is really not helping to track what the original top-level license was.
(bsd-new AND mit) AND (bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0)
as they [the parentheses] do not change the meaning of the expression... but since these are optional they would be easy to drop when parsing and printing back an expression.
@DennisClark came up with a brilliant idea, which would be to use a new keyword in the expression syntax, PLUS, that would be strictly equivalent to AND.
If we go with that we would get:
bsd-new AND mit PLUS bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0
and things are very clear now: the left-hand side is the primary license and the right-hand side is the secondary license.
There is no loss of meaning and no risk of dropping the PLUS, and furthermore it reads nicely.
The rule would be that there could be zero or only one PLUS keyword in an expression and that PLUS is strictly equivalent to AND. When sorting, simplifying or minimizing an expression with a PLUS, the left-hand side (LHS) and right-hand side (RHS) would be processed separately. And there could be a convenience method to cast a PLUS back to a simple AND.
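To make the proposed rule concrete, here is a minimal sketch in plain Python. It is not the license-expression library API: the function names are hypothetical, and it assumes flat expressions that use only AND (no parentheses, OR, or WITH). It enforces at most one PLUS, processes each side separately, and offers a cast back to AND.

```python
import re


def split_primary(expression):
    """Split on the (at most one) PLUS keyword and return (primary, secondary)."""
    parts = re.split(r"\s+PLUS\s+", expression.strip())
    if len(parts) > 2:
        raise ValueError("at most one PLUS keyword is allowed")
    primary = parts[0]
    secondary = parts[1] if len(parts) == 2 else None
    return primary, secondary


def dedupe_and(expression):
    """Drop repeated keys from a flat AND-only expression, keeping first-seen order."""
    keys = re.split(r"\s+AND\s+", expression.strip())
    return " AND ".join(dict.fromkeys(keys))


def simplify_with_plus(expression):
    """Simplify the LHS and RHS separately; the two sides are never mixed."""
    primary, secondary = split_primary(expression)
    if secondary is None:
        return dedupe_and(primary)
    return f"{dedupe_and(primary)} PLUS {dedupe_and(secondary)}"


def plus_to_and(expression):
    """Cast a PLUS back to a plain AND, for example before emitting SPDX."""
    primary, secondary = split_primary(expression)
    if secondary is None:
        return primary
    return f"{primary} AND {secondary}"
```

With the example from this issue, `simplify_with_plus` keeps `bsd-new AND mit` as the primary side and only removes repeats inside the secondary side, while `plus_to_and` gives back the plain AND-combined expression.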
With this simple enhancement the expressive power would be vastly enhanced.
Note that this would be entirely optional, and this would mean that whenever client code that uses the license-expression library deals with SPDX expressions, it would request casting a possible PLUS back to AND so this is correct SPDX-wise. And if this PLUS proves as useful as I think it will, then we would submit this as an enhancement to the official SPDX syntax.
@qduanmu @mjherzog @sschuberth @mbargull @DennisClark @tdruez @pkolbus @carmenbianca @mxmehl @majurg @chinyeungli @johnmhoran FYI
Just adding further about why this is useful... it is super common to have multiple licenses for a package, but these licenses do not all have the same prominence. For instance, a package may be using the GPL for its command line utilities and the LGPL for a library as core, top-level licenses (common for Linux userland tools) and still harbor bits of code under BSD and MIT licenses and have its build script under yet another license and its documentation under GFDL. All these licenses need to be reported all right, but in doing so and combining them all in a single license expression we may lose sight of the fact that the key, core licenses are GPL and LGPL and that the other licenses are there but secondary.
@pombredanne, thanks for the FYI.
If a license is applicable then you have to abide by the terms to use the bit of code, so understanding the entire license expression is necessary for compliance purposes. Because of this, I'm not seeing the use case that leads to prominence being a necessary concept. (Admittedly, this could well be a lack of vision on my part.)
Taking the example of a declared license vs scan results a bit further, I'm not sure it's appropriate to combine in that way, as there's a difference in the level of confidence or possibly legal interpretation applied. It's possible that:
(LGPLv2.1 AND GPLv2 might have been simplified to GPLv2)
Note also that SPDX v2.2.0 (https://spdx.github.io/spdx-spec/) is consistent with this separation as it defines multiple license-expression fields: "Concluded License" (3.13), "All Licenses From Files" (3.14), and "Declared License" (3.15).
But if prominence does in fact have value: treatment of the left and right sides of the PLUS as independent leads to expressions that are unnecessarily complex. (bsd-new AND mit PLUS bsd-new AND bsd-simplified could easily be bsd-new AND mit PLUS bsd-simplified.) And while the PLUS->AND transform does enable simplification, there is an irreversible loss of prominence data. Assuming the concept that prominence is roughly prevalence, I would suggest that simplifications across the PLUS are valid, but those involving a prominent sub-expression (the left-hand side of PLUS) result in a prominent sub-expression. (For example, if GPLv2+ AND GPLv3 simplifies to GPLv3 then GPLv2+ PLUS GPLv3 AND MIT simplifies to GPLv3 PLUS MIT.)
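The first part of the suggestion above (simplifying across the PLUS while keeping prominent keys on the left) could be sketched as follows. This is an illustration only: `simplify_across_plus` is a hypothetical name, and it assumes flat AND-only expressions with single spaces around keywords. It does not attempt license-semantic rewrites such as the GPLv2+/GPLv3 example.

```python
def simplify_across_plus(expression):
    """Deduplicate keys on both sides of PLUS; a key already on the
    prominent (left) side is dropped from the secondary (right) side."""
    primary, _, secondary = expression.partition(" PLUS ")
    # dict.fromkeys removes repeats while preserving first-seen order.
    primary_keys = list(dict.fromkeys(primary.split(" AND ")))
    secondary_keys = [
        key
        for key in dict.fromkeys(secondary.split(" AND "))
        if key and key not in primary_keys
    ]
    result = " AND ".join(primary_keys)
    if secondary_keys:
        result += " PLUS " + " AND ".join(secondary_keys)
    return result
```

Applied to `bsd-new AND mit PLUS bsd-new AND bsd-simplified`, this yields `bsd-new AND mit PLUS bsd-simplified`; if every secondary key is already prominent, the PLUS disappears entirely.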
@DennisClark came with a brilliant idea which would be to use a new keyword in the expression syntax PLUS that would be strictly equivalent to AND
I'm sorry to spoil the party here, but if I may be frank, I believe this is not a good approach. Because:
Note that this would be entirely optional and this would mean that whenever client code that uses the license-expression library deal with SPDX expressions, then they would request casting a possible PLUS back to AND so this is correct SPDX-wise.
So strictly speaking, this breaks SPDX compatibility, which IMO is an absolute no-go. Third-party applications must be able to rely on being able to parse the expression if they adhere to the SPDX standard.
If you have a hard requirement to track primary / declared vs. other licenses you really should use different fields like e.g. we do in ORT (and SPDX itself does like @pkolbus mentioned), and only create a combined license expression on the fly on license evaluation. Or come up with a convention that does not break the standard, like using parentheses as @pombredanne suggested before (maybe extend that idea to use "dummy" double-parentheses to avoid confusion with regular parentheses).
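As a rough illustration of the "separate fields, combine on the fly" approach described above, here is a minimal sketch. The class and field names are hypothetical and do not reflect ORT's or SPDX's actual data model; expressions are treated as opaque strings and wrapped in parentheses when combined.

```python
from dataclasses import dataclass, field


@dataclass
class PackageLicenses:
    """Hypothetical record keeping the primary license apart from the rest."""

    declared: str  # e.g. from an RPM spec file or an npm package.json
    detected: list = field(default_factory=list)  # e.g. file-level scan results

    def combined(self):
        """Build one standard AND expression only when it is needed."""
        parts = [self.declared, *self.detected]
        return " AND ".join(f"({part})" for part in parts)
```

The stored data never loses track of which expression was the declared one, and the combined string uses only AND and parentheses.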
@sschuberth re:
So strictly speaking, this breaks SPDX compatibility, which IMO is an absolute no-go. Third-party application must be able to rely on being able to parse the expression if they adhere to the SPDX standard.
Actually it would not break compatibility: the PLUS would not be used for SPDX expressions, but only for scancode and aboutcode expressions using non-SPDX license keys.
Furthermore, I floated the idea to add this to SPDX and I would submit this for consideration there too.
but only for scancode and aboutcode expressions using non-SPDX license keys
Ideally, there would be no such expressions. Non-SPDX license keys should become SPDX LicenseRefs, and everything that looks like an SPDX expression should actually be one. Just my 2 cents.
@pkolbus Thank you for the detailed feedback!
For example, if GPLv2+ AND GPLv3 simplifies to GPLv3 then GPLv2+ PLUS GPLv3 AND MIT simplifies to GPLv3 PLUS MIT.)
It is important to note that "GPLv2+ AND GPLv3 simplifies to GPLv3" is NEVER true (at least that's not a license-expression library feature, though you could implement this with substitutions). The simplification done here is a logical/boolean simplification based on symbols (e.g. license keys) and operators (AND, OR and WITH).
If we add support for PLUS then we would treat each side, to the left and to the right of the PLUS operator, as separate expressions that would be simplified separately (and would not be mixed, except possibly to ensure that they are disjoint).
So bsd-new AND mit PLUS bsd-new AND bsd-simplified would be unlikely to change, or if it does, it may optionally become bsd-new AND mit PLUS bsd-simplified.
GPLv2+ PLUS GPLv3 AND MIT would not be simplified.
@sschuberth
Ideally, there would be no such expressions. Non-SPDX license keys should become SPDX LicenseRefs, and everything that looks like an SPDX expression should actually be one. Just my 2 cents.
Agreed, and we should likely move ahead in that direction since https://github.com/spdx/spdx-spec/issues/113 seems to be stalled... but that would not remove the value to distinguish the "main license" vs. the rest IMHO
but that would not remove the value to distinguish the "main license" vs. the rest IMHO
Here I agree, too, but IMO the best approach to document such a main / primary license would still be a dedicated field / property.
@sschuberth re:
Here I agree, too, but IMO the best approach to document such a main / primary license would still be a dedicated field / property.
But then, to be correct, this would need to be not one but possibly an array of license expressions, possibly grouping together the files that share a "purpose" (say doc, build scripts, tests, dev tools, dead code, etc.)?
Yes, maybe something similar to ClearlyDefined's facets: https://github.com/clearlydefined/service/blob/b339cb7e281c1e35990b685efe9bfb774d9cd22f/schemas/definition-1.0.json#L151-L161
@sschuberth There does not seem to be any activity with CD Facets, but the concept is similar. This has also been a long-standing topic at SPDX, but no conclusion afaik. You could argue that "Relationships between SPDX Elements" capture some of this, but that is much more complex than this use case.
@sschuberth You wrote:
but that would not remove the value to distinguish the "main license" vs. the rest IMHO
Here I agree, too, but IMO the best approach to document such a main / primary license would still be a dedicated field / property.
Well, the thing is that it would assume that anywhere we use a single license expression string we now would need two license expressions to convey this notion of primary and secondary.
I think that this would not be practical when license expressions are used outside of SPDX documents: there we have no control on the schema and adding new fields is unlikely to happen IMHO.
Working towards increased adoption of using a license expression rather than an unstructured license string in a package manager metadata field is already a significant piece of work. Asking folks to break things down in multiple fields feels like an even more difficult or impossible task to me.
You also wrote:
Yes, maybe something similar to ClearlyDefined's facets:
Yes, conceptually. But that's also transforming the "license-expression-as-a-single-string" into "license-expression-as-a-mapping-of-key-value-pairs", which would be impractical for many package manager tools and other places where a license expression string may be used to adopt.