Open pombredanne opened 2 years ago
I believe we're running into the same issue with ScanCode 32.0.8 scanning this 3RD-PARTY-NOTICES.txt file, which yields this scancode-result.json that contains the expression "gpl-2.0 AND classpath-exception-2.0" in several places.
Strict SPDX expression parsers like the one in ORT will fall over this (even after mapping ScanCode license keys to SPDX license IDs) as classpath-exception-2.0 (or its SPDX equivalent Classpath-exception-2.0 with upper-case "C") is not a stand-alone license, but an exception that must only be used as a right-hand operand to the WITH operator.
We actually have some post-processing of license findings in ORT to fix this up, but it's non-trivial to get this right for cases with long / nested AND / OR expressions in which the license name and belonging exception name are not listed next to each other.
So obviously, it would be best to get this fixed upstream in ScanCode itself.
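To make the failure mode concrete, here is a minimal sketch of the check a strict parser effectively performs: an exception key is only acceptable directly after WITH. This is not ORT's or ScanCode's code; the function name `standalone_exceptions` and the one-entry `EXCEPTION_KEYS` set are assumptions made up for illustration.

```python
import re

# Assumption: a tiny hand-picked set of exception keys, for illustration only.
EXCEPTION_KEYS = {"classpath-exception-2.0"}

# Split an expression into parentheses and whitespace-separated symbols.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def standalone_exceptions(expression, exception_keys=EXCEPTION_KEYS):
    """Return exception keys that are not the right-hand operand of WITH."""
    tokens = TOKEN_RE.findall(expression)
    found = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively so the SPDX spelling is caught as well.
        if token.lower() not in exception_keys:
            continue
        previous = tokens[i - 1].upper() if i > 0 else ""
        if previous != "WITH":
            found.append(token)
    return found


print(standalone_exceptions("gpl-2.0 AND classpath-exception-2.0"))
print(standalone_exceptions("gpl-2.0 WITH classpath-exception-2.0"))
```

The first call reports the exception as stand-alone, the second reports nothing. A check like this only flags the problem; pairing each exception with the right license in a long or nested expression is the hard part discussed in this issue.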
Related issue: proposed new attributes for scancode licenses: https://github.com/nexB/scancode-toolkit/issues/3484
After this is the case and we have mappings like this in the LICENSE files (for example a pointer to gpl-2.0 in the classpath-exception-2.0 license exception), then we would add a step in the license detection post-processing. But this is TBD, and whether this is the right way is still being discussed.
There is also the related issue of LicenseRef vs AdditionRef discussion in SPDX 3.0 which is related to this, and needs discussion.
@sschuberth in the ORT implementation, if I understand correctly you do not have a mapping of exceptions like this to associate the exceptions with the licenses, but do it just based on proximity right?
We should also update rules to have a single rule for gpl-2.0 WITH classpath-exception-2.0 so this is treated correctly meanwhile.
if I understand correctly you do not have a mapping of exceptions like this to associate the exceptions with the licenses, but do it just based on proximity right?
No, it's not only proximity, but we also take our exception mapping into account to create valid license-exception combinations.
@sschuberth Thanks for the report!
We actually have some post-processing of license findings in ORT to fix this up, but it's non-trivial to get this right for cases with long / nested AND / OR expressions in which the license name and belonging exception name are not listed next to each other.
IMHO you should report ALL AND ANY license detection issue here (and request any ORT user to do so too). Otherwise, there are no improvements possible.
Now on the funny side, it is 100% clear to me that https://github.com/nordic-institute/X-Road/blob/0f04331e2675428a25d37aee735686cd22bc4e16/src/3RD-PARTY-NOTICES.txt was generated in part using ScanCode. This observation is based on the copyright and license reported where I spotted a few specific behaviors that are the clues that this was done with ScanCode. All the license texts are also matched exactly to ScanCode license texts, and all copyrights are normalized as ScanCode normalizes. I could likely even find which version of ScanCode they used.
Now, in this specific case, I surmise that they actually generated the attribution from ScanCode and assembled side-by-side the GPL and Classpath exception reference texts from ScanCode itself that we then detect together side-by-side in the same file.
We could simply do as Ayan suggested with a new rule, but there is a circular danger there: this is assembled from ScanCode in a peculiar manual way and adding more rules should be done carefully as this could spiral!
We could also have a more specific way to handle exceptions and their excepted licenses as a new "detection".
There is also the related issue of LicenseRef vs AdditionRef discussion in SPDX 3.0 which is related to this, and needs discussion.
For reference, that's this discussion. TL;DR, SPDX 3.0 will use the AdditionRef- prefix for right-hand side operands to WITH that are not core exceptions.
TL;DR, SPDX 3.0 will use the AdditionRef- prefix for right-hand side operands to WITH that are not core exceptions.
FWIW, I was very much against this wart that provides no value that I can fathom, but hey! we will adapt.
IMHO you should report ALL AND ANY license detection issue here (and request any ORT user to do so too). Otherwise, there are no improvements possible.
No offense @pombredanne, but this issue has been open for two years (reported by you) and there were no improvements still, so it's clearly not a matter of lacking examples, but a lack of time / prioritization. (Which is ok.)
I could likely even find which version of ScanCode they used.
They use the ScanCode version that ORT uses 😉 (big wink)
We could also have a more specific way to handle exceptions and their excepted licenses as a new "detection".
Yes please. This should be solved generically instead of hard-coding a rule for this specific case. Exceptions to licenses simply never should be reported as licenses on their own, i.e. without the WITH operator.
TL;DR, SPDX 3.0 will use the AdditionRef- prefix for right-hand side operands to WITH that are not core exceptions.
FWIW, I was very much against this wart that provides no value that I can fathom, but hey! we will adapt.
I agree. AdditionRef- is pretty much a ~stupid~ unspecific term.
@sschuberth re:
this issue has been open for two years (reported by you) and there were no improvements still, so it's clearly not a matter of lacking examples, but a lack of time / prioritization.
Actually the lack of prioritization has been mostly a matter of lack of examples and reported interest, until now.
Exceptions to licenses simply never should be reported as licenses on their own, i.e. without the WITH operator.
I am not sure this can be done as a blanket rule, as this will certainly under- or mis-report some GPLs as having an exception when they also apply without.
Ignoring this for a sec, here is a revised approach from the one listed above in https://github.com/nexB/scancode-toolkit/issues/2855#issue-1125812267, reworded based on the current state:
Use the LicenseDetection approach as a new detection rule https://github.com/nexB/scancode-toolkit/blob/f70bbb7d9d9bab40a9d504e664bc945b6a1630e8/src/licensedcode/detection.py#L116
Tag all license exception records in the license db (such as https://scancode-licensedb.aboutcode.org/?search=exception ) with the list of license keys that they would typically except. For this, use a new attribute named exception_to that would contain a list of license keys. For instance, something along these lines:
key: classpath-exception-2.0
is_exception: yes
....
exception_to:
- gpl-1.0
- gpl-1.0-plus
- gpl-2.0
- gpl-2.0-plus
- gpl-3.0
- gpl-3.0-plus
When we have a detected solo exception preceded by a match to a license tag or notice of its excepted license then we could return the combined expression as a new detection from the two separate matches.
We could extend this approach to a few other match sequences like license text followed by exception text.
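The combination step described above could look roughly like the following sketch. It is not ScanCode's actual LicenseDetection code: the `Match` class, the `combine_exceptions` function and the `EXCEPTION_TO` dict are hypothetical stand-ins, and `exception_to` itself is only a proposed attribute at this point.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the proposed ``exception_to`` attribute,
# as it might look once loaded from the license db.
EXCEPTION_TO = {
    "classpath-exception-2.0": {
        "gpl-1.0", "gpl-1.0-plus",
        "gpl-2.0", "gpl-2.0-plus",
        "gpl-3.0", "gpl-3.0-plus",
    },
}


@dataclass
class Match:
    """Simplified, hypothetical match: one license key and its start line."""
    key: str
    start_line: int


def combine_exceptions(matches, exception_to=EXCEPTION_TO):
    """Merge a solo exception into the excepted license matched just before it."""
    expressions = []
    previous_key = None  # last plain license that has not been paired yet
    for match in sorted(matches, key=lambda m: m.start_line):
        excepted = exception_to.get(match.key)
        if excepted is not None and previous_key in excepted:
            # Replace the preceding license with the combined expression.
            expressions[-1] = f"{previous_key} WITH {match.key}"
            previous_key = None
        else:
            expressions.append(match.key)
            # An exception can never be the left-hand side of a later pairing.
            previous_key = None if excepted is not None else match.key
    return expressions


matches = [Match("gpl-2.0", 10), Match("classpath-exception-2.0", 12)]
print(" AND ".join(combine_exceptions(matches)))
```

An exception that does not follow one of its excepted licenses is left as-is here, which keeps the sketch from inventing pairings such as mit WITH classpath-exception-2.0. Whether to report or flag those leftovers is a separate design choice.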
Another consideration to research: what is the resulting license category of the combined "license with exception" expression, or that of a sub-expression in a larger complex expression? See also https://github.com/nexB/scancode-toolkit/issues/2897
here is a revised approach from the one listed above
Sounds good to me.
what is the resulting license category of the combined "license with exception" expression
That's a topic also @willebra might be interested in.
There are cases where a notice may not be amenable to a clean detection such as when we would have:
both detected separately. Yet for these we could instead report a single gpl-2.0 WITH classpath-exception-2.0. To achieve this we should IMHO: