doubleopen-project / policy-configuration

Double Open license classification for OSS Review Toolkit (ORT) and other uses.
Creative Commons Zero v1.0 Universal
12 stars 5 forks source link

Implement basic checks on classified licenses #64

Closed sschuberth closed 4 months ago

sschuberth commented 7 months ago

Please have a look at the individual commit messages for the details.

Edit: See here for the list of licenses that'd need to be fixed according to this new check.

mmurto commented 7 months ago

Would it make sense to allow deprecated licenses? According to the SPDX license list, "deprecated license identifier, while remaining valid, should no longer be used".

sschuberth commented 7 months ago

Would it make sense to allow deprecated licenses? According to the SPDX license list, "deprecated license identifier, while remaining valid, should no longer be used".

As ORT maps deprecated to non-deprecated licenses, at least when used with ORT it should be fine to only keep non-deprecated IDs, I believe.

mmurto commented 7 months ago

As ORT maps deprecated to non-deprecated licenses, at least when used with ORT it should be fine to only keep non-deprecated IDs, I believe.

I believe this would not help in a situation where an end user uses a version of ORT before SPDX deciding on deprecation and ORT including the conversion for such a license. They would just see a new unhandled license violation in their report, while still having totally valid SPDX, although with a deprecated license ID. The list of deprecated licenses is currently quite short at 20-30 license IDs, so this would not really swamp the classification in deprecated IDs.

sschuberth commented 7 months ago

I believe this would not help in a situation where an end user uses a version of ORT before SPDX deciding on deprecation and ORT including the conversion for such a license.

That's true theoretically, but I wonder how this could practically happen. Using such an old version of ORT probably has completely different issues, like breaking changes in the serialization of license-classifications.yml, so you could not even read that file anymore anyway 😉

sschuberth commented 7 months ago

Also note that some deprecated licenses were actually already removed as part of https://github.com/doubleopen-project/policy-configuration/pull/35/commits/f8074d9c54c52e95d47fc37b0282476f4e198137.

mmurto commented 7 months ago

Also note that some deprecated licenses were actually already removed as part of f8074d9.

Your right - maybe we're good to go to delete the deprecated licenses, but keep an ear open if someone has issues due to us deleting them.

sschuberth commented 7 months ago

Your right - maybe we're good to go to delete the deprecated licenses, but keep an ear open if someone has issues due to us deleting them.

I'd agree with that approach.

Also, I was thinking whether we should in fact allow to classify exceptions on their own. I've always argues against that, but actually there's no code in ORT that demands an id of a classification to be a full / compound SPDX expression with an exceptions.

The advantage to classifying exceptions on their own would be to only classify their obligations / implications / semantics once, and then match them to full license expressions as part of the programmatic rules, instead of classifying each permutation of an exceptions with all its applicable licenses.

willebra commented 7 months ago

I'm a bit unsure, whether all exceptions can be handled programmatically, would need to look at the combinations to verify that. So there's a risk that some would need to continue to be classified as "licence with exception". If we can maintain both approaches, and once we understand the programmatic approach works for a given set of combinations, we could use that approach, otherwise the old approac, then ok.

sschuberth commented 6 months ago

To clarify, we have various different levels of strictness we could apply to license checks. As part of the classification, we could accept strings like (any given license exception is optional in the following examples):

  1. "any random string", i.e. accept anything, no matter whether it syntactically is a valid license, or semantically makes sense.
  2. "ValidLicenseName WITH OptionalException", i.e. expressions that have valid grammar syntax, but that may be invalid in the sense that licenses / exceptions do not have to be part of the SPDX license / exception list.
  3. "GPL-2.0 WITH Classpath-exception-2.0", i.e. accept only syntactically and semantically valid SPDX licenses, but allow deprecated identifiers.
  4. "GPL-2.0-only WITH Classpath-exception-2.0", i.e. accept only syntactically and semantically valid SPDX licenses, and do not allow deprecated identifiers.
  5. "GPL-2.0-only WITH LicenseRef-custom-exception", i.e. accept only syntactically and semantically valid SPDX licenses, and do not allow deprecated identifiers, but allow LicenseRefs as exception strings.

Note that non of 3. - 5. do (currently) check whether the exception, if any, actually can be used with the given license.

At a minimum, I'd implement check 2., which is what I'll update this PR to.

sschuberth commented 6 months ago

@willebra, @mmurto, @Toniprni, how do we move forward here? We need a decision on whether we want to allow the classification of stand-alone exceptions, and generally how strict the license expression validation should be. Today with @Etsija and @lamppu we were stumbling over this again as it impacts front-end UI matters when pre-selecting licenses.

I read @willebra's comment so that he prefers to have a "transition period" where we allow to classify both stand-alone licenses, and license expression with exceptions in their valid license-exception-permutations. That's technically doable, though maybe a bit "unclean", unless we can agree e.g. on a new dedicated property like is:stand-alone-exception that should replace the not very telling property:AutoConf-or-other-exception. (If it can be any exception, why is "Autoconf" even mentioned?)

So, should we start with renaming / replacing property:AutoConf-or-other-exception with is:stand-alone-exception?

mmurto commented 6 months ago

I'd go with removing the stand-alone exceptions. It will cause some unhandled licenses in the used policies, but they can be easily fixed in the rules. I don't see it preventing us from handling exceptions programmatically.

sschuberth commented 6 months ago

I don't see it preventing us from handling exceptions programmatically.

Indeed it doesn't. It just might create "configurational overhead" by requiring us to classify each license / exception permutation, but on the other hand it's probably good to be explicit in the configuration already.

sschuberth commented 4 months ago

Conclusions from our meeting on the rules for licenses:

sschuberth commented 4 months ago

This is not ready for merging yet as the existing code first needs to be cleaned up to pass the checks. (Which should be done in preceding commits in this PR to ensure that the actual checks work as expected.)

sschuberth commented 4 months ago

the existing code first needs to be cleaned up to pass the checks.

Actually, this is not true... I probably was mis-remembering something about the strictness of the (preliminary) checks being implemented. So @mmurto, please review!

sschuberth commented 4 months ago

@Etsija, opening the "scripts" folder in Fleet should now also work to get auto-completion for .main.kts scripts thanks to the first commit in this PR.