clearlydefined / license-score

Documentation and samples to compute a license clarity score report e.g. are we ClearlyLicensed yet?
MIT License

Full texts scoring element #2

Open pombredanne opened 6 years ago

pombredanne commented 6 years ago

We may need an extra "definition" data element for full license texts, to flag whether a file contains a given full license text, for use in the License Texts scoring element. And maybe we could consider a full text as relevant only if it is also in a top-level key file?

pombredanne commented 6 years ago

So I discussed this with @jeffmcaffer and @fossygirl during an open call last week.

  1. It makes a lot of sense to require a full text to be in one of the top-level key files for this scoring element. I am submitting a PR to update the scoring definition with this.

  2. In any case we cannot have this scoring element work unless we know that a file contains the full license text of a license and which license. This has to be a data point that is tracked in the definition data.

There are a couple of ways to model it. The simplest that comes to mind would be something like a contains_texts_for_license attribute at the file level that would be a list. The items would be license ids. When present, this attribute would mean: this file contains the full texts for the listed licenses. The attribute name can be made better of course, this is just a suggestion.
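To make the suggestion concrete, here is a minimal sketch of what a file-level entry carrying such an attribute could look like. Only `contains_texts_for_license` comes from the comment above; the `FileEntry` class, the `path` and `licenses` fields, and the `has_full_text` helper are hypothetical names used for illustration and are not part of the ClearlyDefined schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileEntry:
    """Hypothetical file-level record in the definition data."""
    path: str
    # License ids referenced in the file (notice, tag, identifier...). Hypothetical field.
    licenses: List[str] = field(default_factory=list)
    # License ids whose FULL text is contained in this file (the suggested attribute).
    contains_texts_for_license: List[str] = field(default_factory=list)


def has_full_text(entry: FileEntry, license_id: str) -> bool:
    """Return True if this file is flagged as containing the full text of license_id."""
    return license_id in entry.contains_texts_for_license


# A LICENSE file holding the full MIT text, and a source file that only mentions MIT.
license_file = FileEntry("LICENSE", licenses=["MIT"], contains_texts_for_license=["MIT"])
source_file = FileEntry("src/index.js", licenses=["MIT"])
```

With this shape, a file that merely references a license leaves the list empty, so a mention and a full text stay distinguishable.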

And the points for this binary scoring element would then be granted this way:

@jeffmcaffer @fossygirl @tieguy @DennisClark @mjherzog .. feedback welcomed

pombredanne commented 6 years ago

For reference, this is the current definition of the "License Texts" scoring element:

License Texts: This is to capture the presence of full license texts for any referenced licenses (vs. notice or mere mentions) found in a package.

Most open source licenses require that their full license text be made available. Some packages may contain only license names, a license tag, a license identifier (such as an SPDX license identifier) or a license notice text but they may not contain the complete corresponding license text. Metrics scoring is higher for a software package that contains the full license texts because the absence of such text would require users to fetch these texts separately. Also there could be some ambiguity if the full text is not provided.

Is a copy of the complete license text available in the project for every referenced license? This is based on files in the core facet. This is a binary score element awarded if the package contains the full license text for all the licenses found anywhere in the core facet.

pombredanne commented 6 years ago

Here is the proposed new definition of this element where I have added this: ...top-level, key files...

License Texts

This scoring element is awarded if there is a copy of the full license text available in the project top-level key files for every referenced license found in the core facet. This is a binary score element.
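As a rough illustration of how this proposed definition could be evaluated, here is a sketch under stated assumptions. It assumes each file is a plain dict carrying the `contains_texts_for_license` list suggested earlier in this thread. The `facet`, `licenses`, `is_top_level` and `is_key_file` keys are hypothetical names, not actual definition fields. Returning `False` when no license is referenced at all is also an assumption, since the definition does not cover that case.

```python
from typing import Dict, Iterable, Set


def license_texts_awarded(files: Iterable[Dict]) -> bool:
    """Binary check: every license referenced in the core facet has its
    full text in some top-level key file."""
    referenced: Set[str] = set()
    with_full_text: Set[str] = set()
    for f in files:
        # Collect licenses referenced by files in the core facet only.
        if f.get("facet", "core") == "core":
            referenced.update(f.get("licenses", []))
        # Only full texts found in top-level key files count.
        if f.get("is_top_level") and f.get("is_key_file"):
            with_full_text.update(f.get("contains_texts_for_license", []))
    # Assumption: with no referenced license, the element is not awarded.
    return bool(referenced) and referenced <= with_full_text


package = [
    {"path": "LICENSE", "is_top_level": True, "is_key_file": True,
     "licenses": ["MIT"], "contains_texts_for_license": ["MIT"]},
    {"path": "src/util.py", "licenses": ["MIT", "Apache-2.0"]},
]
```

In this example `license_texts_awarded(package)` is `False`, because `Apache-2.0` is referenced in the core facet but no top-level key file contains its full text.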