OpenEnergyPlatform / academy

The Open Energy Academy is a collection of courses, tutorials, and questions for the Open Energy Family
https://openenergyplatform.github.io/academy/
GNU Affero General Public License v3.0

Inconsistent identification of licenses in sources #55

Closed MGlauer closed 5 years ago

MGlauer commented 5 years ago

The current metadata version seems to have two ways to identify licenses:

  1. "<metadata.licenses[*].name>"
  2. "<license.title> (<license.name>)"

I have the feeling all sources should just use license.name as the identifier, in order to maintain machine readability.

Ludee commented 5 years ago

There are licenses which are not included in the SPDX list, e.g. dl-de-by-2.0. Another list of licenses: https://github.com/fraunhoferfokus/ogd-metadata/blob/master/lizenzen/deutschland.json

Ludee commented 5 years ago

from https://frictionlessdata.io/specs/data-package/

licenses: The license(s) under which the package is provided.

This property is not legally binding and does not guarantee the package is licensed under the terms defined in this property.

licenses MUST be an array. Each item in the array is a License. Each MUST be an object. The object MUST contain a name property and/or a path property. It MAY contain a title property.
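
The rule quoted above can be checked mechanically. The following is a minimal sketch, assuming the metadata has already been parsed into Python objects; the function name check_licenses is hypothetical and not part of any Frictionless or OEP library.

```python
def check_licenses(licenses):
    """Return a list of problems; an empty list means the value conforms."""
    # "licenses MUST be an array."
    if not isinstance(licenses, list):
        return ["licenses MUST be an array"]
    problems = []
    for i, item in enumerate(licenses):
        # "Each MUST be an object."
        if not isinstance(item, dict):
            problems.append(f"licenses[{i}] MUST be an object")
            continue
        # "The object MUST contain a name property and/or a path property."
        if "name" not in item and "path" not in item:
            problems.append(f"licenses[{i}] MUST contain name and/or path")
        # "title" is optional, so its absence is not reported.
    return problems


print(check_licenses([{"name": "CC0-1.0"}]))
print(check_licenses([{"title": "Creative Commons Zero v1.0 Universal"}]))
```

An entry with only a title is reported, which is the case the combined "title (name)" strings in sources fall into once they are moved into this structure unchanged.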

Ludee commented 5 years ago

The "title" is a very useful field for humans. It is not needed by the data package standard but still important.

MGlauer commented 5 years ago

Yes, title and name should both be there, but as individual fields.

Source licenses use a mixture of both:

"license": "Creative Commons Zero v1.0 Universal (CC0-1.0)"

This is not good for machine readability or for standards conformance. My proposal was to use an atomic identifier, i.e. just "CC0-1.0". A verbose title can still be optional.
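
As an illustration of the proposal, a combined string can be split into the two individual fields. This is only a sketch: the pattern assumes the identifier is the last bracketed token and contains no spaces, and the helper name split_license is hypothetical.

```python
import re

# Assumed form: "<title> (<name>)", with the identifier in the final brackets.
COMBINED = re.compile(r"^(?P<title>.+?)\s*\((?P<name>[^()\s]+)\)$")


def split_license(value):
    """Split 'Title (ID)' into separate title and name fields."""
    text = value.strip()
    match = COMBINED.match(text)
    if match is None:
        # Not in the combined form: keep it as a title and do not guess an ID.
        return {"title": text}
    return {"name": match.group("name"), "title": match.group("title")}


print(split_license("Creative Commons Zero v1.0 Universal (CC0-1.0)"))
```

Strings that do not match are returned as a title only, so free-text license descriptions are kept rather than forced into an identifier.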

On 4 July 2019 16:23:39 GMT+02:00, "Ludwig Hülk" notifications@github.com wrote:

The "title" is a very useful field for humans. It is not needed by the data package standard but still important.


christian-rli commented 5 years ago

Source licenses are indeed not machine readable. While complete machine readability would have been nice, we expect that a plethora of licenses might show up within sources that cannot all be covered by SPDX or similar lists. So we thought it best to treat source licenses as comments. Do you think the field should be named differently to signal that?

Ludee commented 5 years ago

I suggest renaming the field to "licenseId" and not using the full name in sources. This comes from the SPDX file. Then we need to adapt the description in the wiki:

License of the source. Complete name and license id in brackets. Standard: SPDX License List

Then we would have two license IDs in the metadata string: sources/licenseID and licenses/name. Does this make sense?

christian-rli commented 5 years ago

I'm not sure I fully understand your suggestion, @Ludee. Are you suggesting we use the following:

"sources": [
    {"licenseID": "CDDL-1.0"}],

and

"licenses": [
    {"name": "CDDL-1.0"}],

?

I don't think this would make sense, because there would be two different keys expecting the same kind of input. It also wouldn't address the problem of sources with unknown licenses, and it would get rid of our human-readable license description in licenses.

To me it would make more sense to keep

"sources": [
    {"license": "comment style license description"}],

or to rename "license" to something like "license_comment" to signal that no machine-readable string is required here. Also, I think it would make sense to repurpose "name" in licenses to show a descriptive name (like "title" did before) and use licenseID as in the example you linked:

"licenses": [
    {"licenseID": "CDDL-1.0",
     "name": "Common Development and Distribution License 1.0"}],

What do you think?

Ludee commented 5 years ago

Exactly, this is what I was trying to say. But:

"licenses": The object MUST contain a name property and/or a path property. It MAY contain a title property.

An additional thought is to remove the license information and add a link to the "metadata of the source". That would prevent duplicate information. But the original idea was to include all licenses (and copyright attribution) in the metadata. If the metadata strings are connected (like an FK), we won't need that any more. "sourceId" -> "id"

"sources": [ {"title": "", "description": "", "path": "", "sourceId": ""}],

christian-rli commented 5 years ago

I don't see the problem, to be honest. There is a name property in licenses in the suggestion. Is it defined to mean something like licenseID? Then I'd see the conflict.

I'm not completely opposed to removing and linking license information, but I'm not a fan either. I think it might be metaOverkill, and just sticking to a simple JSON string is likely the simpler/cleaner option.

christian-rli commented 5 years ago

Addressed this issue in #58. There is now the same list of license fields within sources as there is in "licenses" itself.
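
For readers skimming the thread, the resulting shape might look roughly like the sketch below. The exact field set introduced in #58 is not stated here, so name, title, and path are an assumption based on the Data Package rule quoted earlier, and all values are illustrative.

```python
import json

# Assumed field set (name, title, path); not taken from #58 itself.
license_entry = {
    "name": "CC0-1.0",
    "title": "Creative Commons Zero v1.0 Universal",
    "path": "https://creativecommons.org/publicdomain/zero/1.0/",
}

metadata = {
    # Each source carries a list of license objects with the same fields ...
    "sources": [{"title": "Example source", "licenses": [dict(license_entry)]}],
    # ... as the top-level "licenses" list.
    "licenses": [dict(license_entry)],
}

print(json.dumps(metadata, indent=4))
```

With this layout, the same check or parser can be applied to both places, since source licenses and dataset licenses share one structure.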