SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

Licence Controlled Vocabulary to be recommended for use with `dct:license` #209

Closed H-a-g-L closed 4 weeks ago

H-a-g-L commented 2 years ago

Issue #34 mentions EU Publications Office's Licence Authority Table as a possible recommended vocabulary to be used for dct:license of dcat:Distribution. A quick look at values for this property (range dct:LicenseDocument) on data.europa.eu shows little homogeneity. To increase interoperability, we propose to:

For reference see also https://github.com/w3c/dxwg/issues/114 and DCAT License and Rights note to increase interoperability.

bertvannuffelen commented 2 years ago

@ODP-hil I think your analysis is of interest to the community. I did a quick count and found around 500K different licence instances.

According to your analysis query, many of the licences have no URI or a dataset-specific URI. For those, I suggest that you contact the harvested portal(s) and ask them to use properly defined, even locally defined, dereferenceable URIs.

bertvannuffelen commented 2 years ago

Some personal reflections on your request on the use of "a recommended codelist" to stimulate the discussion by the community.

With respect to further harmonisation, we run into a dependency on local legislation here. In Flanders (Belgium), there is legislation about which license documents are to be used. These documents are published and maintained by the responsible government. It can be expected that other MS are in the same situation. Of course, many of these correspond in some way to existing data licenses such as Creative Commons, but they are worded in the legislative context of the MS.

The challenge with licences is that a licence is not like a theme or a category expressed in metadata to improve discovery: its value must be meaningful in a legislative context. As the property range states, it is a document describing the rights and obligations when (re)using the data (service). Since each MS has its own legislative framework, one cannot enforce the use of an EU code list, as it won't cover every MS case. Moreover, if one includes commercial data suppliers (e.g. in the context of the data spaces), then the variety of licenses might increase even further.

So instead of driving towards a recommended codelist, my recommendation would be to define what a quality description of a licence document is, and to provide a recommended approach for using the most common ones, for instance Creative Commons. For the latter, an EU codelist could play a role, as could agreements such as using the language-neutral URIs instead of the language-aware versions.

As such, 500K different licence documents are not problematic for me as long as they are well defined (so no plain strings or contextless UUIDs) and refer to a document that can be consulted, preferably identified with a persistent URI. That is, for me, the base quality level. One quality level on top of that is that these license documents are described with metadata so that the individual differences (such as the name that must be used in acknowledgements) are abstracted away. A third quality level would be to have the licence document described as machine-readable ODRL expressions.
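To make the base quality level a bit more concrete, here is a minimal sketch of how a portal could sort harvested `dct:license` values into rough buckets. The function name and the bucket labels are hypothetical, and the check is purely syntactic: it does not test whether a URI is persistent or actually dereferences to a document.

```python
import re
from urllib.parse import urlparse

# Matches a bare UUID such as 123e4567-e89b-12d3-a456-426614174000.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def classify_licence_value(value: str) -> str:
    """Return a rough, hypothetical label for a harvested dct:license value."""
    v = value.strip()
    if not v:
        return "missing"
    # A UUID on its own carries no context about the licence it stands for.
    if UUID_RE.match(v) or v.lower().startswith("urn:uuid:"):
        return "contextless-uuid"
    try:
        parsed = urlparse(v)
    except ValueError:
        # urlparse rejects some malformed inputs; treat them as free text.
        return "plain-string"
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "http-uri"
    return "plain-string"


if __name__ == "__main__":
    for sample in (
        "https://creativecommons.org/licenses/by/4.0/",
        "CC BY 4.0",
        "123e4567-e89b-12d3-a456-426614174000",
    ):
        print(sample, "->", classify_licence_value(sample))
```

Only values in the `http-uri` bucket are candidates for the base quality level; whether they really resolve to a consultable licence document would still need a separate check.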

I think the first quality level is really achievable, but it cannot be done from the DCAT-AP specification. It is the Open Data portals that must stimulate each other to do this. So the DEU could in this case agree a roadmap with the harvested MS ODPs to resolve the issue in a reasonable time, and even consider that, after a certain grace period, data that refers to bad-quality legal information (e.g. to a string) is no longer included in the DEU catalogue. This may sound harsh, but in the end, bad-quality licence information renders the datasets useless, as it refers to non-retrievable legal information.

The second quality level is probably something DCAT-AP could contribute to.

The third is again beyond DCAT-AP: it is for the legal experts to express each document as machine-readable expressions. I dream of this, but so far I have not seen any MS put systematic effort into it.

As you see, these reflections make me a bit hesitant to blindly adopt the Licence Authority Table, as it does not address all aspects of your analysis. I even expect that if we added it as recommended, the situation would not have changed in a year's time, because this is about (MS) legislation alignment and not metadata alignment.

H-a-g-L commented 2 years ago

Thanks @bertvannuffelen for the insightful considerations. Indeed the issue is very complex with many national variations (see also https://github.com/SEMICeu/DCAT-AP/issues/97 where dct:license is not readily applicable).

> 500K different licence documents for me are not problematic as long they are well defined (so no plain strings, or contextless uuids) and refer to a document that can be consulted, preferably identified with a persistent URI.

Agree. See also FAIR principles stating that “the conditions under which the data can be used should be clear to machines and humans.”

> One quality level on top of that is that these license documents are metadata described so that the individual differences (such as the name that must be used in acknowledgements) are abstracted away

Here the OP authority list (which includes most EU data licences) could be of help because it enriches the licence with many elements such as responsible Agent, skos:exactMatch, cc:requires etc. For example:

<skos:Concept rdf:about="http://publications.europa.eu/resource/authority/licence/CC_BYSA_3_0_NL" at:deprecated="false">
    <rdf:type rdf:resource="http://publications.europa.eu/ontology/euvoc#Licence"/>
    <dc:identifier>CC_BYSA_3_0_NL</dc:identifier>
    <skos:definition xml:lang="en">Published by Creative Commons for Netherlands, Naamsvermelding-GelijkDelen 3.0 
Nederland (CC BY-SA 3.0 NL) is a licence permitting any commercial and non-commercial use, as long as credit is given to 
the author for the original creation and new creations are licenced under the identical terms. All new works based on the 
original work will carry the same licence, so any derivatives will also allow commercial use.</skos:definition>
    <skos:exactMatch rdf:resource="http://creativecommons.org/licenses/by-sa/3.0/nl/legalcode/"/>
    <lemon:context rdf:resource="http://publications.europa.eu/resource/authority/use-context/DCAT_AP"/>
    <euvoc:licenceVersion>3.0</euvoc:licenceVersion>
    <cc:requires rdf:resource="http://creativecommons.org/ns#Attribution"/>
    <cc:requires rdf:resource="http://creativecommons.org/ns#ShareAlike"/>
    <euvoc:appliesTo rdf:resource="http://publications.europa.eu/resource/authority/licence-domain/DATA"/>
    <euvoc:appliesTo rdf:resource="http://publications.europa.eu/resource/authority/licence-domain/W_LIT_ART"/>
    <eli-o:responsibility_of_agent rdf:resource="http://publications.europa.eu/resource/authority/corporate-body/CC"/>
    <dct:references rdf:resource="https://creativecommons.org/licenses/by-sa/3.0/nl/deed.en"/>
    <dct:references rdf:resource="https://www.europeandataportal.eu/en/content/show-license?license_id=CC-BY-SA3.0NL"/>
    <foaf:homepage rdf:resource="http://creativecommons.org/licenses/by-sa/3.0/nl/legalcode"/>
</skos:Concept>
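As a rough illustration of how a consumer could use this enrichment, the sketch below reads a trimmed copy of the entry above and pulls out the `skos:exactMatch` targets and the `cc:requires` obligations. The wrapping `rdf:RDF` element with its namespace declarations and the function name `summarise_licences` are my own additions for the example. It uses plain XML parsing from the Python standard library, which is enough for this flat serialisation but is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "cc": "http://creativecommons.org/ns#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Trimmed copy of the authority-table entry, wrapped in rdf:RDF (an addition
# for this example) so that the snippet parses on its own.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:cc="http://creativecommons.org/ns#">
  <skos:Concept rdf:about="http://publications.europa.eu/resource/authority/licence/CC_BYSA_3_0_NL">
    <skos:exactMatch rdf:resource="http://creativecommons.org/licenses/by-sa/3.0/nl/legalcode/"/>
    <cc:requires rdf:resource="http://creativecommons.org/ns#Attribution"/>
    <cc:requires rdf:resource="http://creativecommons.org/ns#ShareAlike"/>
  </skos:Concept>
</rdf:RDF>"""


def summarise_licences(rdf_xml: str) -> dict:
    """Map each licence concept URI to its exact matches and obligations."""
    root = ET.fromstring(rdf_xml)
    summary = {}
    for concept in root.findall("skos:Concept", NS):
        uri = concept.get(RDF_ABOUT)
        summary[uri] = {
            "exact_match": [
                e.get(RDF_RESOURCE) for e in concept.findall("skos:exactMatch", NS)
            ],
            # Keep only the local name after '#', e.g. "Attribution".
            "requires": [
                e.get(RDF_RESOURCE).rsplit("#", 1)[-1]
                for e in concept.findall("cc:requires", NS)
            ],
        }
    return summary


if __name__ == "__main__":
    for uri, info in summarise_licences(SAMPLE).items():
        print(uri, info)
```

With such a lookup, a catalogue that only knows the Creative Commons URI could be linked back to the authority-table concept through `skos:exactMatch`, and the obligations could be shown without parsing the licence text.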

Rephrasing my original question, and considering that dct:license should refer to an actual licensing document (as opposed to a legal statement concerning usage rights), would the DCAT-AP community find it useful to point to the Controlled Vocabulary, or rather to the specific (dereferenceable) URI of the licence itself?

matthiaspalmer commented 1 year ago

I think many have struggled with finding the most correct URI for a public license.

This is the case even for the better-known licenses. For instance, should the reference to the CC_BY_4.0 license use http or https, end with a slash or not, include the deed.sv extension or not (clearly not, but it is easy to get it wrong nevertheless)?
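To illustrate how many spellings collapse onto one licence, here is a small sketch that normalises common Creative Commons URI variants. The target form (https, no `www`, trailing slash, no `deed` or `legalcode` page) is an assumption made for this example and not an agreed convention, and the function name `normalise_cc_uri` is hypothetical.

```python
import re
from urllib.parse import urlsplit

CC_HOSTS = {"creativecommons.org", "www.creativecommons.org"}

# Trailing page names that identify a rendering of the licence (human-readable
# deed or legal code, optionally with a language suffix such as ".sv").
SUFFIX_RE = re.compile(r"/(deed|legalcode)(\.[A-Za-z_-]+)?/?$")


def normalise_cc_uri(uri: str) -> str:
    """Return one assumed canonical spelling for a Creative Commons licence URI."""
    parts = urlsplit(uri.strip())
    host = parts.netloc.lower()
    if host not in CC_HOSTS or not parts.path.startswith("/licenses/"):
        return uri  # not a CC licence URI: leave it untouched
    path = SUFFIX_RE.sub("", parts.path).rstrip("/") + "/"
    return "https://creativecommons.org" + path


if __name__ == "__main__":
    for variant in (
        "http://creativecommons.org/licenses/by/4.0",
        "https://creativecommons.org/licenses/by/4.0/deed.sv",
        "https://www.creativecommons.org/licenses/by/4.0/legalcode",
    ):
        print(variant, "->", normalise_cc_uri(variant))
```

A published list of licence URIs would make this kind of guessing unnecessary, since providers could simply copy the listed URI.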

Hence, providing a list of license URIs is certainly helpful for both providers of data catalogs and maintainers of data catalogs.

The Publications Office has typed the licenses as both skos:Concept and http://publications.europa.eu/ontology/euvoc#Licence. I would need to do some digging, but I assume that class is a subclass of dct:LicenseDocument.

So, I say go for it, as long as it is allowed to point to other licenses that are not covered by the listing.

jakubklimek commented 1 year ago

I can speak for Czechia and Slovakia. There is an issue that comes from our copyright legislation differing from that of western countries such as the US and the UK. It is highly impractical for us to describe the license using a single IRI from a codelist. In short, our "license" specification consists of 4 IRIs and 2 literals, and looks like this:

<https://data.gov.cz/lkod/mdcr/datové-sady/vld/distribuce/csv> a dcat:Distribution ;
    dct:license [ a pu:Specifikace, dct:LicenseDocument ;
                    pu:autorské-dílo <https://creativecommons.org/licenses/by/4.0/> ;
                    pu:autor "Ministerstvo dopravy, Odbor veřejné dopravy"@cs ;
                    pu:databáze-chráněná-zvláštními-právy <https://data.gov.cz/podmínky-užití/není-chráněna-zvláštním-právem-pořizovatele-databáze/> ;
                    pu:databáze-jako-autorské-dílo <https://creativecommons.org/licenses/by/4.0/> ;
                    pu:autor-databáze "Ministerstvo dopravy, Odbor veřejné dopravy"@cs ;
                    pu:osobní-údaje <https://data.gov.cz/podmínky-užití/neobsahuje-osobní-údaje/> ] ;

i.e. there are 4 legal categories; in each we need to indicate an IRI, and each also allows a custom IRI if none of the pre-defined ones is suitable. There are many combinations, and for two of the categories (copyrighted dataset, copyrighted database) there is an option to state whose name to use in re-use acknowledgements.
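For readers who do not read Czech, the shape of this specification can be sketched as a small data structure. The English field names are my own loose, hypothetical translations of the `pu:` properties above and are not part of any vocabulary; the values are copied from the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TermsOfUse:
    """Four IRIs (one per legal category) plus two optional name literals."""

    work_copyright: str          # pu:autorské-dílo
    database_copyright: str      # pu:databáze-jako-autorské-dílo
    sui_generis_db_rights: str   # pu:databáze-chráněná-zvláštními-právy
    personal_data: str           # pu:osobní-údaje
    work_author: Optional[str] = None      # pu:autor
    database_author: Optional[str] = None  # pu:autor-databáze

    def iris(self) -> List[str]:
        """Return the four category IRIs in a fixed order."""
        return [
            self.work_copyright,
            self.database_copyright,
            self.sui_generis_db_rights,
            self.personal_data,
        ]


example = TermsOfUse(
    work_copyright="https://creativecommons.org/licenses/by/4.0/",
    database_copyright="https://creativecommons.org/licenses/by/4.0/",
    sui_generis_db_rights="https://data.gov.cz/podmínky-užití/není-chráněna-zvláštním-právem-pořizovatele-databáze/",
    personal_data="https://data.gov.cz/podmínky-užití/neobsahuje-osobní-údaje/",
    work_author="Ministerstvo dopravy, Odbor veřejné dopravy",
    database_author="Ministerstvo dopravy, Odbor veřejné dopravy",
)

if __name__ == "__main__":
    print(example.iris())
```

The point of the sketch is only that no single codelist IRI can stand in for this combination of four independent choices.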

For anyone interested, there is a full google translated guide.

init-dcat-ap-de commented 12 months ago

I was not able to find out what the solution in 3.0 will be.

There is no controlled vocabulary for licences listed. (DCAT-AP.de has its own, so we are okay, even though this is something where a single harmonized list would be nice.)

bertvannuffelen commented 10 months ago

@init-dcat-ap-de the solution provided is an updated section on legal information. Compared to release 2.x, this section has been updated with the most recent legal activities (including HVD, which imposes additional rules), and it also includes the approach discussed with the Czech Republic, where the notion of a licence is stricter than in other EU MS.

That section recommends using the EU Vocabularies NAL when possible. But since this property is tightly connected with the legal context in which the dataset is published, strict enforcement is not possible. It would first require EU legal harmonisation.