SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

HVD C3. License information #253

Closed bertvannuffelen closed 9 months ago

bertvannuffelen commented 1 year ago

The HVD regulation requires good-quality licence information, and in particular the use of a permissive open licence such as CC-BY 4.0.

proposal

For HVD the licence information shall be given by the property dct:license with a URI value (persistent link). The URI should be dereferenceable and thus provide both a machine-readable RDF representation and a human-readable text.
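As a rough illustration of the proposal, a distribution could point to a licence resource by URI as sketched below. The `https://example.org/...` IRIs are hypothetical placeholders, and the `dct:LicenseDocument` typing and `dct:title` label are assumptions about what the dereferenced RDF representation might contain, not requirements stated in this thread.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical distribution; the IRI is a placeholder.
<https://example.org/dataset/1/distribution/csv> a dcat:Distribution ;
    dct:license <https://example.org/licence/open-terms> .

# What dereferencing the licence URI might return as RDF (assumed shape).
<https://example.org/licence/open-terms> a dct:LicenseDocument ;
    dct:title "Example open terms"@en .
```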

To indicate relationship with CC-BY 4.0

jakubklimek commented 1 year ago

tl;dr: Here, I must (traditionally) object to restricting the values of a license to a controlled vocabulary, unless it somehow addresses the case in Czechia (Slovakia is similar), where it is simply not enough to link to one license. There are multiple aspects to the legal side of distributing an open dataset that need to be addressed separately, and there is a difference between a distribution that is freely available and one that is actually protected by copyright but licensed using CC-BY 4.0 or similar. Our terms of use specification in RDF is a structured thing that looks like this:

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix tos:     <https://data.gov.cz/slovník/podmínky-užití/> .

@prefix distro: <https://data.gov.cz/zdroj/datové-sady/00006947/2de2b4d44488c64f972f6fb0b8805122/distribuce/3429fbfa4dc95ab280c26dfd25d7cf58> .
@prefix terms: <https://data.gov.cz/zdroj/datové-sady/00006947/2de2b4d44488c64f972f6fb0b8805122/distribuce/3429fbfa4dc95ab280c26dfd25d7cf58/podmínky-užití> .

distro: a dcat:Distribution ;
    dcterms:license terms: .

terms: a tos:Specifikace ;
    tos:autorské-dílo <https://data.gov.cz/podmínky-užití/neobsahuje-autorská-díla/> ;
    tos:databáze-chráněná-zvláštními-právy <https://data.gov.cz/podmínky-užití/není-chráněna-zvláštním-právem-pořizovatele-databáze/> ;
    tos:databáze-jako-autorské-dílo <https://data.gov.cz/podmínky-užití/není-autorskoprávně-chráněnou-databází/> ;
    tos:osobní-údaje <https://data.gov.cz/podmínky-užití/obsahuje-osobní-údaje/> .

or

distro: a dcat:Distribution ;
    dcterms:license terms: .

terms: a tos:Specifikace ;
    tos:autor "Český úřad zeměměřický a katastrální"@cs , "Czech Office for Surveying, Mapping and Cadastre"@en ;
    tos:autor-databáze "Český úřad zeměměřický a katastrální"@cs , "Czech Office for Surveying, Mapping and Cadastre"@en ;
    tos:autorské-dílo <https://creativecommons.org/licenses/by/4.0/> ;
    tos:databáze-chráněná-zvláštními-právy <https://www.cuzk.cz/Predpisy/Podminky-poskytovani-prostor-dat-a-sitovych-sluzeb/Podminky-poskytovani-prostorovych-dat-CUZK.aspx> ;
    tos:databáze-jako-autorské-dílo <https://creativecommons.org/licenses/by/4.0/> ;
    tos:osobní-údaje <https://data.gov.cz/podmínky-užití/neobsahuje-osobní-údaje/> .

It basically addresses 4 categories of "terms of use", each one separately, and the combination can be unique for each distribution. A full explanation is available in the soon-to-be-published deliverable of the STIRData project: STIRData-Legal.pdf.

And, legally speaking, when using data from Czechia, no matter where you are from, you need to understand the terms of use in order to use the data correctly. This is, unfortunately, governed by national legislation, and cannot always be "technically simplified" into one CC link.

Sorry for the longer text, but I need to include the description of our situation written by our open data specialist lawyer, Jakub Míšek, and his colleagues.

Instructions for licensing for seamless provision and re-use of open data

Open data publication can lead to possible violations of copyright and database rights that may protect the content. Another possible obstacle is a situation where personal data are part of the dataset. Before publishing, it is necessary to deal with these obstacles by providing licences where this is necessary and possible. If the provided open data, or any of its parts, are not protected in any way, it is appropriate to state this in order to increase the legal certainty of future data users.

In Czechia, we strongly differentiate between "terms of use" and a "licence". "Terms of use" refer to general information that the dataset can be reused, or that some legal obstacles are present, which is given to the data recipient in the form of a metadata record (usually via a hyperlink). For example, terms of use are a good way for the data provider to inform data recipients about the presence of personal data in the dataset. Terms of use are also generally not considered a contract. Therefore, they serve only an informative function and are not legally binding.

By contrast, a "licence" generally refers to a specific type of contract which is used in the field of intellectual property rights to allow further use of a copyrighted work or other protected content. It is not possible to "license" content which is not protected by any intellectual property rights. Therefore, we cannot use a licence of any sort (e.g. Creative Commons licences) in a situation where the provided open data are not protected by copyright or sui generis database rights.

However, in situations where the provided content is protected by copyright or sui generis rights, the data provider must license such rights and should do so in the most open way possible. In such cases the licence is a legally binding part of the "terms of use" of the dataset. As a side note, it should be stressed that licensing might be limited, e.g. by the fact that the provider is not entitled to grant a specific licence in a specific range.

The following diagram shows what should be considered when a dataset is being published. [Diagram: licencování_OD-EN]

bertvannuffelen commented 1 year ago

@jakubklimek I think we should separate some concerns here.

The proposal consists of several steps:

  1. use current DCAT-AP rules and guidelines on expressing legal information
  2. licence information should be shared with a dereferenceable URI
  3. licence information should be related to a commonly known licence such as CC-BY 4.0

An assessment:

  1. The Czech case is acceptable according to DCAT-AP
  2. The HVD regulation uses the words "licence" and "terms of use".
    2.a. The proposal is that all statements in the HVD regulation related to "licence" and "terms of use" are captured by the guidelines on expressing legal information.
    2.b. Whether the word "licence" in the regulation is a licence according to Czech legislation is a topic for discussion between CNECT and CZ. If the outcome is that, for dataset X defined in Annex a.c.d, the licence is a right in CZ, then this can be fine. But may I then assume that a dereferenceable URI for the right is created?
  3. The goal is to assist the assessment of whether or not a CZ legal statement grants the same permissions as expressed with CC-BY 4.0. Thus the goal is not to restrict the values to only these licences, but to be able to compare licences. As such a mapping requires local expertise, it is the local expert who has to provide it. And thus a mapping to a harmonised list like the one of PO is the easiest.

About separation of concerns:

For me it is really fine to provide a dct:rights statement which would be linked to a permissive clause that also occurs in CC-BY 4.0.

jakubklimek commented 1 year ago

@bertvannuffelen I agree with your assessment. I actually misread the proposal and thought that it proposed limiting the values of dcterms:license to a codelist. That is not the case: it restricts the values of the rdfs:seeAlso triples relating the license to well-known licenses.

Still, in the Czech case, a generic rdfs:seeAlso/owl:sameAs relationship is not specific enough, as we use different properties covering different aspects of Czech copyright law. So a simple seeAlso relation to, e.g., CC-BY 4.0 is insufficient and confusing, as it is unclear which copyright category it should relate to. We have 3 categories plus the information about the dataset containing personal information; each needs to be addressed separately, and each can be addressed, e.g., by using a different CC variant.
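The ambiguity can be sketched as follows, reusing the `tos:` properties from the second terms-of-use example earlier in this thread. The `https://example.org/...` IRIs are hypothetical placeholders. With a single generic link, a consumer cannot tell which of the per-aspect statements it is meant to summarise.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tos:  <https://data.gov.cz/slovník/podmínky-užití/> .

# Hypothetical terms-of-use resource; the IRI is a placeholder.
<https://example.org/terms> a tos:Specifikace ;
    # Each legal aspect is addressed by its own property ...
    tos:autorské-dílo <https://creativecommons.org/licenses/by/4.0/> ;
    tos:databáze-jako-autorské-dílo <https://creativecommons.org/licenses/by/4.0/> ;
    tos:databáze-chráněná-zvláštními-právy <https://example.org/custom-database-terms> ;
    tos:osobní-údaje <https://data.gov.cz/podmínky-užití/neobsahuje-osobní-údaje/> ;
    # ... whereas this single generic link does not say which aspect it refers to.
    rdfs:seeAlso <https://creativecommons.org/licenses/by/4.0/> .
```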

Practically, it would mean that we would not be able to provide such mapping for any of the Czech datasets, except those that are completely out of scope of the copyright law. Those could be viewed as similar to CC0 from the point of view of effects on the data consumer, even though not semantically equivalent.

And I am a bit worried that, e.g., data.europa.eu could build a quality metric based on whether a dataset license is explicitly related to a CC license, even though for some Czech datasets the simple mapping cannot be done for legal reasons.

bertvannuffelen commented 1 year ago

@jakubklimek, here we bump into the limits of DCAT-AP and legislative compliance.

With the proposal we try to get as far as possible with tools provided in DCAT-AP in supporting the legislative information requirements of the HVD.

But that does not mean that there are no alternative ways to handle that case.
I suggest you align with CNECT and the Czech parties responsible for HVD on how this information is in line with the HVD.

The HVD regulation uses the following formulation in the Annex: "under the conditions of the Creative Commons BY 4.0 licence or any equivalent or less restrictive open licence;" This statement does not exclude the Czech case, but the Czech case is harder to validate, as it would require assessing each individual right.

About your concern that data.europa.eu could install a quality metric which might give the wrong impression (of non-compliance): in my understanding so far, DCAT-AP HVD will not be able to provide a statement of compliance. It provides only a commonly agreed way of reading DCAT-AP in the context of HVD, so that end-users of data.europa.eu can read the metadata provided by, for example, Italy and Poland in the context of the HVD in the same way. Some aspects, like "a less restrictive open licence", are beyond the capabilities of DCAT-AP. So far I have not heard of any quality metric to be installed.

jakubklimek commented 1 year ago

@bertvannuffelen I see in the DCAT-AP HVD 2.2.0 diagram and the description that owl:sameAs on the dct:LicenseDocument is mandatory, if the license document is not from the NAL.

Given the discussion above, I have 2 problems.

  1. owl:sameAs seems too strong for this occasion - I would rather see something like a type property
    1. if the license is not from the NAL, this would basically say that it is the same license as in the NAL, only with a different IRI, in the sense of data entity equivalence, not just "similar or less restrictive" in the legal sense.
    2. Based on the above discussion and the very real differences in the MS legal systems, I would not say that a Czech license is the same as the NAL, which then would be the same as, e.g., a Dutch license. This seems like a technical shortcut hiding the underlying legal differences and difficulties, which are still there and cannot be solved by the technical shortcut
  2. The usage note says that it is in fact 1..1 multiplicity only in some cases. That is, in my opinion, another reason for making it only recommended, i.e. 0..1, not optional, to avoid confusion.
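To make the first concern concrete, a minimal sketch of the pattern in question follows. The national licence IRI is a hypothetical placeholder, and the NAL IRI for CC BY 4.0 is written from memory and should be checked against the Publications Office licence authority table. Under OWL semantics, `owl:sameAs` states that both IRIs denote one and the same resource, which is a stronger claim than "equivalent or less restrictive" in the legal sense.

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Hypothetical national licence document; the IRI is a placeholder.
<https://example.org/licence/national-open-terms> a dct:LicenseDocument ;
    # Asserts identity with the NAL entry (assumed IRI), not mere similarity.
    owl:sameAs <http://publications.europa.eu/resource/authority/licence/CC_BY_4_0> .
```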

bertvannuffelen commented 12 months ago

In the Candidate Release of DCAT-AP HVD, the encoding of the support for legal experts to assess the permissiveness compared to CC-BY 4.0 has been made more open-ended. The above remarks on owl:sameAs and additional descriptive information are present as a recommendation in the section on legal information.