SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

dct:rights should be mandatory or at least recommended for Distribution and Data Service #97

Closed jakubklimek closed 2 years ago

jakubklimek commented 4 years ago

Extending #96 , I propose that dct:rights is mandatory, or at least recommended for Distribution and Data Service.

Rationale: It is often the case (at least in Czechia) that the data is not of a kind that can be licensed. Instead, the data is freely accessible (this is different from data that needs to be licensed and is licensed by CC-BY). This is then specified in dct:rights. dct:license should then only be used for data, which is licensable, to indicate the license used. dct:license should therefore be optional.

In fact, in Czechia, the object of dct:rights is a more complex structure, addressing 4 kinds of possible rights to a distribution or data service.

akuckartz commented 4 years ago

No, a license needs to be provided always.

jakubklimek commented 4 years ago

> No, a license needs to be provided always.

@akuckartz That is exactly the problem - this is simply not true universally. It is dependent on local legislation, and therefore should not be enforced by technical specifications. In Czechia, the situation really is (based on an analysis made by lawyers specialising in intellectual property rights) as I described it above. dct:license is not applicable to datasets which, by definition, cannot be licensed (e.g. those created by public administration).

akuckartz commented 4 years ago

Czechia is a member state of the EU and the PSI Directive applies. Maybe a simple text can be used such as: "This data is in the Public Domain according to Czech law XY."

jakubklimek commented 4 years ago

@akuckartz Sure, but that is not a license. That is a rights statement like this one (and we need at least 2 more to cover all types of applicable law). A license would apply only if the work was not public domain, contained copyrighted works and we licensed them (and again, we need 3 licenses, one for each type of applicable law) like CC-BY or CC0.

This issue in full detail can be seen in this document sent earlier this year to EDP.

An example of what is necessary to fully specify the terms of use is this specification, which we currently attach to dcat:Distributions using our own predicate, but we could easily use dcterms:rights for this. However, we cannot use dcterms:license for this.

andrea-perego commented 4 years ago

I think we should take into account the risks of making dct:license optional, not only with respect to interoperability.

A possible option to deal with the scenario you describe, @jakubklimek , is to make it clear that a licence MUST be specified if there is one, and if the data / service can be licensed. Only if this does not apply, dct:rights should instead be used (not preventing however people from using dct:rights together with dct:license in the previous case).
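
A minimal Turtle sketch of this option, assuming the standard `dcat:`, `dct:` and `rdfs:` namespaces. The `ex:` resources, the access URLs and the rights-statement wording are hypothetical placeholders for illustration and are not part of DCAT-AP:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Case 1: the data is protected and has been licensed, so dct:license is given.
ex:distribution-a a dcat:Distribution ;
    dcat:accessURL <http://example.org/data/a.csv> ;
    dct:license <https://creativecommons.org/licenses/by/4.0/> .

# Case 2: there is nothing to license, so a rights statement is given instead.
ex:distribution-b a dcat:Distribution ;
    dcat:accessURL <http://example.org/data/b.csv> ;
    dct:rights ex:nothing-to-license .

# Hypothetical rights statement; the label text is illustrative only.
ex:nothing-to-license a dct:RightsStatement ;
    rdfs:label "Contains no content protected by copyright or database rights."@en .
```

Nothing in this sketch prevents a publisher from adding `dct:rights` to the first distribution as well.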

That said, I would be very much interested in knowing some more details on the reasons why given data cannot be licensed according to Czech legislation. From what you say, @jakubklimek , and also reading the document you are pointing to, this is not completely clear to me.

bertvannuffelen commented 4 years ago

I am also interested in the reasoning of the lawyers, because to my knowledge most countries actually state the opposite: datasets have licenses. The advice is often given that if you encounter a dataset without a license, you should either refrain from using it, or inquire for a license. So this is interesting ...

jakubklimek commented 4 years ago

OK, I think this is the right time to CC our lawyer, @jkb-misek, since the explanation in this document is obviously not convincing.

akuckartz commented 4 years ago

@jakubklimek That document contains this sentence: "However, in situations, when the provided content is protected by copyright or sui generis rights, data provider must license such rights and he should do it in the most open way possible."

jakubklimek commented 4 years ago

@bertvannuffelen @akuckartz The point is that the most frequent case in Czechia is that the dataset (distribution) is created by public administration and based on that, it does not contain anything protected by copyright, database rights or sui generis rights. Therefore, there is nothing to license, and attaching a license to such a dataset does not make any sense from the legal point of view. In that case, we attach a special rights statement saying just that, for each of the rights.

You can only license data, when the data is protected by copyright or the other rights in the first place. Only then you need to license it so that it can be used safely. And, in Czechia, in that case, you need to license copyright (CC-BY), database rights (CC-BY) and sui generis rights (CC0) separately, which requires three separate links to licenses (or any combination of the above). That is, what the sentence quoted by @akuckartz says.

jakubklimek commented 4 years ago

Maybe this could be clarified if you take a look at the terms-of-use section of our dataset registration form for the Czech National Open Data Catalog.

bertvannuffelen commented 4 years ago

Although I have not yet completely figured out what it would mean, I understand the reasoning as follows.

The compilation of a dataset in a distribution by a public authority is not an act that implies any licenceable rights of that public authority on that distribution.

So what is a licenceable right? Because for me a licence is always a complete, human-readable, legal document with the dos and don'ts for that resource. It never occurred to me that some rights should not be part of a license.

At least in Belgium, to my understanding, no distinction has been made between licenceable and non-licenceable rights. As a consequence, under the better-safe-than-sorry legal reasoning, the absence of a licence document indicates that the rights are unknown, and therefore the user and publisher rights are at risk.

In the case of Czechia, the absence of a licence document (and of any other rights statements) hence means: go ahead, you are safe.

Am I heading in the right direction?

akuckartz commented 4 years ago

The problem might be that the formal definition of ’dct:license’ is not helpful for this situation and should be modified a bit.

andrea-perego commented 4 years ago

Thanks for further explaining the point, @jakubklimek .

A couple of questions:

> @bertvannuffelen @akuckartz The point is that the most frequent case in Czechia is that the dataset (distribution) is created by public administration and based on that, it does not contain anything protected by copyright, database rights or sui generis rights. Therefore, there is nothing to license, and attaching a license to such a dataset does not make any sense from the legal point of view. In that case, we attach a special rights statement saying just that, for each of the rights.

I wonder whether this would be practically equivalent to CC0 or not.

> You can only license data, when the data is protected by copyright or the other rights in the first place. Only then you need to license it so that it can be used safely. And, in Czechia, in that case, you need to license copyright (CC-BY), database rights (CC-BY) and sui generis rights (CC0) separately, which requires three separate links to licenses (or any combination of the above). That is, what the sentence quoted by @akuckartz says.

A usual approach in case different parts of the data have different use conditions is to apply the most restrictive licence (in the example, CC-BY + CC-BY + CC0 = CC-BY).

Is this principle applicable here?

jkb-misek commented 4 years ago

Hello everyone, I would like to clarify three things:

1) The word "licence" has a slightly different meaning in Czech and in English (even though they look and sound very similar). As is written in the document, "licence" in the Czech context represents a specific type of contract which allows using/copying/communicating to the public/whatever content which is protected by IP rights. In English, it can have a broader meaning which also covers a general "consent", and in this meaning a connection with IP rights is not necessary.

2) However, if the data set contains no content protected by any kind of IP rights (copyright/copyright protection of a database/sui generis database right), there is nothing to license (in the strict meaning of the word) and therefore you cannot use any licence, including Creative Commons (or any other similar one). That is for two reasons. Firstly, if there is nothing protected and you still tried to glue a CC licence to it, it would be against the rules of that licence. And secondly, it would be very confusing for the end-user, because if the content is not protected by any IP right, he can do whatever he wants with it, licence or no licence.

3) Data themselves are not protected by any IP. It is true that in Czechia we have quite a broad exception for administration purposes. But generally, the reason why we chose to go this way was the fact that data themselves are not protected by any IP rights. Therefore we use this rights statement ("there is no IP, you can do whatever with it" - but this is not a licence in the strict meaning). Furthermore, if there is no IP protection, you cannot just "forbid" the use of the data, because the data are not protected in any way (data is generally not a thing, so there is no property right to it). So even if you write "you cannot do that and that", it is not legally binding, unless it is a contract (and this part - what the conditions of a valid online contract are - can be quite different between the member states).

You have to have some other kind of protection - as was mentioned, there are three possibilities: i) copyrighted content, ii) copyright protection of a database and iii) sui generis database right. From a legal certainty point of view, it is not sufficient to "stick" one general licence to the dataset - what is licensed? What rights are even present? It can be quite confusing for the end-user. Good communication of what rights are present (if any) and what licences are used (if any), or whether it is just "free to use", is quite essential. I hope this clarifies the thing a bit. I am attaching my article on the issue, maybe it can be helpful. Jusletter-ITopen-data-open-api.pdf

jkb-misek commented 4 years ago

Thanks for the questions.

> I wonder whether this would be practically equivalent to CC0 or not.

Well, practically in the end yes. But there is a difference in that CC0 is a waiver of rights (an active action with a legal effect), whereas the "administrative exception" we have here means that there are no rights in the first place (there is nothing to waive). But from the practical point of view, yes, it is very similar. BTW, there can be a problem with CC0 in some countries (e.g. Czechia), because some rights (like copyright) cannot be waived.

> A usual approach in case different parts of the data have different use conditions is to apply the most restrictive licence (in the example, CC-BY + CC-BY + CC0 = CC-BY).
>
> Is this principle applicable here?

Practically, in the end, probably yes. But as I was mentioning in the previous post, it is not legally clean, because those rights protect different parts of the published dataset/information.

bertvannuffelen commented 4 years ago

@jkb-misek I've read your article and it is a bit clearer to me, although I think that there is a difference here between the legal reality and the effort reality.

(Disclaimer: I am not a lawyer, but I am trying here to understand the impact on the metadata. So my reasoning might have some legal gaps. And maybe I might go too wide.)

The described legal reality starts from the premise that there is content on which IP rights can hold, or there is not. This goes together with the general idea that the government embodies all citizens, so that we collectively own the rights to any content the government produces; moreover, we paid via taxes for the dataset's creation, and thus there is no argument for additional fees. As a result, there is little or no argument for explicitly claiming any IP rights, as that would confuse the legal reasoning.

That reasoning is fine, except that it does not answer the question: what can I ask of the government with respect to that dataset? APIs, dumps, update frequency? What SLA can we as citizens ask for? In the paper, the distribution form is explicitly disregarded as an argument for enforcing a license. So the above legal reasoning solely simplifies answering the question "may I reuse the data of that distribution". It does not answer the question of which aspects of the data service publication I had better add a license for, nor whether all distributions must fall under the same reuse conditions.

Within administrations, the effort reality has been and is, I believe, a heated debate. If we provide Open Access to the data, who is paying the bills for the virtual machines? In Flanders we have the case that, because of the success of some APIs, substantial amounts of the budget of agencies are burned. This has the weird consequence that, to keep the budget in balance, the following dilemma emerges: reduce the usage costs by reducing the accessibility (increase income or block access), or reduce investments in projects (i.e. people) and thus reduce data streams in the future. It is really perverse that a successful API could lead to less Open Data because there is no budget left to invest.

I understand that legally those effort aspects might not be connected to IP rights, but from a re-user's point of view they are. So what does the absence of a license in Czechia mean for the SLA? Can a citizen request the data via an API if there is only a dump? And must that then be given? Under the same conditions?

I have some practical cases also:

bertvannuffelen commented 4 years ago

> 1. the word "licence" has a slightly different meaning in Czech and in English (even though they look and sound very similar). As is written in the document "licence" in the Czech context presents a specific type of contract which allows using/copying/communicating to the public/whatever content which is protected by IP rights. In English, it can have a broader meaning which covers also a general "consent" and in this meaning, a connection with IP rights is not necessary.

Am I right that pricing, responsibility, complaints process statements, ... are not part of a license document in Czechia?

bertvannuffelen commented 4 years ago

Another remark has to do with the technical reading of the RDF content.

In RDF, the absence of a triple means "I do not know", which is different from "I know it does not apply".

E.g. in the EDP: https://www.europeandataportal.eu/data/datasets/3d0f6d67-13d0-4733-ab6b-0f2d686f318d?locale=en

the message is "No license provided".

In Czechia this should then be "Free reuse permitted".

Or am I wrong here?
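
To make the distinction concrete, here is a small Turtle sketch, assuming the standard `dcat:`, `dct:` and `rdfs:` namespaces. The `ex:` resources and access URLs are hypothetical and only serve as illustration. Under the open-world reading, the first distribution says nothing about its terms of use, while the second states them explicitly:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Neither dct:license nor dct:rights is present:
# a consumer can only conclude that the terms of use are unknown.
ex:distribution-unknown a dcat:Distribution ;
    dcat:accessURL <http://example.org/data/unknown.csv> .

# An explicit rights statement: the publisher asserts that
# there is nothing to license and reuse is free.
ex:distribution-free a dcat:Distribution ;
    dcat:accessURL <http://example.org/data/free.csv> ;
    dct:rights ex:free-reuse .

ex:free-reuse a dct:RightsStatement ;
    rdfs:label "Free reuse permitted"@en .
```

A portal that only inspects `dct:license` would presumably report both distributions as having no license, even though only the first one is actually silent about its terms.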

jakubklimek commented 4 years ago

@bertvannuffelen I believe I can clarify the practical cases.

> how to distinguish between a mistake in the editorial process: one cannot detect anymore that a publisher has forgotten to add a license. How to resolve that?

We do distinguish this. If the statement is missing, then it is an editorial error. In the Czech National Open Data Catalog, we use the following structure to specify the statement:


@prefix terms: <https://data.gov.cz/slovník/podmínky-užití/> .
:distribution terms:specifikace :spec .

:spec a terms:Specifikace ;
#copyright: nothing to license
terms:autorské-dílo <https://data.gov.cz/podmínky-užití/neobsahuje-autorská-díla/> ;
#sui generis database right: nothing to license
terms:databáze-chráněná-zvláštními-právy <https://data.gov.cz/podmínky-užití/není-chráněna-zvláštním-právem-pořizovatele-databáze/> ;
#copyright protection of a database: nothing to license
terms:databáze-jako-autorské-dílo <https://data.gov.cz/podmínky-užití/není-autorskoprávně-chráněnou-databází/> ;
#Contains personal data: No
terms:osobní-údaje <https://data.gov.cz/podmínky-užití/neobsahuje-osobní-údaje/> .

See [this real world example](https://data.gov.cz/zdroj/datové-sady/MV/706529437/9c73b802263c5e0ccf5542f10fbc35bb/distribuce/bf613518854347d18e154bfb4501a2a0/podmínky-užití). The 4 objects are dereferenceable to HTML documents (which also have English versions; you can switch using the language picker). This corresponds to the [dataset registration form](https://data.gov.cz/formulář/dataset-registration?krok=2).

> In RDF the absence of a triple means I do not know which is different from I know it does not apply.
>
> the message is "No license provided".
>
> In Czechia this should then be "Free reuse permitted".

The same as above. This would be an editorial error or it would have the above structure attached. Of course EDP now does not understand our structure and therefore all our datasets in EDP "have no license".
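
As noted earlier in the thread, the same structure could be attached with `dcterms:rights` instead of a custom predicate. A hedged sketch of what that might look like, assuming the specification resource is additionally typed as a `dct:RightsStatement`. The empty prefix is a placeholder for illustration, and this is not how the catalog currently publishes the data:

```turtle
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix terms: <https://data.gov.cz/slovník/podmínky-užití/> .
@prefix :      <http://example.org/> .   # placeholder namespace

# Assumption: dct:rights replaces the catalog's own linking predicate.
:distribution dct:rights :spec .

:spec a terms:Specifikace, dct:RightsStatement ;
    # copyright: nothing to license
    terms:autorské-dílo <https://data.gov.cz/podmínky-užití/neobsahuje-autorská-díla/> ;
    # sui generis database right: nothing to license
    terms:databáze-chráněná-zvláštními-právy <https://data.gov.cz/podmínky-užití/není-chráněna-zvláštním-právem-pořizovatele-databáze/> ;
    # copyright protection of a database: nothing to license
    terms:databáze-jako-autorské-dílo <https://data.gov.cz/podmínky-užití/není-autorskoprávně-chráněnou-databází/> ;
    # contains personal data: no
    terms:osobní-údaje <https://data.gov.cz/podmínky-užití/neobsahuje-osobní-údaje/> .
```

A harvester that understands `dct:rights` could then at least detect that a rights statement is present, even without interpreting the four nested properties.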

bertvannuffelen commented 4 years ago

@jakubklimek why did you not encode them as ODRL statements?

jakubklimek commented 4 years ago

@bertvannuffelen Good question :) I know it should and can be done, but unfortunately we had other priorities so far.

barthanssens commented 4 years ago

Somewhat similar situation in Belgium, where e.g. laws, decrees and other official acts from the government / parliament are not subject to copyright, and where one technically cannot waive "moral rights" (though CC0 solves this nicely with "to the extent of the law").

I'm not really in favor of making dct:rights mandatory, though I could live with it :-)

IMHO the most important remark came from @bertvannuffelen about using ODRL, because in the end that's what probably matters most (we have quite a few licenses in Belgium that tend to be very CC-ish but organizationally also very difficult to change, so machine-readable licenses would solve this technically)

jkb-misek commented 4 years ago

Hi, thanks for questions and discussion, I finally got some time for a reply (sorry for the delay). There is one more problem to the legal reality you did not mention - that is the validity of a contract. Especially in the following part, this is a crucial element:

> That reasoning is fine, except that does not answer the question, what can I ask the government w.r.t. to that dataset? APIs, dumps, update frequency? What is the SLA we as citizen can ask? In the paper the distribution form is explicitly disregarded as an argument for enforcing a license. So The above legal reasoning solely simplifies answering the question "may I reuse the data of that distribution". It does not answer the question for what aspect of the data service publication I better add a license or not. And if all distributions must fall under the same reuse conditions.

As I said before, "licence" is a kind of contract which allows using content protected with IP rights. This is not a question of "licence" in the strict sense of the word. That might be a problem, and the situation will differ highly based on the specifics of national civil law. For example, in Czechia, it is rather unclear whether you can conclude a contract without knowing the other side (at least by a minimal identification), and most legal scholars agree that it is not possible.

For example, imagine a contract (similar to Creative Commons, but without the licence part, because there is no IP present), where you make an offer to everyone online and you expect that such a contract will be concluded if the other person fulfils the conditions you have set, without ever contacting you or meeting you. Basically, that is what we might say with the provided data: "You can use the data under the condition that you state your source, you will interpret it correctly etc. We will provide the data 24/7 with 99% stability..." But such a contract cannot be concluded under Czech law, because the identification of the other side is necessary.

There is an exception - IP rights and licences. We have a specific provision which allows CC to work this way. But it can only be applied when there is content protected by copyright or database rights. Therefore there are three possible outcomes when it comes to the issues you have mentioned above:

1) There is content protected with IP rights. In such a case, the data provider can construct a licence which includes also conditions you have mentioned. It would be legally binding and so on.

2) Providing the IP-free data after registration of a user. In such a case, it would be (in Czechia) a valid contract, because the registration would serve as an identification of the second party to the contract (e.g. our transport data are provided like that). The question is, whether we can still call it open data (e.g. @jakubklimek thinks it is not). In this case, we can talk about some kind of SLA.

3) Providing the IP-free data without registration of a user. In such a case, any "licence" (rights statement) will not be legally binding, because it will not constitute a contract. The user is not legally bound to anything (not even mentioning the source) and at the same time any statement from the side of the data provider (update frequency, operation time ...) is just a statement without any legal importance.

This might be very different if your national law allows concluding public contracts similar to CC without the necessity of IP rights present.

> Within administrations, the effort reality has been and is, I believe, a heated debate. If we provide Open Access to the data who is paying the bills of the virtual machines? In Flanders we have the case that because of the success of some APIs substantial amounts of the budget of agencies are burned. Having the weird consequence for maintaining the budget in balance the following dilemma emerges: reduce the usage costs by reducing the accessibility (increase income or block access) or reducing investments in projects (i.e. people) (and thus reducing data streams in the future). It is really perverse that a successful API could lead to less Open Data because there is no budget anymore to invest in. I understand that legally those effort aspects might not be connected to IP rights, but for a re-user point of view they are. So what does then the absence of a license in Czechia means for the SLA? Can a citizen request the data in an API if there is only a dump? And must that then be given? Under the same condition?

Yes, the question of "how" the data is provided generally relies on public law (e.g. Freedom of Information Act (FoIA), or EU PSI/OD Directive). You can ask for a different format etc., but the PSB can request a payment equal to the necessary

> I have some practical cases also:
>
> * what with cross-border interpretation of the EDP? Can in Czechia more dataset distributions freely reused while in other member states the same cannot (for the same metadata)?

This is a good question for which I do not have a proper answer right now. I will think it through.

> * is a database from the military a public sector dataset? The PSI directive excludes them, but does the amendment of the IP legislation in Czechia does that too?

This is not a question of IP law, but public law in general. We have such information excluded from FoIA (in Czechia FoIA covers both access and re-use of PSI). Generally, it is considered public sector information, but it is excluded from access (and thus reuse).

> Am I right that pricing, responsibility, complaints process statements, ... are not part of a license document in Czechia?

It can be when there is something to licence (IP rights are present). Otherwise, it can be a part of a contract - but to conclude a contract you have to identify the other side first.

bertvannuffelen commented 2 years ago

Independent of the legal consequences, from a specification point of view dct:license is not obligatory, only recommended. As dct:rights can be provided (as an optional property), it is possible to provide correct legal information for Czechia according to the above discussion.

The above reasoning does not yet motivate the need for "recommendation" of dct:rights.

So the proposal is to keep the specification as is for now.

bertvannuffelen commented 2 years ago

During the WG meeting of 21 Oct 2021, the WG decided to follow the proposed resolution and not change the specification.