SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

cardinality of accessURL and downloadURL #262

Closed idevisser closed 2 weeks ago

idevisser commented 1 year ago

In DCAT-AP 2.1.1 accessURL is mandatory and has the cardinality of 1..n, downloadURL is optional and has the cardinality of 0..n.

In practice, a distribution does not always have an accessURL; in that case only a downloadURL is available. Conversely, the downloadURL is not always present either.

What I now see is that, as a workaround, various national profiles state that when there is no accessURL, the downloadURL must also be included in accessURL, because that property is mandatory. As a result, the accessURL property will contain URLs that are not access URLs.

I would suggest including a constraint that at least one of accessURL or downloadURL is mandatory. The cardinality of both is then 0..n.
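To make the suggestion concrete, here is a minimal sketch of the two rules side by side. It assumes a distribution is represented as a plain Python dict with optional list-valued keys "accessURL" and "downloadURL"; that representation and both function names are made up for illustration and are not part of DCAT-AP.

```python
def check_current_rule(distribution):
    """DCAT-AP 2.1.1 as described above: accessURL 1..n, downloadURL 0..n."""
    if not distribution.get("accessURL", []):
        return ["accessURL is mandatory (1..n)"]
    return []


def check_proposed_constraint(distribution):
    """Suggested rule: both properties 0..n, but at least one value overall."""
    access_urls = distribution.get("accessURL", [])
    download_urls = distribution.get("downloadURL", [])
    if not access_urls and not download_urls:
        return ["distribution needs an accessURL or a downloadURL"]
    return []


# Hypothetical distribution that only has a direct download link.
download_only = {"downloadURL": ["http://data.portal.org/id/file/213213.ttl"]}

print(check_current_rule(download_only))         # reports a violation
print(check_proposed_constraint(download_only))  # reports nothing
```

Under the current rule the download-only distribution is invalid unless the URL is duplicated; under the suggested constraint it is valid as it stands.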

bertvannuffelen commented 1 year ago

Hi @idevisser,

This may be confusing, but the reasoning can be found in W3C DCAT: https://w3c.github.io/dxwg/dcat/#Property:distribution_access_url.

The fundamental issue is that one cannot determine from the character string "http://data.portal.org/id/file/213213.ttl" which of the following behaviours applies: a) a webpage for humans to read, b) a link to a file to download, c) a URI, d) ...

DCAT solves this by introducing two properties, accessURL and downloadURL, where downloadURL is a subset of accessURL. An accessURL can be case a) or b); a downloadURL only case b). An automated data processing chain can thus only be built on downloadURLs, while for an accessURL a human must actively open the URL in a web browser and perform the necessary actions to get to the data.

That is the reason why accessURL is mandatory and not downloadURL.
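A rough sketch of what this distinction means for a harvester, assuming a distribution is a plain dict with optional list-valued keys "accessURL" and "downloadURL" (a hypothetical representation, as is the function name): only downloadURL values are safe to fetch automatically, and any accessURL that is not also a downloadURL has to be handed to a human.

```python
def split_by_processing_mode(distribution):
    """Return (automated, manual) URL lists for one distribution."""
    access_urls = distribution.get("accessURL", [])
    download_urls = distribution.get("downloadURL", [])
    # downloadURL: case b) only, so a machine can fetch it directly.
    automated = list(download_urls)
    # accessURL not repeated as downloadURL: could be case a) or b),
    # so it is treated as requiring a human in a web browser.
    manual = [url for url in access_urls if url not in download_urls]
    return automated, manual


# Hypothetical example: one landing-style access page plus the file itself.
example = {
    "accessURL": [
        "http://data.portal.org/access/213213",
        "http://data.portal.org/id/file/213213.ttl",
    ],
    "downloadURL": ["http://data.portal.org/id/file/213213.ttl"],
}
automated, manual = split_by_processing_mode(example)
print(automated)
print(manual)
```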

As explained in the usage note of W3C DCAT, if the provided URL is a downloadURL, it must also be provided as the accessURL.

In essence this is a workaround for the fact that HTTP and the use of links in HTML do not provide a means to indicate the processing objective of a URL.
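For a publisher that only has a download link, this rule could be applied roughly as follows. The sketch assumes the same hypothetical dict representation with list-valued "accessURL" and "downloadURL" keys, and it reads the rule narrowly: the downloadURL values are copied only when no accessURL was supplied. That narrow reading is an assumption, not something the specification spells out.

```python
def ensure_access_url(distribution):
    """Return a copy in which accessURL is never empty if downloadURL exists."""
    result = dict(distribution)
    access_urls = list(result.get("accessURL", []))
    download_urls = list(result.get("downloadURL", []))
    if not access_urls and download_urls:
        # No separate access page: reuse the download link(s) as accessURL.
        access_urls = list(download_urls)
    result["accessURL"] = access_urls
    return result


# Hypothetical download-only distribution.
download_only = {"downloadURL": ["http://data.portal.org/id/file/213213.ttl"]}
fixed = ensure_access_url(download_only)
print(fixed["accessURL"])
```

The input dict is left unchanged, so the copy can be compared against the original record.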

idevisser commented 1 year ago

Hi @bertvannuffelen,

It's good to distinguish between the different behaviours by introducing different properties. But then it is relevant to provide a URL that behaves as you would expect. In my opinion, providing a downloadURL as accessURL is not exactly what users expect, and neither is providing a landing page as accessURL. (The usage note you pointed to is about distributions that are accessible only through a landing page.)

Perhaps it's only an example you give, but an accessURL can also be machine readable, for instance the capabilities file of a WFS.

To be honest, I can't find in the specs that downloadURL is a subset of accessURL. But even if that is the case, it still does not solve the problem.

It's not possible to make only one of access URL, download URL or landing page mandatory, because they are not available in all cases. Making one of them mandatory introduces URLs that do not behave as expected, because the publisher had to provide something for that property.

If there is a need to require a URL to a landing page, an access URL or a download URL, the most that can be required is that at least one of these three is provided. Which one depends on the data infrastructure.

bertvannuffelen commented 8 months ago

@idevisser

> To be honest, I can't find in the specs that downloadURL is a subset of accessURL. But even if that is the case, it still does not solve the problem.

It is not precisely a subset: one could still provide a human-readable access page instead of the downloadURL, but this is not imposed on publishers. The objective is to ensure a user can get access to the data, directly or indirectly. If direct access is possible without a technical hurdle (e.g. credentials), then it is a download URL, and a separate access URL explaining how to get access is of little use. That is why the same value is requested to be copied into accessURL.

A capabilities file for a WFS is close but not necessarily the same. It describes the operations that are possible on a service URL, while here we are talking about a specific combination of parameters of that service. That unique combination corresponds to the complete data that this Distribution represents.

Remember that a DCAT-AP Distribution is the complete collection of data records of a dataset in a given format. The capabilities of a WFS service document a Data Service which provides access to that Distribution. A Distribution is thus not an API.
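The separation can be sketched with two small data classes. The field names loosely mirror the DCAT properties dcat:accessService, dcat:endpointURL and dcat:endpointDescription; the classes themselves, the example.org URLs and the layer name "ns:roads" are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataService:
    """The API itself, e.g. a WFS endpoint."""
    endpoint_url: str
    # Document describing the possible operations (the capabilities file).
    endpoint_description: Optional[str] = None


@dataclass
class Distribution:
    """The complete data of a dataset in one format, not an API."""
    access_urls: List[str]
    download_urls: List[str] = field(default_factory=list)
    access_service: Optional[DataService] = None


wfs = DataService(
    endpoint_url="https://example.org/wfs",
    endpoint_description="https://example.org/wfs?service=WFS&request=GetCapabilities",
)

# One specific combination of service parameters that yields the full data.
roads = Distribution(
    access_urls=[
        "https://example.org/wfs?service=WFS&version=2.0.0"
        "&request=GetFeature&typeNames=ns:roads"
    ],
    access_service=wfs,
)
print(roads.access_service.endpoint_description)
```

In this modelling the capabilities document belongs to the service, while the Distribution carries the one request that returns its data.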