SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

HVD: which additional information is provided, too many properties listed? #312

Closed matthiaspalmer closed 11 months ago

matthiaspalmer commented 1 year ago

In the overview section in HVD it says:

In this document only the additional information, that is required for the catalogued resources which are within scope of the regulation, is included.

There are several places where it is unclear or hard to tell which kind of additional information is provided. Listing these properties may lead the reader to think (wrongly) that only the properties mentioned explicitly are allowed. This potential misunderstanding is further encouraged by the sentence "For this entity the following properties are defined" that appears after the "Properties" section on each class. The wording on Agent is much more informative and does not mislead the reader: "This specification does not impose any additional requirements to properties for this entity."

The following properties are listed without seeming to provide additional information / guidance (at least in my reading).

On Catalog:

On Dataset:

On Distribution:

On Data service:

I understand that this might be an effect of the document being a work in progress, and that additional usage notes are expected in all of these places. However, if this is not the case, I would argue for removing all of the properties listed above, as they seem to gain no extra value from being mentioned explicitly. This might lead to a shorter, sharper and easier to read document.

bertvannuffelen commented 1 year ago

Let's indeed make the sentence under the class clearer.

So for classes that have properties specified the sentence below

This specification does not impose any additional requirements to properties for this entity.

could be as follows:

This specification does not impose any additional requirements to properties for this entity besides the ones listed below.

bertvannuffelen commented 1 year ago

@matthiaspalmer thanks for making this list. I agree with your overall objective.

Some of the properties in the list are identical to DCAT-AP. They are included because they are the outcome of mapping the HVD IR to DCAT-AP terminology.

For instance, dcat:endpointURL is the outcome of HVD IR sentences such as " ... and publish an API.".

I agree that this mapping is maybe lost or scattered around in the additional sections below the formal specification.

In this process we have encountered readers that have no experience with the DCAT-AP ecosystem, e.g. those coming from the legal side. So I am wondering how we can keep that information in the specification. Would an additional usage note like "In the HVD IR the need for sharing an endpointURL is mentioned." be helpful?

matthiaspalmer commented 1 year ago

@bertvannuffelen, ok, I understand the reason now. But I am not convinced that this is a good enough reason to list the properties, especially not if it can be misunderstood as I outlined above.

To clarify, I think it is fair to say that there are three different reasons for a property to be mentioned in the document today:

  1. A modelling reason, e.g. a new property is introduced or a change in cardinality is expressed.
  2. A guidance reason, e.g. a usage note that provides guidance when using the property in the scope of a HVD.
  3. An alignment with the law reason, e.g. to guide the reader to understand how the HVD IR is translated into properties.

It could perhaps be argued that reasons 2 and 3 can be blurred if 3 is formulated as a usage note where the reference to the HVD IR is clarified.

I think having all three of these reasons for including properties is problematic: First, the reader is not informed that there are three distinct reasons. Second, there is no indication of which of 1-3 applies to each property (actually, multiple reasons apply most of the time). Third, it is only by carefully comparing with DCAT-AP that the reader may figure out whether 1 or 2 applies. Fourth, for the alignment with the law, there is no explicit mention of which part of the HVD IR each property corresponds to, hence figuring out 3 is even harder.

I would argue that there are three main paths forward:

  A. Sacrifice (3) and remove from the document all properties that have no modelling or guidance reason. Also clarify that "additional information" corresponds to both modelling (1) and guidance (2).
  B. Clarify in the document that there are three different reasons for a property to be mentioned. Preferably also indicate per property which of 1-3 applies (potentially several).
  C. Muddle the distinction between 2 and 3 by providing usage notes everywhere that, as a minimum, refer to the HVD IR even if they do not provide any additional guidance on how the property should be used.
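To make options A and B more concrete, here is a minimal sketch of what a per-property reason annotation could look like. Everything in it is hypothetical: the `Reason` enum, the `PROPERTY_REASONS` table and the helper function are names invented for illustration, and the reasons assigned to each property are assumptions, not statements about the actual HVD document. Only the reading of dcat:endpointURL and conformsTo as "alignment with the law" follows from the discussion above.

```python
from enum import Enum


class Reason(Enum):
    """The three reasons for mentioning a property, as numbered above."""
    MODELLING = 1        # new property or changed cardinality
    GUIDANCE = 2         # usage note specific to HVD
    LEGAL_ALIGNMENT = 3  # traces back to a formulation in the HVD IR


# Hypothetical annotations (option B): each property lists every reason
# that applies to it. The assignments are illustrative assumptions.
PROPERTY_REASONS = {
    "dcatap:hvdCategory": {Reason.MODELLING, Reason.LEGAL_ALIGNMENT},
    "dcat:endpointURL": {Reason.LEGAL_ALIGNMENT},
    "dct:conformsTo": {Reason.LEGAL_ALIGNMENT},
}


def properties_kept_under_option_a(annotations):
    """Option A: keep only properties with a modelling or guidance reason."""
    wanted = {Reason.MODELLING, Reason.GUIDANCE}
    return sorted(
        name for name, reasons in annotations.items() if reasons & wanted
    )


print(properties_kept_under_option_a(PROPERTY_REASONS))
```

Under these assumed annotations, option A would drop the two properties that are listed only for legal alignment, while option B would publish the table itself so the reader can see why each property is mentioned.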

Finally, I would argue that including properties because of reason 3 is not convincing and leads to strange effects. For instance, consider mediatype and conformsTo: I would argue that mediatype is probably more fundamental than conformsTo (linked schema). Still, only conformsTo (linked schema) is mentioned on the Distribution class while mediatype is not, and this just because it cannot be traced back to a certain formulation in the HVD IR? I think this causes more confusion than it is worth. Or maybe this is an oversight and there is some mention of mediatype / format in the HVD IR; in that case it should probably be added.