SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

HVD C7. Adhere to specific data requirements #257

Closed: bertvannuffelen closed this issue 6 months ago

bertvannuffelen commented 1 year ago

The HVD regulation often requires (annex 3.2) that the datasets shall be described in a complete and publicly available online documentation describing at least the data structure and semantics.

proposal options

option A: Implicit

option B: Explicit with state of the art information

bertvannuffelen commented 1 year ago

In https://github.com/SEMICeu/DCAT-AP/issues/231#issuecomment-1437116632 a third option is mentioned: indicating, for each dataset, in detail which technical requirements are satisfied (level 3).

jakubklimek commented 1 year ago

I vote for B. It stresses the importance of having such documentation (which is otherwise optional and often enough not present at all).

Regarding the "level 3" mentioned in the comment: although I would like to have something like that in place, I do not see it as feasible at the moment. I am quite sure it would just lead to lots of effort spent on defining the technical requirements and supporting them in data portals, with data publishers then just filling in what is required regardless of the validity of the statements. So I do not see the added value in it at the moment. At the same time, I think it is worth pursuing, provided that the established criteria are machine verifiable, which may imply creating technical specifications for the individual HVDs and having them adopted by the MSs.

matthiaspalmer commented 1 year ago

I vote for option B as well, and I agree with @jakubklimek that the mechanism of using dqv for expressing quality measurements should be investigated, especially the possibility of it being used by a third party rather than by the data provider itself. Pointing to the dataset, instead of from the dataset, would be more sensible.
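To illustrate the "pointing to the dataset" idea, here is a minimal sketch of a quality measurement that lives outside the dataset description and refers to it. It uses the DQV terms dqv:QualityMeasurement, dqv:computedOn, dqv:isMeasurementOf and dqv:value, plus prov:wasAttributedTo to name the assessor. The function name and all example.com / example.org URIs are hypothetical, and nothing here was agreed in this thread.

```python
import json

def third_party_measurement(dataset_uri, metric_uri, value, assessor_uri):
    """Build a JSON-LD quality measurement that points TO a dataset.

    The statement is owned by the assessor, so the dataset description
    itself does not need to change.
    """
    return {
        "@context": {
            "dqv": "http://www.w3.org/ns/dqv#",
            "prov": "http://www.w3.org/ns/prov#",
        },
        "@type": "dqv:QualityMeasurement",
        "dqv:computedOn": {"@id": dataset_uri},
        "dqv:isMeasurementOf": {"@id": metric_uri},
        "dqv:value": value,
        "prov:wasAttributedTo": {"@id": assessor_uri},
    }

# Placeholder URIs (assumptions), for illustration only.
measurement = third_party_measurement(
    dataset_uri="https://example.com/datasets/example",
    metric_uri="https://example.com/metrics/documentation-available",
    value=True,
    assessor_uri="https://example.org/assessors/third-party",
)
print(json.dumps(measurement, indent=2))
```

Because the link goes from the measurement to the dataset, several independent assessors could publish such statements about the same dataset.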

init-dcat-ap-de commented 1 year ago

Option B is the bare minimum. We prefer option c/level 3. We think that option B has two possible outcomes:

  1. Similar to the EDP-DCAT-AP-compliance: 0%
  2. "close enough, I report it as conforming"

https://data.europa.eu/mqa/

Neither outcome is desirable. In our opinion, we need a more nuanced way to describe the conformance of an HVD.

bertvannuffelen commented 1 year ago

@init-dcat-ap-de in order to go forward with option c/level 3, you should align within each domain.

E.g. for the geospatial domain, the INSPIRE regulation with its rules applies. Thus, an INSPIRE-acceptable way to express option c) should be allowed. Did you check with your INSPIRE colleagues how they would realise this? Could you share their response?

As the HVD regulation does not standardize a conformance document for a dataset in scope, it is a substantial effort to create a machine-processable document for each HVD. For me, designing this document is beyond the scope of DCAT-AP. In the context of DCAT-AP HVD we can only specify how to retrieve that document. If you, as a MS, consider this crucial for the implementation of the HVD, then I suggest you get in contact with CNECT. (I will already pass it on to CNECT.)

During this DCAT-AP HVD track, we are not creating the MS conformance assessment. This is an MS-CNECT activity. However, we do aim to provide the means so that this assessment can be performed based on a DCAT-AP HVD catalogue.

sirex commented 1 year ago

We use a simplified way to describe the content of the data: a simple table that can be edited in a spreadsheet app. The idea is to describe data at the column level. For example, if we have a CSV file like this:

id,org_name,org_addr
42,SEMICeu,"Champ de Mars, 5 Avenue Anatole France, 75007 Paris, France"

Then the description table, aligned with the Core Business Vocabulary, would look like this:

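| dataset | resource | model | property | type | ref | source | level | uri |
|---|---|---|---|---|---|---|---|---|
| | | | | prefix | org | | | https://w3.org/ns/org# |
| | | | | | adms | | | http://www.w3.org/ns/adms# |
| | | | | | ex | | | https://example.com/ |
| datasets/example | | | | | | | | ex:dataset1 |
| | myres | | | csv | | https://example.com/{}.csv | | ex:dist1 |
| | | Organization | | | id | data | 5 | org:Organization |
| | | | identifier[] | ref | Identifier | id | 5 | org:identifier |
| | | | name@en | text | | org_name | 5 | org:legalName |
| | | | address | ref | Address | org_addr | 3 | org:registeredAddress |
| | | Identifier | | | value | data | 5 | adms:Identifier |
| | | | value | string | | id | 4 | |
| | | Address | | | address@en | data | 5 | org:Address |
| | | | address@en | text | | org_addr | 5 | org:fullAddress |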

We also have a tool that can interpret this structure description table, check whether it matches the described resource, publish any resource data via an HTTP REST JSON API, and export data into multiple formats, including RDF, as a bulk download.
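To give a feel for the "check whether it matches the described resource" step, here is a minimal sketch in Python. It is not the tool mentioned above. It assumes a comma-separated file with a header row, and it reduces the description table to a hypothetical dictionary that maps each model's properties to the CSV column they are read from.

```python
import csv
import io

# Hypothetical, simplified view of the description table: for each model,
# the properties and the CSV column ("source") each one is read from.
DESCRIPTION = {
    "Organization": {"identifier[]": "id", "name@en": "org_name", "address": "org_addr"},
    "Identifier": {"value": "id"},
    "Address": {"address@en": "org_addr"},
}

def missing_columns(csv_text, description):
    """Return, sorted, the declared source columns absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    declared = {source for props in description.values() for source in props.values()}
    return sorted(declared - set(header))

# The sample row from the comment above, with the address quoted so that
# its commas are not treated as column separators.
SAMPLE = (
    "id,org_name,org_addr\n"
    '42,SEMICeu,"Champ de Mars, 5 Avenue Anatole France, 75007 Paris, France"\n'
)

print(missing_columns(SAMPLE, DESCRIPTION))
```

An empty result means every column referenced by the description is present. A fuller check would also need to validate types and the ref links between models.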

I guess in our case something like option B would work; I am just not sure where dct:conformsTo should point. If it points to a document, preferably a machine-readable one such as SHACL, then we could pretty easily check whether the resource matches the SHACL shapes.
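As a rough illustration of that question, the sketch below builds a dataset description whose dct:conformsTo points to a machine-readable document and then reads the target back out. Placing the property on the dataset, the helper name and all example.com URIs are assumptions for illustration; the thread does not settle where the link should point.

```python
import json

# All URIs below are placeholders (assumptions), not real documents.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.com/datasets/example",
    "@type": "dcat:Dataset",
    "dct:conformsTo": {"@id": "https://example.com/shapes/organization.shacl.ttl"},
}

def conformance_targets(description):
    """Return the URIs that a description declares via dct:conformsTo."""
    value = description.get("dct:conformsTo", [])
    items = value if isinstance(value, list) else [value]
    return [item["@id"] for item in items]

print(json.dumps(dataset, indent=2))
print(conformance_targets(dataset))
```

A harvester could fetch each returned URI and, if it turns out to be a SHACL document, hand it to a SHACL validator together with the exported RDF.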

bertvannuffelen commented 9 months ago

In the second DCAT-AP HVD webinar, option B was chosen. However, a concrete, harmonised cross-domain expectation was not formulated.

The (short) section "Specific data requirements" indicates that, but leaves the decision on how to address it to each domain.

From the perspective of publishers of DCAT-AP metadata, this has no impact on the provided metadata. It is formulated as an incentive to provide as much information as possible, so that the conformance assessment w.r.t. the HVD IR becomes possible.