SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

Scope of DCAT-AP #123

Closed heidivanparys closed 2 years ago

heidivanparys commented 4 years ago

Given the discussion in w3c/dxwg#1221, it would be good if extra information on the scope of DCAT-AP could be added to the specification.

I guess that when the Commission refers to datasets, the meaning of dataset is supposed to be rather narrow, so that DCAT-AP is not intended to be used for software catalogs, or solutions catalogs such as Joinup.

bertvannuffelen commented 4 years ago

@heidivanparys indeed this is worth a discussion.

One of the aspects is the relationship with the Open Data portals. Is DCAT-AP a specification for datasets on Open Data portals? Or is it a specification for any digital object in a catalog operated by a government agency?

For instance, the discussion in https://github.com/w3c/dxwg/issues/1221 mentions Joinup. Joinup uses ADMS-AP, which maps onto DCAT, so technically we can merge the Joinup catalog with the EDP into one DCAT catalogue. A user would not see the difference between an entry from Joinup and one from, e.g., the Dutch Open Data portal. Is this aggregated catalogue the one that is being targeted by DCAT-AP? Or not?

A third initiative to be taken into consideration is https://joinup.ec.europa.eu/solution/abr-specification-registry-registries/document/draft-specification-bregdcat-ap-v102 Again, is that a further specialisation of DCAT-AP, or is it a separate specification based on DCAT, that is, a new member of the DCAT family with its own goals and purposes?

Finally, though maybe something for an implementation guideline, there is the level of granularity of a dataset. If a DCAT-AP catalogue is created as the aggregation of other DCAT-AP catalogues, the user interface of the aggregated catalogue will offer a good user experience if the datasets are of the same granularity. However, if the granularity differs widely, neither statistics nor UX will be meaningful. E.g. Eurostat could deliver their data as a) 1 dataset (a huge cube with 1000s of dimensions), b) 10000 datasets (according to the legal names in the legislation), or c) 100000s of datasets (a slice per year and Member State). For a harvesting catalogue like the EDP, if one Member State applies option a), another b) and another c), then UX will be difficult, and the reporting (how many datasets) will also be very vague.
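To make the reporting problem concrete, here is a minimal Python sketch. It assumes three hypothetical publishers expose the same data at the three granularities mentioned above. The counts are only the orders of magnitude from this comment (with 100000 standing in for "100000's"), not real figures, and the publisher names are made up.

```python
# Hypothetical publishers exposing the same data at different granularities.
# Counts are illustrative orders of magnitude, not real figures.
DATASETS_PER_PUBLISHER = {
    "publisher_a_one_cube": 1,
    "publisher_b_legal_names": 10_000,
    "publisher_c_year_state_slices": 100_000,
}


def dataset_shares(counts):
    """Return each publisher's share of the aggregated dataset count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(DATASETS_PER_PUBLISHER.values())
    print(f"aggregated catalogue: {total} datasets")
    for name, share in dataset_shares(DATASETS_PER_PUBLISHER).items():
        print(f"{name}: {share:.3%} of all datasets")
```

Under these assumptions the publisher with the finest slices accounts for more than 90% of the aggregated "number of datasets", although all three publish the same amount of data.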

@ our DCAT-AP community: what are the key elements that define the DCAT-AP application context?

jakubklimek commented 4 years ago

@bertvannuffelen regarding dataset granularity, based on our experience from Czechia (where we indeed have different dataset granularity levels in our open data portal), I can confirm that this is something that hurts both UX and statistics, the latter to the point of making them (e.g. the number of datasets) totally irrelevant.

However, I do not think that this is for DCAT-AP to solve, as each publisher and each consumer may simply have different use cases: some need granular datasets (e.g. addresses in one town) and some need the complete picture (addresses in the whole of Czechia, currently 6500+ datasets). We currently aim to solve this partially by using dcterms:isPartOf (as, unfortunately, a proper dataset series solution was dropped from DCAT2 and we cannot wait for DCAT3 anymore). Therefore, while we will still have a high number of granular datasets, there will also be the "whole picture" dataset, grouping the granular ones and making the data downloadable by a simple script.

This is just to illustrate that there are use cases for various granularities of datasets, and therefore it seems out of scope for DCAT-AP to create any constraints in this matter. On the contrary, it is an indication of the need for dataset series.
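The dcterms:isPartOf grouping mentioned above can be sketched as follows. This is only a minimal illustration, not the actual Czech implementation: the `https://example.org/` IRIs, the `TRIPLES` list and both helper functions are hypothetical, and the plain list of triples stands in for a catalogue that a real script would load with an RDF library. Only the three property IRIs (dcterms:isPartOf, dcat:distribution, dcat:downloadURL) are real vocabulary terms.

```python
DCT_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"
DCAT_DOWNLOAD_URL = "http://www.w3.org/ns/dcat#downloadURL"

BASE = "https://example.org/"  # hypothetical namespace
WHOLE = BASE + "dataset/addresses-all"  # hypothetical "whole picture" dataset

# Hypothetical catalogue content as (subject, predicate, object) triples.
TRIPLES = [
    (BASE + "dataset/addresses-town-a", DCT_IS_PART_OF, WHOLE),
    (BASE + "dataset/addresses-town-a", DCAT_DISTRIBUTION, BASE + "dist/town-a-csv"),
    (BASE + "dist/town-a-csv", DCAT_DOWNLOAD_URL, BASE + "files/town-a.csv"),
    (BASE + "dataset/addresses-town-b", DCT_IS_PART_OF, WHOLE),
    (BASE + "dataset/addresses-town-b", DCAT_DISTRIBUTION, BASE + "dist/town-b-csv"),
    (BASE + "dist/town-b-csv", DCAT_DOWNLOAD_URL, BASE + "files/town-b.csv"),
    # Unrelated dataset that is not part of the "whole picture" dataset.
    (BASE + "dataset/other", DCAT_DISTRIBUTION, BASE + "dist/other-csv"),
    (BASE + "dist/other-csv", DCAT_DOWNLOAD_URL, BASE + "files/other.csv"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def download_urls_of_parts(triples, whole):
    """Collect download URLs of every dataset declared as part of `whole`."""
    parts = sorted(s for s, p, o in triples if p == DCT_IS_PART_OF and o == whole)
    urls = []
    for part in parts:
        for dist in objects(triples, part, DCAT_DISTRIBUTION):
            urls.extend(objects(triples, dist, DCAT_DOWNLOAD_URL))
    return urls


if __name__ == "__main__":
    for url in download_urls_of_parts(TRIPLES, WHOLE):
        print(url)
```

A consumer who needs the complete picture would start from the grouping dataset and download every listed URL, while a consumer who needs one town can still pick the granular dataset directly.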

bertvannuffelen commented 2 years ago

During the WG meetings of 15 Sept 2021 and 21 Oct 2021, the WG was presented with an approach to address the questions related to this issue. The result is an updated UML diagram and a usage guideline for Datasets, Distributions and Data Services.

The discussion did not address the high-level question of the business scope of DCAT-AP. Clarifying this will be part of the planned future work.