I also noticed this and think it needs to be clarified.
I suggest that the current definition of applicableLegislation for the catalog:
The legislation that mandates the creation or management of the Catalogue.
is changed to:
A legislation that mandates the creation or management of some of the resources included in this Catalog.
This assumes you need to provide this "HVD marking" even if only a single dataset / dataservice in the catalog falls under the HVD legislation. The usage note could be changed to clarify this further:
For catalogs that contain HVD resources (as is the scope of this annex) it is mandatory to indicate this by pointing to the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj via applicableLegislation. As multiple legislations may apply to the resource the maximum cardinality is not limited.
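A minimal sketch of what this proposed definition would mean in practice, assuming the `r5r:` namespace used elsewhere in this thread and a hypothetical `ms:` namespace for the Member State's resources. The catalogue carries the HVD ELI because at least one catalogued dataset falls under the HVD IR, even though another one does not:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix r5r:  <http://data.europa.eu/r5r/> .
@prefix ms:   <http://example.org/ms/> .  # hypothetical namespace

# Under the proposed definition, the catalogue-level value means
# "some of the resources in this catalogue fall under the HVD IR".
ms:catalogue a dcat:Catalog ;
    r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> ;
    dcat:dataset ms:hvdDataset, ms:otherDataset .

# This dataset is an HVD, so it carries the ELI itself.
ms:hvdDataset a dcat:Dataset ;
    r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> .

# A non-HVD dataset in the same catalogue, without the HVD ELI.
ms:otherDataset a dcat:Dataset .
```

Under this reading, identifying which resources are actually HVD still relies on the dataset- and service-level values.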
On the other hand, if an "HVD marking" should only be provided if ALL datasets / dataservices are HVD, then I do not think it makes sense to make applicableLegislation mandatory at the catalog level. (Otherwise the annex won't be useful for catalogs that contain some HVD resources and some non-HVD resources.)
I think the specification can be improved further here.
Consider your current DCAT-AP catalogue.
```turtle
ms:catalogue a dcat:Catalog ;
    dcat:dataset ms:dataset1, ms:dataset2 ;
    dcat:service ms:service1 .
ms:dataset1 a dcat:Dataset .
ms:dataset2 a dcat:Dataset .
ms:service1 a dcat:DataService ;
    dcat:servesDataset ms:dataset2 .
```
Now suppose the HVD IR impacts `ms:dataset2` and `ms:service1`. The responsible dataset and data service publishers adapt their DCAT-AP metadata, resulting in:
```turtle
ms:catalogue a dcat:Catalog ;
    dcat:dataset ms:dataset1, ms:dataset2 ;
    dcat:service ms:service1 .
ms:dataset1 a dcat:Dataset ;
    r5r:applicableLegislation eli:2023/138/oj .
ms:dataset2 a dcat:Dataset .
ms:service1 a dcat:DataService ;
    r5r:applicableLegislation eli:2023/138/oj ;
    dcat:servesDataset ms:dataset2 .
```
According to the HVD IR, the metadata has been improved and thus the objective is reached.
As the next step in the HVD process, the MS policy officer has to submit an official report to the EC. This report is the collection of all metadata associated with the datasets that the MS data publishers reported as HVD.
This can be done by creating an extract from the MS DCAT-AP catalogue that selects only the entities relevant for the HVD reporting; let's call this extract the MS HVD catalogue. It is this catalogue that is represented in DCAT-AP HVD.
The improvement, then, would be to make explicit that the DCAT-AP catalogue and the DCAT-AP HVD catalogue are distinct entities.
> now the HVD impacts `ms:dataset2` and `ms:service1`

Did you omit `ms:dataset1` (it has `applicableLegislation`), or did you mean `ms:dataset2` because it is served by `ms:service1`, even though it does not have `applicableLegislation`?
Anyway, it seems that `applicableLegislation` on the catalogue is meant to be used with catalogues that contain ONLY (and ALL?) HVD datasets and services, not SOME.
This means that such extracted MS HVD catalogues always need to be materialized for reporting purposes (and for compliance with DCAT-AP HVD), and only then can they be tagged with `applicableLegislation` pointing to the HVD IR. Existing catalogues cannot be used directly for that purpose, even if they contain all the necessary HVD datasets among others.
I would actually have to create:
```turtle
ms:HVDcatalogue a dcat:Catalog ;
    r5r:applicableLegislation eli:2023/138/oj ;
    dcat:dataset ms:dataset1 ;
    dcat:service ms:service1 .
ms:service1 a dcat:DataService ;
    r5r:applicableLegislation eli:2023/138/oj ;
    dcat:servesDataset ms:dataset2 .
```
This seems somewhat inconsistent with how other open data (PSI) and protected data (DGA) are harvested by data.europa.eu (and here, the purposes of harvesting and reporting seem similar to me). There, one can either create separate catalogs for PSI and DGA or provide a filtering mechanism (e.g. based on applicableLegislation) to distinguish them, so it is the receiver of the report who does the filtering.
Similarly, it seems inconsistent with (at least my interpretation of) the rules on HVD scope, Datasets and Distributions, where the agreement seems to be that an HVD dataset can contain non-HVD distributions. So why can an HVD catalog not contain non-HVD datasets?
Or do we view this case in a way where only the HVD distributions of HVD datasets are in the scope of DCAT-AP HVD and the non-HVD distributions are not in the scope of DCAT-AP HVD, even though they are distributions of datasets that are in the scope of DCAT-AP HVD? And therefore, in the reporting catalogue, only distributions and datasets in the HVD scope would be extracted, and that is why DCAT-AP HVD does not concern itself with datasets that contain both HVD and non-HVD distributions?
Additionally, if for reporting purposes I need to create a separate catalogue and tell someone that it is the reporting one, the presence of applicableLegislation on it seems a bit ambiguous; it would only confirm, in a machine-readable way, that it is indeed the reporting one.
On the other hand, if having the applicableLegislation pointing to HVD IR would indicate that the catalog contains (may contain) some HVD datasets (not only/all HVD datasets), it may be viewed as a weak statement. So maybe applicableLegislation can be omitted altogether on the catalogue level?
I would agree with dropping the mandatory applicableLegislation at the catalogue level, so that catalogues containing both HVD and non-HVD datasets can be in the scope of DCAT-AP HVD. It could be used optionally at the catalogue level to confirm that the catalogue contains only HVD resources. Creating an MS HVD catalogue for reporting purposes, and optionally confirming it as such in a machine-readable fashion by adding applicableLegislation, would then still be possible. I see no benefit in stating that a catalogue contains some HVD resources, even less in making that mandatory.
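A sketch of this suggestion, assuming hypothetical `ms:` names: the general catalogue mixes HVD and non-HVD datasets without any catalogue-level value, while a separate reporting catalogue lists only HVD resources and optionally confirms that with the HVD ELI.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix r5r:  <http://data.europa.eu/r5r/> .
@prefix ms:   <http://example.org/ms/> .  # hypothetical namespace

# General catalogue: mixes HVD and non-HVD datasets and has no
# catalogue-level applicableLegislation (it would be optional here).
ms:catalogue a dcat:Catalog ;
    dcat:dataset ms:hvdDataset, ms:otherDataset .

ms:hvdDataset a dcat:Dataset ;
    r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> .
ms:otherDataset a dcat:Dataset .

# Optional reporting catalogue: lists only HVD resources, and the
# catalogue-level value confirms this in a machine-readable way.
ms:hvdReportingCatalogue a dcat:Catalog ;
    r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> ;
    dcat:dataset ms:hvdDataset .
```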
I think there is more agreement than it maybe seems.
There are two cases here: a) A general DCAT-AP catalogue can contain both HVD and non-HVD catalogued resources. b) A reporting catalogue is a DCAT-AP catalogue that only contains information that is subject to the HVD reporting.
This yields the following questions:
I would personally prefer that we find a representation in which the reporting catalogue is part of the specification. The motivation is that it should be clear that the two are distinct yet connected.
That is why I wrote that this could be made clearer. Currently, the DCAT-AP HVD document describes a reporting catalogue.
So I am asking myself: what would be an adequate representation for this?
- `<Reporting Catalogue> owl:subclassOf <DCAT-AP Catalogue>`: a Reporting Catalogue follows the general property requirements specified by DCAT-AP.
- `<DCAT-AP Catalogue> dcat:catalog <Reporting Catalogue>`: a Reporting Catalogue is part of a DCAT-AP catalogue.
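Written out as Turtle, the two options might look roughly like this. The class `ms:ReportingCatalogue` and all instance names are hypothetical, and the class-level relation is expressed with `rdfs:subClassOf`, the standard RDFS property for subclassing:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ms:   <http://example.org/ms/> .  # hypothetical namespace

# Option 1 (class level): every reporting catalogue is a DCAT catalogue,
# so the general DCAT-AP property requirements apply to it.
ms:ReportingCatalogue rdfs:subClassOf dcat:Catalog .

# Option 2 (instance level): the reporting catalogue is listed inside
# the MS catalogue via dcat:catalog.
ms:catalogue a dcat:Catalog ;
    dcat:catalog ms:reportingCatalogue .

ms:reportingCatalogue a ms:ReportingCatalogue .
```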
With `<DCAT-AP Catalogue> dcat:catalog <Reporting Catalogue>`, based on this discussion, the HVD datasets could not also be in `<DCAT-AP Catalogue>`. A more appropriate relation (or an additional alternative) might be `<DCAT-AP Catalogue> dct:hasPart <Reporting Catalogue>`.
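A sketch of the `dct:hasPart` alternative, assuming hypothetical `ms:` names: the HVD dataset stays in the MS catalogue and is listed again, by the same IRI, in the reporting catalogue declared as a part of it.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ms:   <http://example.org/ms/> .  # hypothetical namespace

# The MS catalogue lists all of its datasets, including the HVD one,
# and declares the reporting catalogue as one of its parts.
ms:catalogue a dcat:Catalog ;
    dcat:dataset ms:hvdDataset, ms:otherDataset ;
    dct:hasPart ms:reportingCatalogue .

# The reporting catalogue lists the same HVD dataset again.
ms:reportingCatalogue a dcat:Catalog ;
    dcat:dataset ms:hvdDataset .
```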
I presume that by `owl:subclassof` you actually mean something like `r5r:HVDReportingCatalog rdfs:subClassOf dcat:Catalog`, i.e. a relation at the class level, not the instance level? I quite like this solution; however, if `applicableLegislation` remained mandatory for the instance, the information would be there twice, once as

```turtle
r5r:HVDReportingCatalog rdfs:subClassOf dcat:Catalog .
```

and then as

```turtle
<reporting catalog> a r5r:HVDReportingCatalog ;
    r5r:applicableLegislation eli:2023/138/oj .
```

right?
And let's not forget the original question of whether

```turtle
<dcat-ap catalog> a dcat:Catalog ;
    r5r:applicableLegislation eli:2023/138/oj .
```

means that this is the reporting catalog, or that some of the datasets are in the scope of the HVD IR.
Many datasets in the public sector have applicable legislation in some sense. Being able to express this is useful beyond HVDs. I wish for a core DCAT-AP feature for this. By hiding it in the HVD profile, we suggest that it does not apply to non-HVDs.
Over at BRegDCAT-AP it looks like this:

| Property | Range | Description |
| --- | --- | --- |
| `cpsv:follows` | `cpsv:Rule` | This property links a Dataset to the Rule that defines its legal basis. |
> 1. Is there a reason why the BRegDCAT-AP approach is not adopted here?

`cpsv:follows` is a "similar" property, but with Public Service as its domain. That is a completely different setting. See https://semiceu.github.io/CPSV-AP/releases/3.1.0/#Public%20Service%3Afollows .

> 2. Can `r5r:applicableLegislation` be used on non-HVDs without breaking the scope?

Actually, point 2 is no longer an issue: the property has been lifted into DCAT-AP itself in the draft proposal https://semiceu.github.io/DCAT-AP/releases/3.0.0-draft/#Dataset.applicablelegislation. It has also been discussed in https://github.com/SEMICeu/DCAT-AP/issues/286.
@oystein-asnes If you have any questions regarding the lifting of this property, please use issue #286 or #260. I would like to keep the discussion here focused on the nature of the catalogues we are considering: reporting catalogues versus the MS data catalogue.
My bad, @bertvannuffelen. I was not aware that `applicableLegislation` is included in DCAT-AP. I rest my case.
> I quite like this solution, however, if `applicableLegislation` would remain mandatory for the instance, the information would be there twice: `<reporting catalog> a r5r:HVDReportingCatalog; r5r:applicableLegislation eli:2023/138/oj .`
> right?

Yes, that is the consequence. But it is the general consequence of subclassing when the nature of a class corresponds to the value of a single property (see our other discussion on entity profiles). As such, I am not too concerned about it, and I would leave this discussion to that more abstract challenge, of which this is another example.
> And let's not forget the original question of whether `<dcat-ap catalog> a dcat:Catalog; r5r:applicableLegislation eli:2023/138/oj .` means that this is the reporting catalog, or that some of the datasets are in scope of HVD IR.

Given that the second reading is already covered by a plain DCAT-AP catalogue, I would consider it to be the "reporting catalogue". For me, a reporting catalogue is either the outcome of an additional effort for HVD implementation or an actively managed entity within an MS catalogue ecosystem, and in both cases a single unique value identifies exactly this case. Note that this interpretation relies on the datasets and other catalogued resources having a PURI to refer to.
While writing this, I realized I do not expect a case any time soon where a catalogue has multiple values for applicableLegislation, because that would mean the catalogued resources in that catalogue must satisfy both eli:1 and eli:2. Such subsets of catalogues will probably not be reflected in a legal context.
Here we are getting to the core of my original question. Again, you indicate that `applicableLegislation` on an instance of `dcat:Catalog` means that all datasets (and resources in general) within that catalog also have the same `applicableLegislation`, which would make it somewhat redundant. But this is not how I have understood the usage of `applicableLegislation` on a catalog up to now, not even based on the definition in the DCAT-AP 3.0.0 draft.
To illustrate, I used this for the Czech catalog:
```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcatap: <http://data.europa.eu/r5r/> .

<https://data.gov.cz/zdroj/katalog/NKOD> a dcat:Catalog ;
    dcatap:applicableLegislation
        <http://data.europa.eu/eli/reg_impl/2023/138/oj> ,  # HVD
        <http://data.europa.eu/eli/dir/2019/1024/oj> ,      # PSI
        <http://data.europa.eu/eli/reg/2022/868/oj> ;       # DGA
    dcat:dataset <datasetHVD>, <datasetDGA> .

<datasetHVD> a dcat:Dataset ;
    dcatap:applicableLegislation
        <http://data.europa.eu/eli/reg_impl/2023/138/oj> ,  # HVD
        <http://data.europa.eu/eli/dir/2019/1024/oj> .      # PSI

<datasetDGA> a dcat:Dataset ;
    dcatap:applicableLegislation <http://data.europa.eu/eli/reg/2022/868/oj> .  # DGA
```
What I mean is that this catalog contains some open data (PSI), some high-value datasets (HVD) and some protected data (DGA); i.e. PSI, DGA and HVD each mandate the creation of a data catalog, and this one serves all three cases. It does not mean that all datasets in that catalog were created because of all three legislations at once.
`<datasetHVD>` is an open dataset and also an HVD one, and `<datasetDGA>` is part of the Czech NSIP (protected data).
However, based on the presumption that all `applicableLegislation` values on a catalog need to apply to all registered datasets (or resources in general), if I used `<https://data.gov.cz/zdroj/katalog/NKOD> dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>` (HVD), this would be the reporting catalog. Therefore all datasets in that catalog would need to be HVDs and would also need to have, like `<datasetHVD>`, `dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>` (HVD).
Then this seems redundant to me, and it also seems that:

1) I can never mix PSI, HVD, DGA and other datasets in a single catalog annotated with `applicableLegislation`;
2) I can never annotate a catalog with `applicableLegislation` if it mixes different kinds of datasets, i.e. unless it is a "single-applicable-legislation" catalog;
3) it is also different from how I understand HVD datasets should be denoted in a DCAT-AP catalog: they are tagged with `dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>` (HVD), but that should not mean that all of their distributions are the ones mandated by the HVD IR.
So, shouldn't `<DCAT-AP catalog> dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>` (HVD) rather indicate that there are SOME HVD datasets in the catalog? We could leave the indication of THE reporting catalog to subclassing, i.e.:

```turtle
r5r:HVDReportingCatalog rdfs:subClassOf dcat:Catalog .
<reporting catalog> a r5r:HVDReportingCatalog .
```
This would also mean that the subclassing approach is no longer redundant with:

```turtle
<reporting catalog> a r5r:HVDReportingCatalog ;
    r5r:applicableLegislation eli:2023/138/oj .
```
I could also see the use case of a mixed catalog as described by @jakubklimek.
At the national level, we already have `dcatde:legalBasis`, which in the past similarly allowed indicating the legal scope of a resource, e.g. PSI.
To me, though, having the reporting catalogue as a subset via `<DCAT-AP Catalogue> dct:hasPart <Reporting Catalogue>` seems preferable. Through `<Reporting catalog> r5r:applicableLegislation eli:2023/138/oj` it could be marked which legislation the catalog covers, while catalogs could still easily be combined and merged as needed via `dcat:catalog`.
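A sketch of that combination, assuming hypothetical `ms:` names for two publishers: each reporting catalogue is part of its publisher's general catalogue via `dct:hasPart` and is marked with the legislation it covers, and a merged catalogue groups the reporting catalogues with `dcat:catalog`.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix r5r:  <http://data.europa.eu/r5r/> .
@prefix ms:   <http://example.org/ms/> .  # hypothetical namespace

# Reporting catalogue of a first publisher, part of its general catalogue.
ms:catalogueA dct:hasPart ms:reportingCatalogueA .
ms:reportingCatalogueA a dcat:Catalog ;
    r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> ;
    dcat:dataset ms:hvdDatasetA .

# Reporting catalogue of a second publisher.
ms:catalogueB dct:hasPart ms:reportingCatalogueB .
ms:reportingCatalogueB a dcat:Catalog ;
    r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> ;
    dcat:dataset ms:hvdDatasetB .

# Merged view for reporting: a catalogue listing both reporting catalogues.
ms:mergedReportingCatalogue a dcat:Catalog ;
    dcat:catalog ms:reportingCatalogueA, ms:reportingCatalogueB .
```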
I do not see a use case for explicitly identifying a reporting catalogue via subclassing.
In DCAT-AP-HVD, `applicableLegislation` is mandatory on Catalog and fixed to the HVD IR. The usage note says:

Does this indicate that "this catalog may contain some HVD datasets", for the case of a single open data catalog where some datasets fall under HVD? Or does it say that we need a separate HVD catalog that contains only HVD datasets?
I presume it is the first one, also based on the "Scope" part:
but it is not clear to me from the usage note; maybe it could be stated more explicitly.