SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

Review range and usage note of dct:hasPart of Catalogues #292

Open H-a-g-L opened 9 months ago

H-a-g-L commented 9 months ago

For the domain dcat:Catalog, both dct:hasPart ("a related Catalogue that is part of the described Catalogue") and dcat:catalog (a sub-property of dct:hasPart: "a catalogue whose contents are of interest in the context of this catalogue") have the range dcat:Catalog. DCAT 3 revises the definition of dcat:catalog to "A catalog that is listed in the catalog" (https://github.com/w3c/dxwg/issues/1156) and introduces the property dcat:resource (a sub-property of dct:hasPart) as the parent property of dcat:catalog, dcat:dataset and dcat:service. The new property should only be used when none of its more specific sub-properties applies.

To align more closely with DCAT 3 and remove the ambiguity, I suggest changing the usage note to indicate more clearly that dcat:catalog should be used to link "parent" and "child" catalogues. Likewise, the range of dct:hasPart should be changed to dcat:Resource. However, there are existing implementations where dct:hasPart is used to link catalogues (e.g. the EU Open Data Portal and the JRC data catalogue). In principle, these would not violate the proposed revision, because dcat:catalog is a sub-property of dct:hasPart and dcat:Catalog is a sub-class of dcat:Resource.
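The backward-compatibility argument rests on sub-class reasoning: a catalogue linked with dct:hasPart is typed dcat:Catalog, and dcat:Catalog falls under dcat:Resource. A minimal sketch of that check follows, assuming a closed-world, SHACL-style reading of "range" (in plain RDFS, rdfs:range infers a type rather than validating one). The class hierarchy is a hand-written excerpt of DCAT 3 (dcat:Catalog ⊑ dcat:Dataset ⊑ dcat:Resource), not loaded from the vocabulary files, and the function names are hypothetical.

```python
# Hand-written excerpt of the DCAT 3 class hierarchy (assumption: not loaded
# from the official vocabulary). Maps each class to its direct superclasses.
SUBCLASS_OF: dict[str, set[str]] = {
    "dcat:Catalog": {"dcat:Dataset"},
    "dcat:Dataset": {"dcat:Resource"},
    "dcat:DataService": {"dcat:Resource"},
}


def superclasses(cls: str) -> set[str]:
    """Return cls plus all of its transitive superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def conforms_to_range(object_types: set[str], range_cls: str) -> bool:
    """True if at least one declared type of the object is range_cls or a
    sub-class of it. An untyped object (empty set) does not conform, which
    is the closed-world choice made for this sketch."""
    return any(range_cls in superclasses(t) for t in object_types)
```

With this, an existing dct:hasPart link whose object is typed dcat:Catalog passes both the current range (dcat:Catalog) and the proposed one (dcat:Resource), e.g. `conforms_to_range({"dcat:Catalog"}, "dcat:Resource")` is `True`.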

matthiaspalmer commented 9 months ago

I would add that perhaps we should not keep dcterms:hasPart on the catalog at all, but instead treat it as replaced by dcat:catalog. Keeping dcterms:hasPart as well is certainly possible, as it is inherited from Cataloged Resource, but I see no reason for having both in DCAT-AP.

H-a-g-L commented 8 months ago

Upon further reflection, and in consideration of https://github.com/w3c/dxwg/issues/1454#issuecomment-1054653629, dcat:catalog cannot replace dct:hasPart in the JRC instance, because the datasets of the "child" catalogues should be inferred as also contained in the "parent" catalogue. At the same time, IMHO, a review of the range of dct:hasPart would be useful.

matthiaspalmer commented 8 months ago

OK, thanks for pointing this out; I had missed that. So I guess we need to decide which use case we are trying to fulfil:

  1. a catalog as a resource inside another catalog (via dcat:catalog)
  2. a sub-catalog, providing a mechanism to create larger catalogs by aggregating other catalogs (via dcterms:hasPart).

I see a need for 2, not 1. For instance, an organization maintains two catalogs: catalog A is maintained manually, while catalog B is generated by a transformation from another system.

In this case it makes sense to let catalog A point to catalog B via dcterms:hasPart, and a harvesting mechanism may be allowed to merge the two.
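Use case 2 treats dct:hasPart as aggregation: the datasets of every catalogue reachable through dct:hasPart count as datasets of the parent. A minimal sketch of what a harvester could compute under that reading follows; triples are plain (subject, predicate, object) tuples rather than a real RDF library, and the identifiers `ex:catalogA`, `ex:catalogB`, `ex:ds1` etc. are hypothetical.

```python
Triple = tuple[str, str, str]


def aggregated_datasets(triples: list[Triple], catalog: str) -> set[str]:
    """Datasets of `catalog`, including those of every catalogue reachable
    from it through dct:hasPart (the sub-catalog reading of use case 2)."""
    has_part: dict[str, set[str]] = {}
    datasets: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "dct:hasPart":
            has_part.setdefault(s, set()).add(o)
        elif p == "dcat:dataset":
            datasets.setdefault(s, set()).add(o)

    result: set[str] = set()
    visited: set[str] = set()
    stack = [catalog]
    while stack:
        current = stack.pop()
        if current in visited:  # guards against cyclic dct:hasPart links
            continue
        visited.add(current)
        result |= datasets.get(current, set())
        stack.extend(has_part.get(current, set()))
    return result


# Hypothetical data: A is maintained manually, B is generated from another system.
triples = [
    ("ex:catalogA", "dcat:dataset", "ex:ds1"),
    ("ex:catalogA", "dct:hasPart", "ex:catalogB"),
    ("ex:catalogB", "dcat:dataset", "ex:ds2"),
    ("ex:catalogB", "dcat:dataset", "ex:ds3"),
]
```

Here `aggregated_datasets(triples, "ex:catalogA")` yields all three datasets, while catalog B on its own yields only `ex:ds2` and `ex:ds3`. Under a pure membership reading (dcat:catalog), catalog A would instead list catalog B as one member, without absorbing B's datasets.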

bertvannuffelen commented 8 months ago

I think a concrete example should indeed be created here to make the distinction clear: the initial situation, the operation that happens, and the resulting catalogue structure.

@ODP-hil can you create such examples to help move this issue forward?

H-a-g-L commented 8 months ago

Indeed, @matthiaspalmer's use case 2 is the one we need to accommodate. To give an example:

<dcat:Catalog rdf:about="https://data.jrc.ec.europa.eu/">
    <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">https://data.jrc.ec.europa.eu/</dct:identifier>
    <dct:hasPart rdf:resource="https://data.jrc.ec.europa.eu/collection/datam"/>
    <…>
</dcat:Catalog>
<dcat:Catalog rdf:about="https://data.jrc.ec.europa.eu/collection/datam">
    <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">https://data.jrc.ec.europa.eu/collection/datam</dct:identifier>
    <dct:isPartOf rdf:resource="https://data.jrc.ec.europa.eu/"/>
    <dcat:dataset rdf:resource="http://data.europa.eu/89h/1ba64b54-246f-4888-8824-080971c46145"/>
    <dcat:dataset rdf:resource="http://data.europa.eu/89h/5a06cad1-6c12-4d17-b008-4b58956ec3d8"/>
    <…>
</dcat:Catalog>

When data.europa.eu harvests the JRC catalogue, the datasets of sub-catalogues are attributed directly to the parent catalogue. This is supported by the use of the inverse property dct:isPartOf on the dataset:

<dcat:Dataset rdf:about="http://data.europa.eu/89h/1ba64b54-246f-4888-8824-080971c46145">
    <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://data.europa.eu/89h/1ba64b54-246f-4888-8824-080971c46145</dct:identifier>
    <dct:isPartOf rdf:resource="https://data.jrc.ec.europa.eu/"/>
</dcat:Dataset>

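In this example the dataset declares dct:isPartOf the parent catalogue, while the source data lists it only under the sub-catalogue via dcat:dataset. A minimal sketch of a consistency check for that situation follows: it reports dct:isPartOf triples whose object never links back to the subject through dct:hasPart or one of its DCAT 3 sub-properties (dcat:resource, dcat:dataset, dcat:service, dcat:catalog, as described in this issue). The function name is hypothetical and the identifiers are modelled on the JRC example. IRIs are compared as exact strings, as RDF does, so a link written with http:// to a catalogue described with https:// is reported as unmatched.

```python
Triple = tuple[str, str, str]

# dct:hasPart and its sub-properties, following the DCAT 3 hierarchy
# summarised in this issue (assumption: hand-written, not loaded from DCAT).
HAS_PART_PROPS = {"dct:hasPart", "dcat:resource", "dcat:dataset",
                  "dcat:service", "dcat:catalog"}


def unmatched_is_part_of(triples: list[Triple]) -> list[Triple]:
    """Return dct:isPartOf triples whose object does not link back to the
    subject through dct:hasPart or one of its sub-properties."""
    links_back = {(s, o) for s, p, o in triples if p in HAS_PART_PROPS}
    return [(s, p, o) for s, p, o in triples
            if p == "dct:isPartOf" and (o, s) not in links_back]


PARENT = "https://data.jrc.ec.europa.eu/"
CHILD = "https://data.jrc.ec.europa.eu/collection/datam"
DS = "http://data.europa.eu/89h/1ba64b54-246f-4888-8824-080971c46145"

source = [
    (PARENT, "dct:hasPart", CHILD),
    (CHILD, "dct:isPartOf", PARENT),
    (CHILD, "dcat:dataset", DS),
    (DS, "dct:isPartOf", PARENT),
]
# After harvesting, the dataset is attributed directly to the parent.
harvested = source + [(PARENT, "dcat:dataset", DS)]
```

On `source`, only the dataset's dct:isPartOf is reported, because the parent does not list the dataset itself; on `harvested`, nothing is reported. This mirrors the description above: the dataset-level dct:isPartOf only becomes consistent once the harvester attributes the sub-catalogue's datasets to the parent.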
bertvannuffelen commented 7 months ago

I think dcat:catalog and the way DCAT-AP uses dct:hasPart for a catalogue coincide.

Can someone explain the difference between cases 1 and 2? I do not see the need to use a different property to distinguish (sub-)catalogues that are in scope of the DCAT-AP aggregation catalogue but not harvestable from harvestable DCAT catalogues. That feels uncomfortable, so can someone help me understand the semantic difference between the two properties?

For example, the case below uses dct:hasPart for a harvestable Catalogue and dcat:catalog for a non-harvestable one:

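@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix : <http://example.org/> .    # illustrative example namespace

:c1 a dcat:Catalog ;
    dct:hasPart :c2 ;
    dcat:catalog :c3 .

:c2 a dcat:Catalog ;
    dct:title "I am a harvestable Catalog" .

:c3 a dcat:Catalog ;
    dct:title "I cannot be harvested" .
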
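The discomfort can be made concrete with rdfs:subPropertyOf entailment: because dcat:catalog specialises dcat:resource, which specialises dct:hasPart, an RDFS reasoner derives `:c1 dct:hasPart :c3` from the dcat:catalog triple. A minimal sketch follows; the property hierarchy is a hand-written excerpt of DCAT 3 as described in this issue, and the functions `entail` and `objects` are hypothetical helpers rather than an RDF library API.

```python
Triple = tuple[str, str, str]

# Direct super-properties (assumption: excerpt of DCAT 3, not loaded from it).
SUBPROPERTY_OF: dict[str, set[str]] = {
    "dcat:catalog": {"dcat:resource"},
    "dcat:dataset": {"dcat:resource"},
    "dcat:service": {"dcat:resource"},
    "dcat:resource": {"dct:hasPart"},
}


def entail(triples: set[Triple]) -> set[Triple]:
    """Apply sub-property entailment (RDFS rule rdfs7) until nothing new is added."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closure):
            for super_p in SUBPROPERTY_OF.get(p, set()):
                if (s, super_p, o) not in closure:
                    closure.add((s, super_p, o))
                    changed = True
    return closure


def objects(triples: set[Triple], subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is in triples."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


graph = {(":c1", "dct:hasPart", ":c2"), (":c1", "dcat:catalog", ":c3")}
closed = entail(graph)
```

After entailment, `objects(closed, ":c1", "dct:hasPart")` returns both `:c2` and `:c3`. Only the extra dcat:catalog triple sets `:c3` apart; nothing positively marks `:c2` as harvestable, so the distinction rests on the absence of a triple, which is fragile under RDF's open-world assumption.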
bertvannuffelen commented 5 months ago

@matthiaspalmer and @H-a-g-L during the last webinar there was confusion about the semantics or expected behaviour for dcat:catalog versus dct:hasPart (in the context of DCAT-AP).

In DCAT 3, the definition of dcat:catalog is "a catalog that is listed in the catalog". In DCAT-AP 3, the definition of dct:hasPart is "a related Catalogue that is part of the described Catalogue".

So, in your opinion, what is the difference between "listed" and "part of"?

matthiaspalmer commented 5 months ago

@bertvannuffelen for me, "listed" is very close to "member of" from a set-theoretical perspective, while "part of" is more open and can be interpreted in many ways. We need to express a subset relation, so "has part" is the only viable option, as it is wide enough to include that interpretation.

bertvannuffelen commented 5 months ago

I understand there is a difference between a member and a subset in set theory, but the definition is not clear about which one is intended.

So let's analyse the case.

I think you will agree that if a Catalogue is considered a set, then its members are the Catalogued Resources.
So the question is whether the catalogues referenced by dcat:catalog must be among the Catalogued Resources of that Catalogue. W3C DCAT does not state this explicitly in its text.

However, W3C DCAT states that dcat:catalog is a sub-property of dcat:resource, and dcat:resource is the membership property indicating that a Catalogued Resource is a member of a Catalogue (we all use dcat:dataset with that meaning). Consequently, dcat:catalog is a specialisation of the membership property, and the catalogues referenced by dcat:catalog must therefore be members of the Catalogue.

Following this reasoning, our use of dct:hasPart as a subset declaration (the referenced catalogues are aggregated into one) is not covered by dcat:catalog.
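The argument above can be written compactly as follows. The notation $M(c)$ is introduced here only for illustration, and the last line formalises the aggregation reading used in the JRC / data.europa.eu example rather than anything DCAT itself states.

```latex
\begin{align*}
&\text{Let } M(c) \text{ be the set of Catalogued Resources that are members of catalogue } c.\\[4pt]
&\text{Membership:}   && (c,\ \texttt{dcat:resource},\ r) \;\Rightarrow\; r \in M(c)\\
&\text{Sub-property:} && \texttt{dcat:catalog} \sqsubseteq \texttt{dcat:resource}\\
&\text{Hence:}        && (c_1,\ \texttt{dcat:catalog},\ c_3) \;\Rightarrow\; c_3 \in M(c_1)\\[4pt]
&\text{Aggregation:}  && (c_1,\ \texttt{dct:hasPart},\ c_2) \;\Rightarrow\; M(c_2) \subseteq M(c_1)
\end{align*}
```

Neither conclusion implies the other: $c_3 \in M(c_1)$ says nothing about the members $M(c_3)$, and $M(c_2) \subseteq M(c_1)$ does not make $c_2$ itself a member of $c_1$.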

proposal

  1. do not change DCAT-AP for now.
  2. raise this semantic difference between membership and subset with the W3C.