SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe
Propose change on inverse properties in DCAT? #313

Closed terjesyl closed 2 weeks ago

terjesyl commented 8 months ago

Summarized

Issue: only using dcat:inSeries and dcat:last+dcat:prev to define series membership does not meet the requirements and needs expressed by different contributors in this repo. But using dcat:seriesMember alone is not compliant with W3C DCAT's definition of inverse properties.

Possible solution: propose a change to W3C's DCAT, to make dcat:inSeries the inverse property of dcat:seriesMember.

Description

We see some issues regarding how to meet the use cases for Dataset Series presented by several people here. Two important considerations mentioned:

Also see:

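As far as I can see there are two ways to define a series/collection that are also easily implementable: Method A, using dcat:inSeries together with dcat:last and dcat:prev, and Method B, using dcat:seriesMember.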

Method A gives a linked list. This does not make sense for unordered collections, and there have been several objections to making this the mandatory way of describing a dataset series.

Method B allows both ordered and unordered collections, and seems to meet the requirements in the use cases presented. Thus it seems preferable to require dcat:seriesMember when defining a dataset series, and to make dcat:inSeries optional. However, this raises an issue. DCAT chapter 7 "Use of inverse properties" defines dcat:seriesMember as an inverse of dcat:inSeries, and says:

The properties described in 6. Vocabulary specification do not include inverses intentionally, with the purpose of ensuring interoperability also in systems not making use of OWL reasoning. However, recognizing that inverses are needed for some use cases, DCAT supports them, but with the requirement that they MAY be used only in addition to those described in 6. Vocabulary specification, and that they MUST NOT be used to replace them.

In other words, if an inverse property is used, here dcat:seriesMember, the master property, here dcat:inSeries, MUST exist. If DCAT-AP requires dcat:seriesMember, descriptions of dataset series would need to use both dcat:seriesMember and dcat:inSeries to be compliant with DCAT-AP and DCAT. This defeats the purpose.
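A minimal sketch of this requirement, assuming triples are modelled as plain Python tuples of strings. The function name and the `ex:` identifiers are hypothetical and only serve the illustration; this is not a DCAT validator.

```python
# Triples are (subject, predicate, object) tuples; this model is an assumption.
IN_SERIES = "http://www.w3.org/ns/dcat#inSeries"
SERIES_MEMBER = "http://www.w3.org/ns/dcat#seriesMember"


def missing_in_series(triples):
    """Return the dcat:inSeries triples that DCAT would require but that are absent."""
    present = set(triples)
    missing = []
    for s, p, o in triples:
        if p == SERIES_MEMBER:
            # The inverse may only be used in addition to dcat:inSeries.
            required = (o, IN_SERIES, s)
            if required not in present:
                missing.append(required)
    return missing


# Hypothetical graph: dataset2 is only linked through the inverse property.
graph = [
    ("ex:series", SERIES_MEMBER, "ex:dataset1"),
    ("ex:series", SERIES_MEMBER, "ex:dataset2"),
    ("ex:dataset1", IN_SERIES, "ex:series"),
]
print(missing_in_series(graph))
```

An empty result means every dcat:seriesMember statement is backed by the corresponding dcat:inSeries statement.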

If instead dcat:inSeries is defined to be the inverse property of dcat:seriesMember, this would solve the issue. Is this something we should propose to W3C's DCAT?

terjesyl commented 8 months ago

I might be missing some important aspects here, so please chip in if there are any contradicting opinions on this topic. @bertvannuffelen

bertvannuffelen commented 8 months ago

Hi @terjesyl

Thanks for your thorough proposal.

After rereading it and the current DCAT specification several times, it seems that, based on this section

The properties described in 6. Vocabulary specification do not include inverses intentionally, with the purpose of ensuring interoperability also in systems not making use of OWL reasoning. However, recognizing that inverses are needed for some use cases, DCAT supports them, but with the requirement that they MAY be used only in addition to those described in 6. Vocabulary specification, and that they MUST NOT be used to replace them.

our profile-level discussion about which direction of the inverse property we would like to use cannot be sustained. The DCAT specification stipulates that the direction it has chosen must at least be present. This is understandable from a general harmonisation perspective.

However, given our discussions in DCAT-AP, I feel that the community might be interested in choosing a different basis.

Taking this aspect into account, I think we can resolve this as follows for the impacted properties:

  1. for each property that is demoted by DCAT either
    • add an additional usage note that DCAT enforces the use of the other property, or
    • remove it from the specification
  2. add a generic section on inverse properties.

Unfortunately we cannot simply remove all properties not supported by DCAT, because in some cases cardinality constraints have to be expressed. E.g. a dataset in a dataset series must be connected with a DatasetSeries. In that case the cardinality constraint should be expressed differently.

jakubklimek commented 8 months ago

Let me offer another perspective. Since both DCAT and DCAT-AP are RDF based specifications, and even JSON-LD allows us to use reverse properties, my question is: Regarding

Defining a collection/dataset series on pre-existing datasets without updating all the member datasets (i.e. using dcat:seriesMember and not dcat:inSeries).

Why is it a problem to use dcat:inSeries for this purpose? I do not need to "update pre-existing datasets", I just state

<pre-existing-dataset> dcat:inSeries <series> .

instead of

<series> dcat:seriesMember <pre-existing-dataset> .

in the triples describing the series, and I get the same effect. If talking about the JSON-LD representation of this, I can use @reverse (see JSON-LD 1.1) to achieve the expected positioning within the JSON tree.

What am I missing?
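A small sketch of the @reverse idea mentioned above, built as a Python dictionary and serialised with the standard json module. The example.org IRIs are placeholders, not real resources; only the @reverse keyword and the dcat: terms come from the specifications.

```python
import json

# Hypothetical JSON-LD description of a series. The members are nested under
# the series in the JSON tree, but "@reverse" means the statements expressed are
#   <dataset> dcat:inSeries <series> .
series_doc = {
    "@context": {"dcat": "http://www.w3.org/ns/dcat#"},
    "@id": "https://example.org/series",
    "@type": "dcat:DatasetSeries",
    "@reverse": {
        "dcat:inSeries": [
            {"@id": "https://example.org/dataset1", "@type": "dcat:Dataset"},
            {"@id": "https://example.org/dataset2", "@type": "dcat:Dataset"},
        ]
    },
}

print(json.dumps(series_doc, indent=2))
```

The JSON layout matches what one would get with dcat:seriesMember, while the triples a JSON-LD processor derives from it use dcat:inSeries.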

bertvannuffelen commented 8 months ago

@jakubklimek, @terjesyl raised a profile conformance point: namely that W3C DCAT has chosen the direction of the properties.

According to that, DCAT-AP as a profile does not have the choice to use either one. One must always use dcat:inSeries and may optionally add dcat:seriesMember.

A knowledge graph with only dcat:seriesMember does not conform to W3C DCAT.

For me, at the level of the semantics in the "data specification", this is fine. However, it means that DCAT-AP cannot promote the use of the other direction when this is considered the most appropriate and easier to implement. Promoting the other direction would mean that the RDF exported by a catalogue has to be augmented with the W3C direction before it is shared.
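The augmentation step could look roughly like this, assuming the export is available as a list of plain (subject, predicate, object) tuples. The function name and the `ex:` identifiers are hypothetical; a real catalogue would do this with its own RDF tooling.

```python
IN_SERIES = "http://www.w3.org/ns/dcat#inSeries"
SERIES_MEMBER = "http://www.w3.org/ns/dcat#seriesMember"


def augment_with_in_series(triples):
    """Return the triples plus one dcat:inSeries triple per dcat:seriesMember triple."""
    result = list(triples)
    seen = set(triples)
    for s, p, o in triples:
        if p != SERIES_MEMBER:
            continue
        inverse = (o, IN_SERIES, s)
        if inverse not in seen:  # do not duplicate statements already present
            seen.add(inverse)
            result.append(inverse)
    return result


# Hypothetical export that only uses the inverse property.
export = [("ex:series", SERIES_MEMBER, "ex:dataset1")]
augmented = augment_with_in_series(export)
print(augmented)
```

Running the function a second time on its own output adds nothing, so it can be applied safely to exports that already contain some dcat:inSeries statements.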

jakubklimek commented 8 months ago

@bertvannuffelen I think we are in agreement here. My question to @terjesyl is what is the exact problem in using the DCAT compliant dcat:inSeries that would be solved by using the inverse property, which we cannot encourage.

terjesyl commented 8 months ago

Thanks for the input @jakubklimek. We are still discussing this on our team, hence the late reply. From the RDF perspective, I completely agree with you: whether one uses inSeries or seriesMember is not an issue by itself (except for compliance with DCAT). I think @bertvannuffelen summarizes it to the point. This seems to be a question of what is most appropriate and easiest to implement, and what DCAT-AP wishes to promote for that purpose. I think this implementation concern is worth considering. The wish to use seriesMember as the "master" property has also been expressed by others in the community.

My question is if the DCAT-AP community sees it as beneficial to use dcat:seriesMember as the "main" property and dcat:inSeries as its inverse, and if that is a change we wish to propose to DCAT.

(edit: adding minor remarks and fixing typos)

jimjyang commented 7 months ago

[Sorry for the long text, which is "copied" from the sandbox https://jimjyang.github.io/playground/dataset-series/ where I tried to build a dataset series]

Let’s build a dataset series using dcat:DatasetSeries!

Use case:

I. Assuming that we have the following three instances of dcat:Dataset, already published by three different Agents:

II. Assuming further that we have the following file to be harvested into our data portal, created by Agent4 who wants to create and publish an open dataset series by reusing the abovementioned three already published dataset descriptions:

Possible implementations:

After the harvesting, in our data portal, should this instance of dcat:DatasetSeries

A. use dcat:seriesMember to refer directly to the abovementioned three instances of dcat:Dataset, as in https://jimjyang.github.io/playground/dataset-series/beesEU2022.ttl?

B. or, not use dcat:seriesMember, as in https://jimjyang.github.io/playground/dataset-series/beesEU2022withoutMembers.ttl?

C. or, use dcat:seriesMember referring to new instances of dcat:Dataset that refer to the original instances of dcat:Dataset using owl:sameAs, with these new instances of dcat:Dataset also referring back to the dataset series using dcat:inSeries, as in https://jimjyang.github.io/playground/dataset-series/beesEU2022withNewMemberURIs.ttl?

Questions:

  1. In general, is it reasonable to assume or even request that the metadata/description of a dataset has to be updated with the information about where the dataset is reused, each and every time it is reused?

  2. Wouldn’t it (therefore) be better if W3C/DCAT3 had chosen dcat:seriesMember as the main property and dcat:inSeries the inverse, such that alternative A above which is the most straightforward implementation, could become compliant with W3C/DCAT3?

jakubklimek commented 7 months ago

@jimjyang I do not understand one thing. A, B, and C are all different from what Agent 4 uses. Why can't the data portal send exactly the same thing, i.e.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <https://jimjyang.github.io/playground/dataset-series/> .

### The dataset series description

<beesEU2022> a dcat:DatasetSeries ; 
   dct:title "Bee population in EU, 2022"@en ;
   dct:description "A dataset series containing existing national open datasets."@en ; 
   rdfs:comment "assuming that members will refer to this dataset series using dcat:inSeries"@en ; 
   .

### Enriching the dataset descriptions by using their original URIs

<https://jimjyang.github.io/playground/dataset-series/beesBE2022> a dcat:Dataset ; 
   dcat:inSeries <beesEU2022> ;
   . 

<https://jimjyang.github.io/playground/dataset-series/beesCZ2022> a dcat:Dataset ;
   dcat:inSeries <beesEU2022> ;   
   . 

<https://jimjyang.github.io/playground/dataset-series/beesNO2022> a dcat:Dataset ;
   dcat:inSeries <beesEU2022> ;
   .

I sense that the problem may be coming from this statement:

Note: Agent4 has no mandate neither to update the abovementioned three instances of dcat:Dataset nor to enforce the other agents to do so. The abovementioned instances of dcat:Dataset will thus remain as they are.

What do you mean exactly by "updating"? I do not see that as an actual problem. I do not see a connection between updating the original datasets and sending e.g.

<beesEU2022> a dcat:DatasetSeries .
<https://jimjyang.github.io/playground/dataset-series/beesNO2022> a dcat:Dataset ;
   dcat:inSeries <beesEU2022> .

i.e. I do not see sending an RDF triple with <https://jimjyang.github.io/playground/dataset-series/beesNO2022> as a subject necessarily connected to "updating <https://jimjyang.github.io/playground/dataset-series/beesNO2022>". It is up to the portal implementation to decide, which triples to send when asked for a description of the dataset series. This may include also triples with the dataset series as an object, as in this case.

Additionally, (optionally), the portal can also add the inverse, sending e.g.

<beesEU2022> a dcat:DatasetSeries ; 
    dcat:seriesMember <https://jimjyang.github.io/playground/dataset-series/beesNO2022> .
<https://jimjyang.github.io/playground/dataset-series/beesNO2022> a dcat:Dataset ;
   dcat:inSeries <beesEU2022> .

jimjyang commented 7 months ago

Sorry, @jakubklimek, that I wasn't (and still ain't) able to explain our concern precisely enough. Our main concern is that using dcat:inSeries as the main property may make it more difficult to implement in non-reasoning systems than using dcat:seriesMember as the main property.

Anyway, the main question is actually which property to choose as the main property. So, besides having to be compliant with W3C/DCAT3 (which is still a Working Draft, though), could someone summarize the pros and cons of (the rationale behind) choosing dcat:inSeries as the main property (and similarly the pros and cons of choosing dcat:seriesMember)?

jakubklimek commented 7 months ago

@jimjyang OK, before we start listing pros and cons (which may actually be a discussion best held in the W3C DCAT group), let me get better clarity on what you mean by non-reasoning systems. What makes the case

<beesEU2022> a dcat:DatasetSeries .
<https://jimjyang.github.io/playground/dataset-series/beesNO2022> a dcat:Dataset ;
   dcat:inSeries <beesEU2022> .

more difficult to work with for non-reasoning systems than

<beesEU2022> a dcat:DatasetSeries ; 
    dcat:seriesMember <https://jimjyang.github.io/playground/dataset-series/beesNO2022> .

? If by reasoning we mean understanding e.g. RDFS subclasses, subproperties, domains and ranges - it makes no difference here. Here we just work with different predicate directions.
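To make the comparison concrete, here is a minimal sketch of listing the members of a series from each representation without any reasoning, assuming triples are plain Python tuples. The function names and the `ex:` identifiers are hypothetical.

```python
IN_SERIES = "http://www.w3.org/ns/dcat#inSeries"
SERIES_MEMBER = "http://www.w3.org/ns/dcat#seriesMember"


def members_via_in_series(triples, series):
    """Members are the subjects of dcat:inSeries triples pointing at the series."""
    return sorted(s for s, p, o in triples if p == IN_SERIES and o == series)


def members_via_series_member(triples, series):
    """Members are the objects of dcat:seriesMember triples on the series."""
    return sorted(o for s, p, o in triples if p == SERIES_MEMBER and s == series)


# The same membership expressed in both directions (hypothetical identifiers).
with_in_series = [
    ("ex:beesNO2022", IN_SERIES, "ex:beesEU2022"),
    ("ex:beesCZ2022", IN_SERIES, "ex:beesEU2022"),
]
with_series_member = [
    ("ex:beesEU2022", SERIES_MEMBER, "ex:beesNO2022"),
    ("ex:beesEU2022", SERIES_MEMBER, "ex:beesCZ2022"),
]

print(members_via_in_series(with_in_series, "ex:beesEU2022"))
print(members_via_series_member(with_series_member, "ex:beesEU2022"))
```

Both lookups are a single triple-pattern match; the only difference is whether the series sits in the subject or the object position.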

jimjyang commented 7 months ago

@jakubklimek To be honest, I don't know if there are any systems which may be called "non-reasoning systems". What I mean is not (only) the (in)ability to understand subclasses, subproperties etc., but (also) the (in)ability to understand/infer the inverse property when only the main property is given.

Theoretically and technically I also believe that it works either way, but I hope we choose the way which is "most logical" and "less difficult" to implement. So, when the whole intention behind introducing dcat:DatasetSeries is to catalogue dataset series and thus to know the members in the series, why not get the members directly from the dataset series description (using dcat:seriesMember)?

[I'm sorry that I'm unable to participate in the webinar this coming Tuesday. Keep up the good work and I wish you all a successful webinar!]

jakubklimek commented 7 months ago

@jimjyang Yes, those are valid concerns. However, I think these need to be discussed at W3C DCAT, since they are the ones who defined the main and the inverse properties the way they are in this case. I think their motivation might have been to be able to add datasets to the series "without having to update the series", which, together with your case, gives two contradictory requirements.

bertvannuffelen commented 5 months ago

From the perspective of DCAT-AP we will close this issue. As a follow-up action we will bring the issue to W3C too.