SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

How to model dct:temporal for continuously evolving Datasets? #201

Open init-dcat-ap-de opened 3 years ago

init-dcat-ap-de commented 3 years ago

In https://github.com/GovDataOfficial/DCAT-AP.de/issues/17 we are discussing a real use case for which I am surprised to find no obvious answer. Maybe I am missing something.

There is a Dataset which is updated constantly (dcterms:accrualPeriodicity) with a resolution of one hour (dcat:temporalResolution). But you can only get the data of the last 10 days. (Something that's probably pretty common for sensor data.)

How would you model this? Neither xsd:date nor dcterms:PeriodOfTime allows this. We would need an xsd:duration:

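@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:ds  a dcat:Dataset ;
  dcterms:accrualPeriodicity <http://publications.europa.eu/resource/authority/frequency/UPDATE_CONT> ;
  dcat:temporalResolution "PT1H"^^xsd:duration ;
  dcterms:temporal "P10D"^^xsd:duration .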

But that would not be allowed. (And it would only be implicit that you get the last 10 days.)

jakubklimek commented 3 years ago

First of all, the accrualPeriodicity should use the Frequency EU vocabulary, i.e. http://publications.europa.eu/resource/authority/frequency/UPDATE_CONT, right?

As to modelling "last 10 days", you could also update the metadata continuously and change startDate and endDate each time. Otherwise, I think there is currently no solution to this using DCAT(-AP).
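A minimal sketch of what such continuous updating could look like, assuming the DCAT 2 properties dcat:startDate and dcat:endDate inside a dcterms:PeriodOfTime. The helper name `rolling_period_turtle` and the fixed 10-day window are illustrative assumptions, not part of DCAT-AP; the publisher would have to rerun it on every update.

```python
from datetime import datetime, timedelta, timezone


def rolling_period_turtle(now: datetime, window: timedelta) -> str:
    """Return a Turtle fragment for the interval [now - window, now].

    Hypothetical helper: `now` must be timezone-aware; it is converted
    to UTC so that the trailing "Z" in the output is correct.
    """
    now = now.astimezone(timezone.utc)
    start = (now - window).strftime("%Y-%m-%dT%H:%M:%SZ")
    end = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "dcterms:temporal [\n"
        "  a dcterms:PeriodOfTime ;\n"
        f'  dcat:startDate "{start}"^^xsd:dateTime ;\n'
        f'  dcat:endDate "{end}"^^xsd:dateTime\n'
        "] ."
    )


if __name__ == "__main__":
    # Assumption: a 10-day window, as in the use case above.
    print(rolling_period_turtle(datetime.now(timezone.utc), timedelta(days=10)))
```

The output is only correct at the moment it is generated, which is exactly the limitation of this approach.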

init-dcat-ap-de commented 3 years ago

Hm, updating the metadata every hour, just because time moved on by an hour, is not our desired solution...

Doesn't anyone else see this use case? Maybe we need a standard solution.

jze commented 3 years ago

Continuously updating metadata will not work. Even if I update the values in my local open data portal, the national portal will take a (daily) snapshot, and soon afterwards the temporal information will be incorrect. The error will be even bigger by the time the European data portal has taken over the data. Therefore, we need a solution to specify these continuously changing datasets.

Here is a real-world example: a weather observation from the Deutscher Wetterdienst always covers the last 24 hours: https://opendata.dwd.de/weather/weather_reports/poi/10015-BEOB.csv
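The drift can be made concrete with a little arithmetic. The sketch below assumes a 24-hour window, as in the DWD example, and a hypothetical delay between the snapshot and the moment a user reads it; `metadata_drift` is an illustrative name and not part of any portal software.

```python
from datetime import datetime, timedelta, timezone


def metadata_drift(snapshot_time: datetime, read_time: datetime,
                   window: timedelta) -> dict:
    """Compare the interval frozen in a harvested snapshot with the
    interval the data actually covers when the metadata is read."""
    declared = (snapshot_time - window, snapshot_time)
    actual = (read_time - window, read_time)
    # Overlap of the two intervals; zero once the delay reaches the window.
    overlap = max(
        timedelta(0),
        min(declared[1], actual[1]) - max(declared[0], actual[0]),
    )
    return {
        "declared": declared,
        "actual": actual,
        "delay": read_time - snapshot_time,
        "overlap": overlap,
    }


if __name__ == "__main__":
    harvested = datetime(2021, 3, 1, 0, 0, tzinfo=timezone.utc)
    # Assumption: the snapshot is read 30 hours after it was taken.
    result = metadata_drift(harvested, harvested + timedelta(hours=30),
                            timedelta(hours=24))
    print(result["delay"], result["overlap"])
```

Once the delay reaches the length of the window, the declared interval no longer overlaps the data that is actually available.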

bertvannuffelen commented 3 years ago

This is indeed not possible to express as such in DCAT(-AP). And as @jze explains, there is no guarantee that the metadata you find at the harvesting data portal is the most accurate.

The two issues are connected but also distinct. If you couple them as @jze does, then a loosely coupled, distributed, cross-organisational system cannot work. In such a distribution scheme, temporal delays and information skew are part of the game. However, one can compensate for this with proper PURI handling: a visitor of the EDP might find the German weather reports dataset and consider using it. In that case the visitor has to go to the source of the metadata to find all the information natively. There the most recent information can be found, e.g. that this dataset is now obsolete and has been replaced with a JSON REST API.

This example illustrates that we should keep the objectives of the Open Data Portals clear. If you want machines to connect to your endpoints through your catalogue, then very precise and up-to-date metadata is required. However, that is not the objective of most Open Data Portals. They are a human-browsable interface to (governmental) data. So if your local catalogue is intended to be part of a machine-to-machine data processing system, my advice is to ensure that all data needed for that purpose is there. But do not expect that you can replace your catalogue with the EDP one just by changing the domain name of the catalogue.

One can also look at this topic from the human consumer's perspective: knowing that the data is continuously updated is probably a criterion I am going to use when looking for appropriate data sources. But knowing that I only get a window of 10 days is probably less important at first. I would consider that a technical implementation restriction. From that perspective it is less problematic that this information is not available in machine-processable form but only described in some textual notes. I have encountered data streams with windows of 1 day, 1 hour, 10 years. Independent of my intended usage, the question then remains how to express this window.

Expressing a window could be done via temporal coverage (https://www.w3.org/TR/vocab-dcat-2/#Property:dataset_temporal). But the window expression is hard to construct. I have no direct answer for that. Probably we could define, based on https://www.w3.org/TR/owl-time/#link-interval-meets, the notion of a coverage window:

window(Period) = only data in the interval [NOW - Period, NOW]

But I have not yet found a notion of NOW.
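To make the idea concrete, here is a minimal sketch of how a consumer could resolve such a window at read time, taking NOW to be the moment of evaluation. That interpretation is an assumption for illustration, not something defined by DCAT(-AP); `parse_duration` and `resolve_window` are hypothetical helpers, and the parser handles only a simplified subset of xsd:duration.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Simplified xsd:duration pattern: days, hours, minutes and whole seconds.
# Years and months are deliberately not supported: their length varies.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse durations such as "P10D" or "PT1H" into a timedelta."""
    match = _DURATION.match(text)
    # Reject non-matches, a dangling "T", and durations with no component.
    if not match or text.endswith("T") or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**parts)


def resolve_window(period: str, now: datetime) -> Tuple[datetime, datetime]:
    """window(Period) = [NOW - Period, NOW], evaluated at read time."""
    return now - parse_duration(period), now


if __name__ == "__main__":
    start, end = resolve_window("P10D", datetime.now(timezone.utc))
    print(start.isoformat(), end.isoformat())
```

Because the interval is computed by whoever reads the metadata, a harvested copy of the "P10D" value would stay valid however old the snapshot is.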

jze commented 3 years ago

It is a pity that this problem was marked as won't fix. In practice it is very relevant. Now there is no way to express these records DCAT-AP compliant.

Especially when forwarding to other portals, it is important not to have to specify fixed times. Without "floating" time data, we will often have incorrect time metadata.

bertvannuffelen commented 3 years ago

@jze, I tagged it as won't fix because there will be no resolution in the near future in DCAT-AP. If you believe this should be future work, I will tag it as such.

bertvannuffelen commented 3 years ago

On your sentence:

> Now there is no way to express these records DCAT-AP compliant.

I think you want to say that "I have no formal way to express that only the data of the last days is available".

Note that a landing page on which you explain this situation to the potential reuser is always possible.

> Especially when forwarding to other portals, it is important not to have to specify fixed times. Without "floating" time data, we will often have incorrect time metadata.

As mentioned in my previous answer, you could explore temporal expressions, e.g. built from OWL-Time. But there is no guarantee that this makes it possible to express the situation.

To a certain extent, your use case is similar to legal information. Before the existence of ODRL, there was no way to express legal information other than in a document. For your use case there must first be a formal language (suggestion: OWL-Time) that is able to express the situation; then it is easy to adopt it in DCAT-AP.