dcat:DataSeries: Controlled Vocabulary For dcterms:type

init-dcat-ap-de commented 1 year ago

As discussed in the last webinar and here (https://github.com/SEMICeu/DCAT-AP/issues/240#issuecomment-1327166342), this issue is to discuss the entries needed for a controlled vocabulary that could be used with dct:type, similar to dataset types.

The values I currently know of are:

spatial
temporal
topic

jakubklimek commented 1 year ago

I suggest that we accompany each type with an example, to avoid cases where each reader interprets e.g. "temporal series" differently, and to make sure that everyone sees their series in the list. We could take this opportunity to specify the series types a bit further. E.g.

Temporal series with no overlap comprises datasets such as Budget 2020, Budget 2021, Budget 2022, where no part of Budget 2020 is part of Budget 2021.

Temporal series with overlap, which can also be viewed as a version series or snapshots: e.g. Blacklist of companies not paying their taxes in March, April, May. Some companies may be on the list in several months, others not, and comparing versions may be meaningful, e.g. to say for how long a company was on the blacklist.

Then there is the question of what happens if we combine spatial and temporal series, e.g., budgets of cities in time. Is there a preference, recommendation or is it up to the publishers?

Flat spatial-temporal series with datasets with various spatial and temporal coverages (this could be an additional type)
Spatial series of individual temporal series
Temporal series of individual spatial series
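
A minimal Turtle sketch of the second option ("Spatial series of individual temporal series"), purely to make the nesting concrete. The `ex:` namespace, the resource names and the choice of city are hypothetical. It assumes a dataset series may itself be a member of another series via dcat:inSeries, which is how I read DCAT 3, where dcat:DatasetSeries is a subclass of dcat:Dataset.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Outer (spatial) series: budgets of all cities
ex:cityBudgets a dcat:DatasetSeries ;
    dct:title "Budgets of cities"@en .

# Inner (temporal) series for one city, itself a member of the outer series
ex:budgetsPrague a dcat:DatasetSeries ;
    dct:title "Budgets of Prague"@en ;
    dcat:inSeries ex:cityBudgets .

# One yearly dataset in the inner series
ex:budgetPrague2020 a dcat:Dataset ;
    dct:title "Budget of Prague 2020"@en ;
    dcat:inSeries ex:budgetsPrague .
```

The flat variant would instead put every city/year dataset directly into ex:cityBudgets and rely on dct:spatial and dct:temporal on each member.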

Regarding the topic series, my examples would be, for the "budget" topic:

"Budget of a company" dataset (distributed as XML and CSV, i.e. 2 distributions)
N datasets with "codelists used in budget of a company", distributed as SKOS concept schemes in RDF
"Budget and codelists" dataset distributed as a data service, e.g. SPARQL endpoint

Another example could be the topic "financial datasets", grouping in a series:

Budget dataset
Spending dataset
Budget outlook dataset

I am stating the examples to spark discussion (and possibly to identify distinct issues), rather than because I am convinced they are the correct usage of DatasetSeries.

H-a-g-L commented 1 year ago

I would suggest adding the values to the dataset-type List.

IMHO, the sub-division of the series type may be excessive, and I would therefore suggest the following solution:

"generic" dataset-series concept - A class used to group together datasets that are published separately but share some common characteristics. Datasets in a series may differ from other members of the series in their specificity, for example in their temporal, spatial or thematic coverage. Examples include temporal dataset series (e.g. budget datasets of the same country for different years), spatial dataset-series (e.g. budget datasets of the same year but of distinct countries) or thematic dataset series (e.g. datasets related to monitoring of air pollutants all covering the same geospatial and temporal scope but each monitoring a different pollutant).

The specific type can be added as a separate triple. For example:

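@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# The series carries the generic type plus a more specific one
<datasetSeries/cropMapping> a dcat:DatasetSeries ;
    dct:title "Crop Mapping Service" ;
    dct:type <http://publications.europa.eu/resource/authority/dataset-type/DATASET_SERIES> ;
    dct:type <http://publications.europa.eu/resource/authority/dataset-type/GEOSPATIAL> .

# A member dataset points to its series
<dataset/cropMappingBrasil> a dcat:Dataset ;
    dcat:inSeries <datasetSeries/cropMapping> .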

If, however, you feel that the sub-types of the series are necessary, I could ask OP to add the following Concepts:

Temporal dataset-series - A series of datasets that describe similar content of distinct temporal coverage. An example of a temporal dataset series is a budget dataset series which groups together budget datasets for 2020, 2021, 2022 etc., all referring to the same geographical area.
Spatial dataset-series - A series of datasets describing similar content of distinct geospatial coverage. An example of a spatial dataset series is a satellite mapping service showing crop mapping data of a similar time frame and resolution but of different geospatial coverage.
Thematic dataset-series - A series of datasets describing similar content in terms of their general topic and of temporal and spatial resolution, but differing in their specific thematic coverage. An example of a thematic dataset series is a series of datasets collecting data on air pollutants having the same spatial and temporal coverage, where each dataset monitors levels of a different specific pollutant.

(The examples are derived from https://github.com/w3c/dxwg/wiki/Examples-on-dataset-series)

Personally I find this option less appealing because it is very restrictive. For instance, would a new concept be required to describe a series that differs in both spatial and temporal terms (budget for several years of various geographical areas)?

init-dcat-ap-de commented 1 year ago

I think the use of the dcat:DatasetSeries class already says that something is a generic dataset series.

The use of a "type" should probably only be recommended (or we need a catch-all type, maybe every series is at least thematic?). Data portals could use the type to offer a specialized visualization of the dataset series.

H-a-g-L commented 1 year ago

Thanks for your comment @init-dcat-ap-de. Indeed, the rdf:type already denotes the "general" classification, and a "general" dct:type would be redundant. However, I agree that providing an explanation of why certain datasets are grouped in a series is useful.

In my understanding we should then consider:

If this could be done by adding Concepts to the dataset-type List (@MPaunescu), and
which types of dataset-series to include (for me spatial, temporal and thematic should cover most cases).

I am pasting a more relaxed definition of the sub-types, to allow inclusion of less structured series:

Temporal dataset-series - A series of datasets that describe similar content of distinct temporal coverage. An example of a temporal dataset series is a budget dataset series which groups together budget datasets for the years 2020, 2021, 2022 etc. ~~all referring to the same geographical area~~
Spatial dataset-series - A series of datasets describing similar content of distinct geospatial coverage. An example of a spatial dataset series is a satellite mapping service showing crop mapping data ~~of a similar time frame and resolution but~~ of various geospatial coverage
Thematic dataset-series - A series of datasets containing information on the same subject ~~and of temporal and spatial resolution~~, where each dataset focuses on distinct specific themes. An example of a thematic dataset series is a series of datasets collecting data on air pollutants ~~having the same spatial and temporal coverage but~~ where each dataset monitors levels of a different specific pollutant.
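
As a rough illustration of how such sub-types might be attached, here is a Turtle sketch. The concept IRIs under `http://example.org/series-type/` are hypothetical placeholders, since no such Concepts exist in any published vocabulary yet, and the `ex:` resources are invented for the example.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# A thematic series: one dataset per monitored pollutant
ex:airPollutants a dcat:DatasetSeries ;
    dct:title "Air pollutant monitoring"@en ;
    dct:type <http://example.org/series-type/THEMATIC> .   # hypothetical concept

ex:ozoneLevels a dcat:Dataset ;
    dct:title "Ozone levels"@en ;
    dcat:inSeries ex:airPollutants .

# A series varying in both space and time could carry two types
ex:cityBudgets a dcat:DatasetSeries ;
    dct:title "Budgets of cities over time"@en ;
    dct:type <http://example.org/series-type/SPATIAL> ,     # hypothetical concept
             <http://example.org/series-type/TEMPORAL> .    # hypothetical concept
```

Whether repeating dct:type is preferable to a dedicated combined concept is exactly the open question here; the sketch only shows that the relaxed definitions do not rule it out.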

init-dcat-ap-de commented 1 year ago

Since DatasetSeries are a subtype of Datasets, should their type just be added to the dataset-type-vocabulary? Or should there be an additional vocabulary?

bertvannuffelen commented 1 year ago

@init-dcat-ap-de this subclassing is a tricky thing. It does not mean that a current DCAT-AP dataset is a superclass of a DCAT-AP dataset series. It still could be a distinct group.

Note: This is currently the practice because a DCAT-AP catalogue is not a DCAT-AP dataset, because the semantics of the properties are different. Personally I would not have introduced the subclassing of dataset to catalogue and datasetseries in DCAT.

jakubklimek commented 1 year ago

> Since DatasetSeries are a subtype of Datasets, should their type just be added to the dataset-type-vocabulary? Or should there be an additional vocabulary?

@init-dcat-ap-de I prefer a separate vocabulary to avoid (or at least discourage) the use of items such as "temporal series" with a dataset with multiple distributions, as in https://github.com/SEMICeu/DCAT-AP/issues/240#issuecomment-1413923165

bertvannuffelen commented 1 year ago

On the different kinds of dataset-series: Can we provide concrete examples of each case? I know examples of the temporal case, but for the others I have no good real-life example yet.

jakubklimek commented 1 year ago

@bertvannuffelen

Spatial: https://data.gov.cz/dataset?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatové-sady%2F00025712%2F59ddccd404a617f695e3a1ef8e65b81e
Topic/grouping: https://data.gov.cz/dataset?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatové-sady%2F00006947%2F4e7b6c4003ae1079dda2b7a56c93d248 (sorry for the metadata in Czech)

bertvannuffelen commented 1 year ago

@jakubklimek thanks: it is good to have concrete examples.

jakubklimek commented 1 year ago

Another example of a temporal series with overlap, i.e. a version series, could be versioned controlled vocabularies.

For instance, the NUTS regions - there are versions 2003, 2006, 2010, 2013, 2016, 2021. The versions have changes that are not only additive. Some regions are merged, some are split, some are cancelled etc., and some others are the same in all the versions.

This could be viewed as a DatasetSeries (NUTS) with the individual versions represented as Datasets.
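
A sketch of that view in Turtle, showing only two of the versions. The `ex:` IRIs are hypothetical; dcat:version and dcat:prev are used as I understand them from DCAT 3 (a version label, and a link to the previous item in a series), so treat their use here as an assumption rather than an agreed DCAT-AP pattern.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# The series groups all NUTS versions
ex:nuts a dcat:DatasetSeries ;
    dct:title "NUTS regions"@en .

# Each version is its own dataset in the series
ex:nuts2016 a dcat:Dataset ;
    dct:title "NUTS 2016"@en ;
    dcat:version "2016" ;
    dcat:inSeries ex:nuts .

ex:nuts2021 a dcat:Dataset ;
    dct:title "NUTS 2021"@en ;
    dcat:version "2021" ;
    dcat:prev ex:nuts2016 ;          # previous member of the series
    dcat:inSeries ex:nuts .
```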

The question is, is this distinct enough from a temporal series? I think it is, based on the overlap.

init-dcat-ap-de commented 11 months ago

As far as I can see, there is no solution in the current version of DCAT-AP 3.0.

bertvannuffelen commented 5 months ago

We propose to close this issue, as during the past months it has become clear that more evidence is needed to support the need for distinctions, and to show what these could be.

SEMICeu / DCAT-AP

dcat:DataSeries: Controlled Vocabulary For dcterms:type #249