SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

dcat:DataSeries: Controlled Vocabulary For dcterms:type #249

Closed init-dcat-ap-de closed 2 weeks ago

init-dcat-ap-de commented 1 year ago

As discussed in the last webinar and here (https://github.com/SEMICeu/DCAT-AP/issues/240#issuecomment-1327166342) this issue is to discuss the needed entries for a controlled vocabulary that could be used with dct:type, similar to dataset types.

I currently know of the following values:

jakubklimek commented 1 year ago

I suggest that we accompany each type with an example, to avoid cases where each reader interprets e.g. "temporal series" differently, and to make sure that everyone sees their series in the list. We could take this opportunity to specify the series types a bit further. E.g.

Temporal series with no overlap comprises datasets such as Budget 2020, Budget 2021, Budget 2022, where e.g. no part of Budget 2020 is part of Budget 2021
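A minimal Turtle sketch of such a non-overlapping temporal series, using the DCAT 3 terms dcat:inSeries and dcat:prev. The `ex:` namespace, the titles and the calendar-year coverage are assumptions made for illustration only:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:budget a dcat:DatasetSeries ;
    dct:title "Budget"@en .

# Each member covers exactly one year, so the periods do not overlap.
ex:budget-2020 a dcat:Dataset ;
    dct:title "Budget 2020"@en ;
    dcat:inSeries ex:budget ;
    dct:temporal [ a dct:PeriodOfTime ;
        dcat:startDate "2020-01-01"^^xsd:date ;
        dcat:endDate   "2020-12-31"^^xsd:date ] .

ex:budget-2021 a dcat:Dataset ;
    dct:title "Budget 2021"@en ;
    dcat:inSeries ex:budget ;
    dcat:prev ex:budget-2020 ;   # ordering within the series
    dct:temporal [ a dct:PeriodOfTime ;
        dcat:startDate "2021-01-01"^^xsd:date ;
        dcat:endDate   "2021-12-31"^^xsd:date ] .
```

Nothing in this metadata says whether the contents overlap; that is exactly the information a series type could carry.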

Temporal series with overlap - this can also be viewed as a version series, or snapshots - where we have: Blacklist of companies not paying their taxes in March, April, May - some companies may be on the list in several months, some not, and comparison between versions may be meaningful, e.g. to say for how long a company was on the blacklist

Then there is the question of what happens if we combine spatial and temporal series, e.g., budgets of cities in time. Is there a preference, recommendation or is it up to the publishers?

  1. Flat spatial-temporal series with datasets with various spatial and temporal coverages (this could be an additional type)
  2. Spatial series of individual temporal series
  3. Temporal series of individual spatial series
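To make the options easier to compare, here is a rough sketch of option 2 (a spatial series of individual temporal series). It assumes that nesting a series inside another series is acceptable, which formally follows from dcat:DatasetSeries being a subclass of dcat:Dataset; all `ex:` resources are hypothetical:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

# Outer (spatial) series: one member series per city.
ex:city-budgets a dcat:DatasetSeries ;
    dct:title "City budgets"@en .

# Inner (temporal) series for one city, itself a member of the outer series.
ex:city-a-budgets a dcat:DatasetSeries ;
    dct:title "Budgets of City A"@en ;
    dct:spatial ex:city-a ;
    dcat:inSeries ex:city-budgets .

ex:city-a-budget-2020 a dcat:Dataset ;
    dct:title "Budget of City A 2020"@en ;
    dct:spatial ex:city-a ;
    dcat:inSeries ex:city-a-budgets ;
    dct:temporal [ a dct:PeriodOfTime ;
        dcat:startDate "2020-01-01"^^xsd:date ;
        dcat:endDate   "2020-12-31"^^xsd:date ] .

ex:city-a a dct:Location .
```

Option 3 would swap the two levels, and option 1 would attach all datasets directly to `ex:city-budgets`.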

Regarding the topic series, my examples would be, for the "budget" topic:

  1. "Budget of a company" dataset (distributed as XML and CSV, i.e. 2 distributions)
  2. N datasets with "codelists used in budget of a company", distributed as SKOS concept schemes in RDF
  3. "Budget and codelists" dataset distributed as a data service, e.g. SPARQL endpoint

Another example could be the topic "financial datasets", grouping in a series:

  1. Budget dataset
  2. Spending dataset
  3. Budget outlook dataset

I am stating the examples to spark discussion (and possibly identify distinct issues), rather than because I am convinced they are the correct usage of DatasetSeries.

H-a-g-L commented 1 year ago

I would suggest adding the values to the dataset-type List.

IMHO, the sub-division of the series type may be excessive, and I would therefore suggest the following solution:

The specific type can be added as a separate triple. For example:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<datasetSeries/cropMapping> a dcat:DatasetSeries ;
    dct:title "Crop Mapping Service" ;
    dct:type <http://publications.europa.eu/resource/authority/dataset-type/DATASET_SERIES> ;
    dct:type <http://publications.europa.eu/resource/authority/dataset-type/GEOSPATIAL> .

<dataset/cropMappingBrasil> a dcat:Dataset ;
    dcat:inSeries <datasetSeries/cropMapping> .

If, however, you feel that the sub-types of the series are necessary, I could ask OP to add the following Concepts:

(The examples are derived from https://github.com/w3c/dxwg/wiki/Examples-on-dataset-series)

Personally I find this option less appealing because it is very restrictive. For instance, would a new concept be required to describe a series that differs in both spatial and temporal terms (budget for several years of various geographical areas)?

init-dcat-ap-de commented 1 year ago

I think the use of the dcat:DatasetSeries class already says that something is a generic dataset series.

The use of a "type" should probably only be recommended (or we need a catch-all type, maybe every series is at least thematic?). Data portals could use the type to offer a specialized visualization of the dataset series.

H-a-g-L commented 1 year ago

Thanks for your comment @init-dcat-ap-de. Indeed the rdf:type already denotes the "general" classification and a "general" dct:type would be redundant. However, I agree that providing an explanation of why certain datasets are grouped in a series is useful.

In my understanding we should then consider:

  1. If this could be done by adding Concepts to the dataset-type List (@MPaunescu), and
  2. which types of dataset-series to include (for me spatial, temporal and thematic should cover most cases).

I am pasting a more relaxed definition of the sub-types, to allow inclusion of less structured series:

init-dcat-ap-de commented 1 year ago

Since DatasetSeries are a subtype of Datasets, should their type just be added to the dataset-type-vocabulary? Or should there be an additional vocabulary?

bertvannuffelen commented 1 year ago

@init-dcat-ap-de this subclassing is a tricky thing. It does not mean that a current DCAT-AP dataset is a superclass of a DCAT-AP dataset series. It still could be a distinct group.

Note: This is currently the practice because a DCAT-AP catalogue is not a DCAT-AP dataset, because the semantics of the properties are different. Personally I would not have introduced the subclassing of dataset to catalogue and datasetseries in DCAT.

jakubklimek commented 1 year ago

"Since DatasetSeries are a subtype of Datasets, should their type just be added to the dataset-type-vocabulary? Or should there be an additional vocabulary?"

@init-dcat-ap-de I prefer a separate vocabulary to avoid (or at least discourage) the use of items such as "temporal series" on a dataset with multiple distributions, as in https://github.com/SEMICeu/DCAT-AP/issues/240#issuecomment-1413923165
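As a sketch of what such a separate vocabulary could look like, assuming a SKOS concept scheme with the three kinds of series mentioned in this thread. The scheme IRI, the concept IRIs and the labels are hypothetical and not part of any published authority list:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix st:   <https://example.org/dataset-series-type/> .   # hypothetical

<https://example.org/dataset-series-type> a skos:ConceptScheme ;
    dct:title "Dataset series types"@en .

st:TEMPORAL a skos:Concept ;
    skos:inScheme <https://example.org/dataset-series-type> ;
    skos:prefLabel "Temporal series"@en .

st:SPATIAL a skos:Concept ;
    skos:inScheme <https://example.org/dataset-series-type> ;
    skos:prefLabel "Spatial series"@en .

st:THEMATIC a skos:Concept ;
    skos:inScheme <https://example.org/dataset-series-type> ;
    skos:prefLabel "Thematic series"@en .

# Usage: the type is attached to the series, not to a plain dataset.
<https://example.org/budget> a dcat:DatasetSeries ;
    dct:type st:TEMPORAL .
```

Keeping the concepts in their own scheme would let a validator restrict them to instances of dcat:DatasetSeries.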

bertvannuffelen commented 1 year ago

On the different kinds of dataset-series: Can we provide concrete examples of each case? I know examples of the temporal case, but for the others I have no good real-life example yet.

jakubklimek commented 1 year ago

@bertvannuffelen

bertvannuffelen commented 1 year ago

@jakubklimek thanks: it is good to have concrete examples.

jakubklimek commented 1 year ago

Another example of a temporal series with overlap, i.e. a version series, could be versioned controlled vocabularies.

For instance, the NUTS regions - there are versions 2003, 2006, 2010, 2013, 2016, 2021. The versions have changes that are not only additive. Some regions are merged, some are split, some are cancelled etc., and some others are the same in all the versions.

This could be viewed as a DatasetSeries (NUTS) with the individual versions represented as Datasets.
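A rough sketch of that modelling, showing only two of the versions. The `ex:` IRIs and titles are hypothetical; dcat:version and dcat:prev are DCAT 3 terms:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:nuts a dcat:DatasetSeries ;
    dct:title "NUTS regions"@en .

ex:nuts-2016 a dcat:Dataset ;
    dct:title "NUTS 2016"@en ;
    dcat:version "2016" ;
    dcat:inSeries ex:nuts .

# Regions may be merged, split or unchanged relative to 2016,
# so the contents of the two members overlap.
ex:nuts-2021 a dcat:Dataset ;
    dct:title "NUTS 2021"@en ;
    dcat:version "2021" ;
    dcat:inSeries ex:nuts ;
    dcat:prev ex:nuts-2016 .
```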

The question is, is this distinct enough from a temporal series? I think it is, based on the overlap.

init-dcat-ap-de commented 11 months ago

As far as I can see, there is no solution in the current version of DCAT-AP 3.0.

bertvannuffelen commented 5 months ago

We propose to close this issue, as during the past months it has become clear that more evidence is needed to support the need for distinctions, and to establish what these could be.