SEMICeu / GeoDCAT-AP

Repository of the geospatial extension to DCAT-AP (GeoDCAT-AP)
https://joinup.ec.europa.eu/solution/geodcat-application-profile-data-portals-europe
Creative Commons Attribution 4.0 International

Usage of Dataset Series in INSPIRE community #79

Open jakubklimek opened 4 months ago

jakubklimek commented 4 months ago

As part of the GeoDCAT-AP alignment with DCAT 3 and DCAT-AP 3, Dataset Series is being investigated. In the open data (DCAT-AP) community, Dataset Series is used only as a grouping element for Datasets, which hold the majority of the relevant metadata, and therefore only a few metadata options for Dataset Series are available in DCAT-AP.

However, in the Technical Guidance on Implementation of INSPIRE Metadata, a Dataset Series can have the same metadata description as a dataset.

In order to keep the GeoDCAT-AP specification concise and to avoid unnecessary mappings, we are investigating how Dataset Series is actually being used in the INSPIRE community: whether it, too, is used just as a group of datasets, or whether the full range of available metadata is being used meaningfully.

We therefore ask the INSPIRE community to share their experience with the usage of Dataset Series metadata in this issue. Please state whether you use Dataset Series just as a group of datasets, or, if not, which metadata elements are being used and with what meaning for the Series and for the contained datasets.

GDIAnja commented 4 months ago

Actually, in Lower Saxony (north-western Germany) we don't differentiate between dataset and series (hierarchyLevel in ISO). Originally, series was intended to describe time series of data, but even on this there were always different opinions. Today in Lower Saxony we have 80 series records as opposed to 26712 dataset records (0.3%). In Germany as a whole, the ratio was 3396 to 226250 (1.5%) at the end of December last year. Metadata records with the hierarchyLevel series contain the same fields as metadata records with the hierarchyLevel dataset. The only difference is the hierarchyLevel itself.

When it comes to linking several metadata records (dataset, series) to each other, we have the element parentIdentifier in the child metadata. Its value is the fileIdentifier of the parent metadata record. The parentIdentifier is not an INSPIRE invention; it comes from ISO, and it can help to organize metadata.
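To illustrate the linking described above, here is a minimal sketch that reads fileIdentifier, parentIdentifier and hierarchyLevel from ISO 19139 records and groups children under their parent. The helper names (`make_record`, `read_record`, `group_by_parent`) and the sample identifiers are assumptions made up for this illustration; they are not part of any catalogue software mentioned in this thread, and the sample records are trimmed far below what a valid INSPIRE record needs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ISO 19139 namespaces used by gmd:MD_Metadata records.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def make_record(file_id, level, parent_id=None):
    """Build a tiny, hypothetical ISO 19139 record (illustration only)."""
    parent = (
        "<gmd:parentIdentifier><gco:CharacterString>"
        f"{parent_id}</gco:CharacterString></gmd:parentIdentifier>"
        if parent_id
        else ""
    )
    return (
        f'<gmd:MD_Metadata xmlns:gmd="{NS["gmd"]}" xmlns:gco="{NS["gco"]}">'
        "<gmd:fileIdentifier><gco:CharacterString>"
        f"{file_id}</gco:CharacterString></gmd:fileIdentifier>"
        f"{parent}"
        f'<gmd:hierarchyLevel><gmd:MD_ScopeCode codeListValue="{level}"/>'
        "</gmd:hierarchyLevel>"
        "</gmd:MD_Metadata>"
    )


def read_record(xml_text):
    """Extract the three fields discussed in this thread from one record."""
    root = ET.fromstring(xml_text)

    def text(path):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    level = root.find("gmd:hierarchyLevel/gmd:MD_ScopeCode", NS)
    return {
        "fileIdentifier": text("gmd:fileIdentifier/gco:CharacterString"),
        "parentIdentifier": text("gmd:parentIdentifier/gco:CharacterString"),
        "hierarchyLevel": level.get("codeListValue") if level is not None else None,
    }


def group_by_parent(records):
    """Map each parent fileIdentifier to the fileIdentifiers of its children."""
    children = defaultdict(list)
    for rec in records:
        if rec["parentIdentifier"]:
            children[rec["parentIdentifier"]].append(rec["fileIdentifier"])
    return dict(children)


records = [
    read_record(make_record("series-1", "series")),
    read_record(make_record("child-a", "dataset", parent_id="series-1")),
    read_record(make_record("child-b", "dataset", parent_id="series-1")),
]
print(group_by_parent(records))
```

Note that the grouping does not look at hierarchyLevel at all: a parent could equally be a dataset, a series or a service record.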

People use the parentIdentifier to point to a record that describes the entirety of their data. This is done for services, but also for dataset/series. On the one hand, the parentIdentifier groups records by the data-holding entity; on the other hand, it can also be used to sort frequently occurring kinds of data into groups.

From my point of view, the parentIdentifier is much more important than the division into dataset/series.

But: in Lower Saxony we have not described any sensor data so far. Maybe series is important for sensor data, and up to now we have simply been using it in the wrong way. Who knows?!

Kate-Lyndegaard commented 4 months ago

We serve a large number of INSPIRE-compliant dataset series, primarily for regional governments reporting data that falls within the INSPIRE Planned Land Use theme. According to the INSPIRE registry, the definition of a dataset series is "a collection of spatial data sets sharing the same product specification." For our users, dataset series support efficient data management, enabling municipalities to collect and report their spatial planning data in a consolidated way.

In addition to @GDIAnja's important points on the hierarchyLevel and parentIdentifier, we also support TG Requirement 1.9: metadata/2.0/req/datasets-and-series/one-data-quality-element in our dataset series metadata, where we include a gmd:scope/gmd:DQ_Scope/gmd:level/gmd:MD_ScopeCode element referring to the value "series" of the ISO 19139 code list MD_ScopeCode.
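As a rough illustration of the element path mentioned above, the following sketch checks that a record carries exactly one data quality scope whose level matches its hierarchyLevel. This is a simplified reading of the requirement, not the logic of the INSPIRE validator; `SERIES_RECORD`, `scope_levels` and `has_single_matching_scope` are hypothetical names, and the sample record is heavily trimmed.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces used by gmd:MD_Metadata records.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Path from the record root to the scope level code.
SCOPE_PATH = (
    "gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:scope/"
    "gmd:DQ_Scope/gmd:level/gmd:MD_ScopeCode"
)

# Hypothetical, heavily trimmed series record (illustration only).
SERIES_RECORD = f"""<gmd:MD_Metadata xmlns:gmd="{NS['gmd']}" xmlns:gco="{NS['gco']}">
  <gmd:hierarchyLevel><gmd:MD_ScopeCode codeListValue="series"/></gmd:hierarchyLevel>
  <gmd:dataQualityInfo><gmd:DQ_DataQuality><gmd:scope><gmd:DQ_Scope><gmd:level>
    <gmd:MD_ScopeCode codeListValue="series"/>
  </gmd:level></gmd:DQ_Scope></gmd:scope></gmd:DQ_DataQuality></gmd:dataQualityInfo>
</gmd:MD_Metadata>"""


def scope_levels(xml_text):
    """Return the codeListValue of every data quality scope level found."""
    root = ET.fromstring(xml_text)
    return [node.get("codeListValue") for node in root.findall(SCOPE_PATH, NS)]


def has_single_matching_scope(xml_text):
    """True if exactly one scope level equals the record's hierarchyLevel."""
    root = ET.fromstring(xml_text)
    level = root.find("gmd:hierarchyLevel/gmd:MD_ScopeCode", NS)
    if level is None:
        return False
    return scope_levels(xml_text).count(level.get("codeListValue")) == 1


print(scope_levels(SERIES_RECORD))
print(has_single_matching_scope(SERIES_RECORD))
```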

If you require examples of INSPIRE-compliant dataset series metadata and the metadata of contained child datasets, please let me know. I can provide examples upon request.

jakubklimek commented 3 months ago

@GDIAnja @Kate-Lyndegaard thank you for your feedback.

From @GDIAnja I take it that in Lower Saxony, some records are marked as datasets and some as dataset series, but other than that, they are treated equally. I will therefore refer to them as datasets/series. There are no requirements such as "a dataset series must be linked to datasets in that series", etc. Independently of that, you have some datasets/series, and also services, linked to others using parentIdentifier, forming a hierarchy of datasets/series/services that does not depend on the exact type of each member of that hierarchy. You use this as the grouping element.

@Kate-Lyndegaard in your case, is there a requirement about a dataset series needing to be linked to datasets in that series? Please, if you have relevant examples, share them with us, e.g. as attachments to the issue comments here.

Kate-Lyndegaard commented 3 months ago

Hi @jakubklimek,

No, we don't have a requirement that a series is linked to its child datasets. The linkage is based on the parentIdentifier in the child dataset. I have attached an example of series and child dataset metadata.

series_and_child_dataset_metadata_examples.zip

hallinpihlatie commented 2 months ago

Here you can find published examples, where the data provider has chosen to use "Series" instead of "Dataset".

That is examples of:

They share the same information as a Dataset metadata record. Not sure if it matters, but we have the same approach: we have a description of a data management system (parent), and from that, links to dataset metadata (children) describing the map layers of that data source. To pass the INSPIRE validation, the gmd:identifier values cannot be identical, for example:

For parent: http://paikkatiedot.fi/so/1000040/velho
For child: http://paikkatiedot.fi/so/1000040/aidat

Here's an [example](https://www.paikkatietohakemisto.fi/geonetwork/srv/eng/catalog.search#/search?facet.q=type%2Fseries&resultType=details&sortBy=relevance&fast=index&_content_type=json&from=1&to=20)

hallinpihlatie commented 2 months ago

However, in the latter parent/child example, both the parent and the child have been implemented as Dataset metadata.