ckan / ckanext-dcat

CKAN ♥ DCAT

Source Catalog info in transitive harvesting #96

Closed etj closed 6 years ago

etj commented 6 years ago

In transitive harvesting, source Catalog info is lost.

For example, say datasets are created on two nodes, A and B. B harvests from A, so B contains the datasets created locally plus the datasets created on A.

We then have a node C that harvests B. How can C find out which datasets belong to (i.e. were originally created on) B, and which belong to A? If we rely on the Catalog node only, it will look as if ALL datasets belong to catalog B, which isn't the case. So we should associate a new node with each dataset that identifies its original source catalog.

One solution is to use the dct:hasPart property, which expresses this requirement in pure DCAT semantics.

Let's say

C1 exposes a catalog.rdf as:

C2 harvests from C1. The related RDF could be expressed as:
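As a minimal sketch of how the dct:hasPart idea could look in practice: the RDF below is a hypothetical catalog.rdf for C2, in which the harvested catalog C1 is nested under dct:hasPart. All URIs and the `source_catalogs` helper are made up for illustration and are not part of ckanext-dcat; the sketch also assumes datasets are referenced via `rdf:resource` rather than serialized inline.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical catalog.rdf exposed by C2 after harvesting C1.
# Every URI here is an assumption made for the example.
C2_CATALOG_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Catalog rdf:about="http://c2.example.org/">
    <dcat:dataset rdf:resource="http://c2.example.org/dataset/local-1"/>
    <dct:hasPart>
      <dcat:Catalog rdf:about="http://c1.example.org/">
        <dcat:dataset rdf:resource="http://c1.example.org/dataset/remote-1"/>
      </dcat:Catalog>
    </dct:hasPart>
  </dcat:Catalog>
</rdf:RDF>
"""


def source_catalogs(rdf_xml):
    """Map each dataset URI to the URI of the catalog that directly lists it.

    Only handles datasets referenced with rdf:resource; inline
    dcat:Dataset nodes are ignored in this sketch.
    """
    root = ET.fromstring(rdf_xml)
    sources = {}

    def visit(catalog):
        catalog_uri = catalog.get(RDF_ABOUT)
        # Datasets listed directly under this catalog belong to it.
        for ds in catalog.findall("dcat:dataset", NS):
            sources[ds.get(RDF_RESOURCE)] = catalog_uri
        # Catalogs nested via dct:hasPart keep their own datasets,
        # so recurse to preserve the original source at any depth.
        for nested in catalog.findall("dct:hasPart/dcat:Catalog", NS):
            visit(nested)

    for top in root.findall("dcat:Catalog", NS):
        visit(top)
    return sources


if __name__ == "__main__":
    for dataset, catalog in source_catalogs(C2_CATALOG_RDF).items():
        print(dataset, "<-", catalog)
```

With this layout, a third node harvesting C2 could still tell locally created datasets apart from those that originated on C1, because the nesting survives each harvesting hop.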

(Original discussion was in geosolutions-it/ckanext-dcatapit#52, but ckanext-dcat is better suited for implementing this improvement.)