ckan / ckanext-dcat

CKAN ♥ DCAT

CKAN & Linked Open Data #72

Closed: Polymathronic closed this issue 7 years ago

Polymathronic commented 7 years ago

My feeling is the current CKAN support for DCAT-AP is not entirely in line with the best practices for publishing Linked Data, as (please correct me if I'm wrong) CKAN assigns its own identifiers to harvested DCAT-AP catalog descriptions. The problem, in my humble opinion, lies in CKAN usually being seen as the principal catalog management endpoint. In the DCAT-AP world, this is not the case.

In the LOD world, the publisher/owner of a catalog has full control over a catalog. This includes choosing and maintaining the namespace behind it.

Now, Linked Data is not merely RDF. Linked Data revolves around identifiers with meaning. If a publisher decides to have a specific URI point to their description of a resource, it is up to them to choose what to provide behind that URI. If we just overwrite those URIs with different identifiers without providing any provenance information, the way CKAN currently does, we are just duplicating existing data, and that is definitely not what Linked Data is about.

Is there any reason why the original URIs can't be preserved?

amercader commented 7 years ago

Given the choice, CKAN will never overwrite any URI for a resource. But if it isn't provided with one, it will need to come up with one.

If publishers want to provide a URI for their datasets, resources or organizations, they can absolutely provide a uri extra (or a uri top-level field if using a custom schema), and that will be used in any serialization. If a dataset was imported via the RDF harvester and the remote source provided URIs for its resources, those same URIs will be stored and used in any serialization. If publishers want to provide a URI for the catalog itself and use that as a base for other URIs, they can set the ckanext.dcat.base_uri config option.

In most cases, though, no explicit URIs are defined or imported from other sources, as in the demo site you are linking to. In that case, rather than not providing any URIs at all, CKAN will generate them for you, but again, not if an original one is available.
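To make the precedence described above concrete, here is a minimal sketch in Python. It is not the extension's actual code: the function name resolve_dataset_uri, the dictionary layout and the generated "/dataset/<id>" pattern are assumptions made for illustration. Only the order of preference (an explicit or harvested uri first, a generated URI based on the configured base URI last) comes from the description above.

```python
def resolve_dataset_uri(dataset: dict, base_uri: str) -> str:
    """Pick the URI to use when serializing a dataset (illustrative sketch).

    `dataset` is a hypothetical CKAN-like dict; `base_uri` stands in for the
    value of the ckanext.dcat.base_uri config option.
    """
    # 1. A top-level `uri` field (e.g. from a custom schema) is used as-is.
    uri = dataset.get("uri")

    # 2. Otherwise look for a `uri` extra, e.g. one stored by a harvester.
    if not uri:
        for extra in dataset.get("extras", []):
            if extra.get("key") == "uri" and extra.get("value"):
                uri = extra["value"]
                break

    # 3. Only when nothing was provided is a URI generated from the base URI.
    #    The "/dataset/<id>" pattern is an assumption for this sketch.
    if not uri:
        uri = "{}/dataset/{}".format(base_uri.rstrip("/"), dataset["id"])

    return uri


# A harvested dataset keeps the publisher's URI; a local one gets a generated URI.
harvested = {
    "id": "abc",
    "extras": [{"key": "uri", "value": "http://publisher.example/datasets/1"}],
}
local = {"id": "abc"}

print(resolve_dataset_uri(harvested, "https://catalog.example/"))
print(resolve_dataset_uri(local, "https://catalog.example/"))
```

The point of the ordering is that a generated URI is only ever a fallback, so a publisher's own namespace survives whenever it was supplied.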

Of course, that is how it's intended to work; there might be a bug somewhere, or I might have misunderstood the issue.

What are the URIs that are not preserved in your view?

Polymathronic commented 7 years ago

Thanks for the prompt response Adrià. Apologies if that is indeed the case; after seeing all the URIs in the demo catalog, datahub.io, but also those provided by the European Data Portal (which uses a different extension, if I'm not mistaken), and not finding a single URI with a different namespace, I was under the impression that the original identifiers were being overwritten by CKAN.

amercader commented 7 years ago

Neither demo.ckan.org nor datahub.io was designed to publish linked data (e.g. to define and publish URIs); ckanext-dcat was enabled afterwards to provide an RDF interface. On portals like the Swedish one, which are based on DCAT-AP sources, you can see that the original URIs are used: http://oppnadata.se/catalog.ttl

I can't comment on the European Data Portal because as you say it uses a different extension and I'm not familiar with the workflow they use.

Glad that helped clarify things!