ckan / ckanext-dcat

CKAN ♥ DCAT
https://docs.ckan.org/projects/ckanext-dcat

Produce Sitemap as well #137

Open jqnatividad opened 6 years ago

jqnatividad commented 6 years ago

To help Google Dataset Search index the catalog

https://developers.google.com/search/docs/data-types/dataset#sitemap

See also https://github.com/ckan/ideas-and-roadmap/issues/220

TkTech commented 6 years ago

Sitemaps have many uses outside of Google Dataset Search. This should probably belong in core rather than in ckanext-dcat, with an interface that lets extensions:

  1. Modify individual items as they're added to the sitemap
  2. Append entirely new items.
  3. Replace/stop generation completely.
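
A rough sketch of how the three hooks above might fit together, using only the Python standard library. Everything named here (`SitemapHook`, `modify_item`, `extra_items`, `skip_generation`, `build_sitemap`) is hypothetical and not an existing CKAN interface; the XML layout follows the public sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


class SitemapHook:
    """Hypothetical extension hook; NOT an existing CKAN plugin interface."""

    def modify_item(self, item):
        # 1. Modify an individual item as it is added to the sitemap.
        return item

    def extra_items(self):
        # 2. Append entirely new items.
        return []

    def skip_generation(self):
        # 3. Stop generation completely.
        return False


def build_sitemap(items, hooks=()):
    """Return sitemap XML as a string, or None if a hook stops generation.

    Each item is a dict with a "loc" key and an optional "lastmod" key.
    """
    if any(hook.skip_generation() for hook in hooks):
        return None

    entries = []
    for item in items:
        for hook in hooks:
            # Pass a copy so hooks cannot mutate the caller's data.
            item = hook.modify_item(dict(item))
        entries.append(item)
    for hook in hooks:
        entries.extend(hook.extra_items())

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        if entry.get("lastmod"):
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
    return ET.tostring(urlset, encoding="unicode")
```

An extension would subclass `SitemapHook` and override only the methods it needs; core would call `build_sitemap` with the dataset URLs and the registered hooks.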

If we have CKAN create a dataset for the sitemap and upload the sitemap index files via the normal resource mechanism, then you would get S3/Azure/etc. storage for "free" when using ckanext-cloudstorage, along with activity tracking and everything else.

See previous work by data.gov: https://github.com/GSA/data.gov/issues/769 and https://github.com/GSA/data.gov/issues/798

jqnatividad commented 6 years ago

It might still be useful to implement a provenance-extended sitemap in ckanext-dcat that populates properties like sameAs and isBasedOn in the sitemap file, so Google can use this information to identify the canonical dataset in its graph.

https://developers.google.com/search/docs/data-types/dataset#source-provenance
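
For illustration only, here is how the provenance properties could be serialized, assuming they are emitted as schema.org `Dataset` JSON-LD for each dataset page. The helper `dataset_jsonld` and all example values are hypothetical; this is not what ckanext-dcat currently does.

```python
import json


def dataset_jsonld(name, description, url, same_as=None, is_based_on=None):
    """Build a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; sameAs and isBasedOn are only
    included when a value is supplied.
    """
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if same_as:
        # URL of the canonical copy of this dataset.
        doc["sameAs"] = same_as
    if is_based_on:
        # URL of the dataset this one was derived from.
        doc["isBasedOn"] = is_based_on
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(dataset_jsonld(
        name="Example dataset",
        description="A republished copy of an upstream dataset.",
        url="https://catalog.example.org/dataset/example",
        same_as="https://upstream.example.org/dataset/example",
    ))
```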