cioos-siooc / ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.
http://ckan.org/

changed waf xml not updating during ckan harvest #78

Closed fostermh closed 3 years ago

fostermh commented 3 years ago

Re-harvesting WAF metadata does not update a dataset even though the record has changed on the WAF. Perhaps an issue with how waf_modified_date is populated?

Reported by Étienne.

fostermh commented 3 years ago

Line 752 in the spatial harvester base checks the metadata_date of the old and new harvested metadata when doing an update. metadata_date is populated from "metadata-date" in the spatial model harvester, Line 713.
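For anyone following along, here is a rough sketch of the kind of date comparison described above. It is not the harvester's actual code: `parse_metadata_date`, `should_update` and the `force` flag are hypothetical names, and the accepted date formats are an assumption.

```python
from datetime import datetime


def parse_metadata_date(value):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SS' into a datetime.

    Hypothetical helper; the formats accepted here are an assumption.
    """
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised date: %r" % value)


def should_update(previous_date, harvested_date, force=False):
    """Return True when the harvested record should replace the stored one.

    A record with no previous date is treated as new. Otherwise the
    harvested metadata date must be strictly later than the stored one.
    """
    if force or previous_date is None:
        return True
    return parse_metadata_date(harvested_date) > parse_metadata_date(previous_date)


# Same date on both sides: the edited XML would be skipped.
print(should_update("2020-01-15", "2020-01-15"))
# A later date in the harvested record triggers the update.
print(should_update("2020-01-15", "2020-01-16T00:00:00"))
```

With a check like this, editing the XML on the WAF without touching the date leaves both sides equal, so the record is skipped, which matches the behaviour reported in this issue.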

Currently metadata-date is populated by the following xpaths:

"mdb:dateInfo/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode/@codeListValue='creation']/cit:date/gco:Date/text()",
"mdb:dateInfo/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode/text()='creation']/cit:date/gco:Date/text()",
"mdb:dateInfo/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode/@codeListValue='creation']/cit:date/gco:DateTime/text()",
"mdb:dateInfo/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode/text()='creation']/cit:date/gco:DateTime/text()",

This does not take into account publication or revision dates. It probably should.
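One possible way to take publication and revision dates into account is to collect every matching date and keep the latest. The sketch below uses only the standard library rather than the harvester's own XPath machinery. The function name `latest_metadata_date`, the sample record, and the namespace URIs are assumptions for illustration, so check them against the real records.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are an assumption; adjust to match the harvested records.
NS = {
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/2.0",
    "cit": "http://standards.iso.org/iso/19115/-3/cit/2.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}

DATE_TYPES = ("creation", "publication", "revision")


def latest_metadata_date(root, date_types=DATE_TYPES):
    """Return the latest date string among the given date types, or None.

    Dates are compared as strings, which only works for consistently
    formatted ISO 8601 values without time zone offsets.
    """
    found = []
    for ci_date in root.findall("mdb:dateInfo/cit:CI_Date", NS):
        code = ci_date.find("cit:dateType/cit:CI_DateTypeCode", NS)
        if code is None:
            continue
        # The type may be given as an attribute or as element text.
        kind = code.get("codeListValue") or (code.text or "").strip()
        if kind not in date_types:
            continue
        for tag in ("gco:Date", "gco:DateTime"):
            value = ci_date.findtext("cit:date/" + tag, namespaces=NS)
            if value:
                found.append(value.strip())
    return max(found) if found else None


# Hypothetical record with a creation date and a later revision date.
SAMPLE = """<mdb:MD_Metadata
    xmlns:mdb="http://standards.iso.org/iso/19115/-3/mdb/2.0"
    xmlns:cit="http://standards.iso.org/iso/19115/-3/cit/2.0"
    xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0">
  <mdb:dateInfo><cit:CI_Date>
    <cit:date><gco:Date>2020-01-15</gco:Date></cit:date>
    <cit:dateType><cit:CI_DateTypeCode codeListValue="creation"/></cit:dateType>
  </cit:CI_Date></mdb:dateInfo>
  <mdb:dateInfo><cit:CI_Date>
    <cit:date><gco:DateTime>2021-06-01T12:00:00</gco:DateTime></cit:date>
    <cit:dateType><cit:CI_DateTypeCode codeListValue="revision"/></cit:dateType>
  </cit:CI_Date></mdb:dateInfo>
</mdb:MD_Metadata>"""

root = ET.fromstring(SAMPLE)
print(latest_metadata_date(root))                 # considers all three types
print(latest_metadata_date(root, ("creation",)))  # creation only, as today
```

Taking the latest of the three types means a record whose revision date was bumped would be picked up even when its creation date is unchanged.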

The current setup of the harvester requires a change to the dates in the metadata before a record will be updated. This is likely good metadata practice but may not be convenient for some workflows. More discussion needed.

fostermh commented 3 years ago

Update: the metadata-entry-form now updates the lastUpdated date every time the form is saved, so this is no longer an issue with metadata generated from the form. Correct metadata practice would be to update this date when changing the XML anyway, so I think this is the appropriate workflow.