pjdufour closed this issue 9 years ago
To clarify, ISO has 3 notions of 'date' that we apply in GeoNode:
Item | Element | Description | GeoNode binding
---|---|---|---
1 | //gmd:dateStamp | date that the metadata was created | layer.date
2 | /gmd:identificationInfo//gmd:date | reference date for the cited resource | layer.date && layer.date_type
3 | //gmd:temporalElement | time period of the data | layer.temporal_extent_start, layer.temporal_extent_end
So imagine item 2 being a date related to citation. Item 2 is where we need the saved date checkpoint to be available.
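To make the three notions concrete, here is a minimal sketch that pulls each one out of an ISO document with the standard library. The `SAMPLE` record is a trimmed, hypothetical document written for illustration (it is not schema-valid and is not GeoNode output), and `iso_dates` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gml": "http://www.opengis.net/gml",
}

# Hypothetical, heavily trimmed record used only for this example.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    xmlns:gml="http://www.opengis.net/gml">
  <gmd:dateStamp><gco:DateTime>2015-06-12T09:30:00Z</gco:DateTime></gmd:dateStamp>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:date>
            <gmd:CI_Date>
              <gmd:date><gco:DateTime>2015-06-11T12:00:00Z</gco:DateTime></gmd:date>
              <gmd:dateType>
                <gmd:CI_DateTypeCode codeListValue="publication">publication</gmd:CI_DateTypeCode>
              </gmd:dateType>
            </gmd:CI_Date>
          </gmd:date>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:extent>
        <gmd:EX_Extent>
          <gmd:temporalElement>
            <gmd:EX_TemporalExtent>
              <gmd:extent>
                <gml:TimePeriod gml:id="T_01">
                  <gml:beginPosition>2015-01-01T00:00:00</gml:beginPosition>
                  <gml:endPosition>2015-03-31T00:00:00</gml:endPosition>
                </gml:TimePeriod>
              </gmd:extent>
            </gmd:EX_TemporalExtent>
          </gmd:temporalElement>
        </gmd:EX_Extent>
      </gmd:extent>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def iso_dates(xml_text):
    """Return the three ISO date notions from the table above as a dict."""
    root = ET.fromstring(xml_text)
    ci_date = root.find("gmd:identificationInfo//gmd:CI_Date", NS)
    period = root.find(".//gmd:temporalElement//gml:TimePeriod", NS)
    date_type = ci_date.find("gmd:dateType/gmd:CI_DateTypeCode", NS)
    return {
        # Item 1: when the metadata record itself was stamped
        "metadata_date": root.findtext("gmd:dateStamp/gco:DateTime", namespaces=NS),
        # Item 2: reference date of the cited resource, plus its type
        "citation_date": ci_date.findtext("gmd:date/gco:DateTime", namespaces=NS),
        "citation_date_type": date_type.get("codeListValue"),
        # Item 3: time period covered by the data
        "temporal_start": period.findtext("gml:beginPosition", namespaces=NS),
        "temporal_end": period.findtext("gml:endPosition", namespaces=NS),
    }
```

Calling `iso_dates(SAMPLE)` returns the three notions as separate values, which is the separation the table is arguing for.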
Sounds like we need a datestamp attribute in ResourceBase to cover off Item 1 and updated workflow.
Workflow proposal:
- datestamp (happens once and only once in the lifetime of the metadata)
- date set. If 'public', then date_type='publication', else date_type='creation'
- date and date_type='publication'
- date and date_type='revision'
This applies to the generated XML use case. For layers that have their own XML uploaded by the user, continue the existing approach where the XML is untouched / as is and the Item values are derived from the uploaded XML.
Thoughts?
+1. @tomkralidis, do we need to keep track of all the subsequent saves or just the last one? As suggested by @pjdufour, we can make these schema changes for the 2.5 milestone.
@capooti just the last save
Deeper inspection shows that the user explicitly controls the proposed workflow via metadata form update (i.e. can force date_type). I think we can leave this as is.
ResourceBase has a csw_insert_date attribute which fits Item 1. The patch below ensures that whenever metadata is saved, gmd:dateStamp is updated:
diff --git a/geonode/base/models.py b/geonode/base/models.py
index 8524eec..51f4b12 100644
--- a/geonode/base/models.py
+++ b/geonode/base/models.py
@@ -715,7 +715,8 @@ def resourcebase_post_save(instance, *args, **kwargs):
"""
ResourceBase.objects.filter(id=instance.id).update(
thumbnail_url=instance.get_thumbnail_url(),
- detail_url=instance.get_absolute_url())
+ detail_url=instance.get_absolute_url(),
+ csw_insert_date=datetime.datetime.now())
instance.set_missing_info()
# we need to remove stale links
diff --git a/geonode/catalogue/templates/catalogue/full_metadata.xml b/geonode/catalogue/templates/catalogue/full_metadata.xml
index 02b7e97..517088e 100644
--- a/geonode/catalogue/templates/catalogue/full_metadata.xml
+++ b/geonode/catalogue/templates/catalogue/full_metadata.xml
@@ -79,7 +79,7 @@
</gmd:CI_ResponsibleParty>
</gmd:contact> {% endwith %}
<gmd:dateStamp>
- <gco:DateTime>{{layer.date|date:"Y-m-d\TH:i:s\Z"}}</gco:DateTime>
+ <gco:DateTime>{{layer.csw_insert_date|date:"Y-m-d\TH:i:s\Z"}}</gco:DateTime>
</gmd:dateStamp>
<gmd:metadataStandardName>
<gco:CharacterString>ISO 19115:2003 - Geographic information - Metadata</gco:CharacterString>
@pjdufour can you test and verify this is acceptable/works for you? We don't need a schema change.
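For anyone testing the output by hand: the template filter `date:"Y-m-d\TH:i:s\Z"` renders a zero-padded timestamp with a literal `T` and a literal `Z`. A rough plain-Python equivalent is sketched below; `format_datestamp` and `utc_datestamp_now` are hypothetical helper names, not part of GeoNode.

```python
import datetime


def format_datestamp(value):
    """Format a datetime like the template filter date:"Y-m-d\\TH:i:s\\Z"."""
    # The trailing Z is a literal character, not a timezone conversion.
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")


def utc_datestamp_now():
    """Current time as a datestamp string, taken explicitly in UTC."""
    return format_datestamp(datetime.datetime.now(datetime.timezone.utc))
```

One thing worth checking while testing: the patch calls `datetime.datetime.now()`, which returns local server time, so the literal `Z` is only accurate if the server clock runs in UTC. Whether that matters here depends on the deployment's timezone settings, which this thread does not state.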
@capooti / @pjdufour minor note: we could modify https://github.com/GeoNode/geonode/blob/master/geonode/base/models.py#L313 to be null=False, which forces this field to always be present, but this would be a schema change, or is the signal good enough?
Applied to master 8a9a879b7dd992d59d2e2c1e0196a154609df05d
This seems fine (very similar to my rough hotfix). It'll take me a few days to find time to test.
In regards to the 3rd notion of a date, I'm getting the following harvest error raised on the CKAN side for the ISO XML:
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0" xmlns:rim="urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dct="http://purl.org/dc/terms/" xmlns:ows="http://www.opengis.net/ows" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" xmlns:gml="http://www.opengis.net/gml" xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:ogc="http://www.opengis.net/ogc" xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm" xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:os="http://a9.com/-/spec/opensearch/1.1/" xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope" xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:wrs="http://www.opengis.net/cat/wrs/1.0" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://www.isotc211.org/2005/gmd/gmd.xsd">
...
<gml:TimePeriod gml:id="T_01">
<gml:beginPosition>2015-06-11T12:00:00</gml:beginPosition>
<gml:endPosition>2015-06-11T12:00:00</gml:endPosition>
</gml:TimePeriod>
...
The error: Element '{http://www.opengis.net/gml}TimePeriod': This element is not expected. Expected is one of ( {http://www.opengis.net/gml/3.2}AbstractTimePrimitive, {http://www.opengis.net/gml/3.2}TimeInstant, {http://www.opengis.net/gml/3.2}TimePeriod, {http://www.opengis.net/gml/3.2}TimeNode, {http://www.opengis.net/gml/3.2}TimeEdge ).
This link might provide some guidance: https://geo-ide.noaa.gov/wiki/index.php?title=Validation_Error_Guidance#Expected_content_from_http:.2F.2Fwww.opengis.net.2Fgml.2F3.2
However, I'm not sure if the root of the error is on the GeoNode or CKAN side. The article says:
invalid:
xmlns:gml="http://www.opengis.net/gml"
valid:
xmlns:gml="http://www.opengis.net/gml/3.2"
I'm not sure of the overall impact of making changes as the article suggests.
@pjdufour GeoNode's ISO XML output is the generic implementation as per http://www.isotc211.org/2005/.
IMHO the best approach would be to patch CKAN to support both gml namespaces one might encounter in an ISO document. I've issued a pull request against ckanext-spatial at https://github.com/ckan/ckanext-spatial/pull/109
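As an illustration of what "support both gml namespaces" can look like on the consuming side, here is a minimal sketch using the standard library. It is not the code from the pull request; `find_time_period` and `sample_extent` are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

# The two GML namespace URIs mentioned in this thread.
GML_NAMESPACES = (
    "http://www.opengis.net/gml",
    "http://www.opengis.net/gml/3.2",
)


def find_time_period(root):
    """Return (begin, end) from the first gml:TimePeriod in either namespace."""
    for ns in GML_NAMESPACES:
        period = root.find(".//{%s}TimePeriod" % ns)
        if period is not None:
            return (
                period.findtext("{%s}beginPosition" % ns),
                period.findtext("{%s}endPosition" % ns),
            )
    return None


def sample_extent(gml_ns):
    """Build a tiny test fragment that declares the given GML namespace."""
    return ET.fromstring(
        '<extent xmlns:gml="%s">'
        '<gml:TimePeriod gml:id="T_01">'
        "<gml:beginPosition>2015-06-11T12:00:00</gml:beginPosition>"
        "<gml:endPosition>2015-06-11T12:00:00</gml:endPosition>"
        "</gml:TimePeriod>"
        "</extent>" % gml_ns
    )
```

Both `find_time_period(sample_extent(GML_NAMESPACES[0]))` and `find_time_period(sample_extent(GML_NAMESPACES[1]))` return the same begin/end pair, so a harvester written this way would not care which namespace the producer declared.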
Great!
@tomkralidis, PR 109 (https://github.com/ckan/ckanext-spatial/pull/109) was merged, so can this be closed now?
Right now, ResourceBase objects only use one date field to reference the "metadata date" and the "data date" in ISO metadata (https://github.com/GeoNode/geonode/blob/master/geonode/catalogue/templates/catalogue/full_metadata.xml#L81). The metadata date (or gmd:dateStamp/gco:DateTime) should represent when the metadata was last changed. The data date (https://github.com/GeoNode/geonode/blob/master/geonode/catalogue/templates/catalogue/full_metadata.xml#L118) should represent what time period the data in the dataset covers.
The metadata date is used by CKAN to decide whether to re-harvest a dataset. If the metadata date and content haven't changed, then it doesn't re-harvest. See below for relevant code.
We should have an additional field (hidden from the user) that tracks when the metadata was last changed; otherwise CKAN won't re-harvest datasets when only the metadata changes. Since this will require a model change, we should hold for 2.5.x. Related to #1125. Hotfix below to enable CKAN harvest.
Metadata date: https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/model/harvested_metadata.py#L513
Data date: https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/model/harvested_metadata.py#L544
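The re-harvest rule described above can be sketched roughly as follows. This is only an illustration of the stated rule (skip when neither the metadata date nor the content changed), not CKAN's actual implementation; `content_hash` and `should_reharvest` are hypothetical names, and the use of SHA-1 is an assumption.

```python
import hashlib


def content_hash(xml_text):
    """Fingerprint of the harvested document (hash choice is an assumption)."""
    return hashlib.sha1(xml_text.encode("utf-8")).hexdigest()


def should_reharvest(previous, metadata_date, xml_text):
    """Decide whether a record needs re-harvesting.

    previous is None for a record never seen before, otherwise a
    (metadata_date, content_hash) pair stored by the last harvest.
    """
    if previous is None:
        return True
    previous_date, previous_hash = previous
    # Skip only when both the metadata date and the content are unchanged.
    return metadata_date != previous_date or content_hash(xml_text) != previous_hash
```

Under this reading, a producer that keeps `gmd:dateStamp` current on every save gives the harvester a reliable signal that the record changed.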
Hotfix: change the logic so that gmd:dateStamp/gco:DateTime is set to the current date, so the user can "flash" the ISO XML when they save.