GeoNode / geonode

GeoNode is an open source platform that facilitates the creation, sharing, and collaborative use of geospatial data.
https://geonode.org/

Metadata Dates #2187

Closed pjdufour closed 9 years ago

pjdufour commented 9 years ago

Right now, ResourceBase objects only use one date field to reference the "metadata date" and the "data date" in ISO metadata (https://github.com/GeoNode/geonode/blob/master/geonode/catalogue/templates/catalogue/full_metadata.xml#L81). The metadata date (or gmd:dateStamp/gco:DateTime) should represent when the metadata was last changed. The data date (https://github.com/GeoNode/geonode/blob/master/geonode/catalogue/templates/catalogue/full_metadata.xml#L118) should represent what time period the data in the dataset covers.

The metadata date is used by CKAN to decide whether to re-harvest a dataset. If neither the metadata date nor the content has changed, it doesn't re-harvest. See below for relevant code.
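
As a rough illustration of the rule described above (re-harvest only when the metadata date or the content differs), here is a minimal Python sketch. It is not ckanext-spatial's actual code; the function name `needs_reharvest`, its parameters, and the choice of a SHA-256 content hash are all assumptions made for the example.

```python
import hashlib
from datetime import datetime
from typing import Optional


def needs_reharvest(previous_date: Optional[datetime],
                    previous_hash: Optional[str],
                    current_date: datetime,
                    current_xml: str) -> bool:
    """Return True when a record should be harvested again.

    Hypothetical helper: compares the stored metadata date and content
    hash against the values of the freshly fetched ISO XML.
    """
    if previous_date is None or previous_hash is None:
        # Nothing stored yet, so this is the first harvest.
        return True
    current_hash = hashlib.sha256(current_xml.encode("utf-8")).hexdigest()
    # Skip only when both the date and the content are unchanged.
    return current_date != previous_date or current_hash != previous_hash
```

Under this rule, a metadata edit is guaranteed to trigger a re-harvest only if gmd:dateStamp moves whenever the metadata is saved, which is the point of this issue.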

We should have an additional field (hidden from the user) that tracks when the metadata was last changed; otherwise CKAN won't re-harvest datasets when only the metadata changes. Since this will require a model change, we should hold for 2.5.x. Related to #1125. Hotfix below to enable CKAN harvest.

Metadata Date https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/model/harvested_metadata.py#L513

Data date https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/model/harvested_metadata.py#L544

Hotfix: Change the logic so that gmd:dateStamp/gco:DateTime is set to the current date, so the user can "flash" the ISO XML when they save.

tomkralidis commented 9 years ago

To clarify, ISO has 3 notions of 'date' that we apply in GeoNode:

| Item | Element | Description | GeoNode binding |
|------|---------|-------------|-----------------|
| 1 | //gmd:dateStamp | date that the metadata was created | layer.date |
| 2 | /gmd:identificationInfo//gmd:date | reference date for the cited resource | layer.date && layer.date_type |
| 3 | //gmd:temporalElement | time period of the data | layer.temporal_extent_start, layer.temporal_extent_end |

So imagine item 2 being a date related to citation. Item 1 is where we need the saved date checkpoint to be available.

Sounds like we need a datestamp attribute in ResourceBase to cover off Item 1 and updated workflow.
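
To make the three notions of date concrete, here is a minimal sketch that pulls each one out of an ISO record using Python's standard library. The sample document, its date values, and the `extract_dates` helper are illustrative assumptions, not GeoNode output or GeoNode code.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gml": "http://www.opengis.net/gml",
}

# Trimmed, made-up record containing only the three date locations.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    xmlns:gml="http://www.opengis.net/gml">
  <gmd:dateStamp><gco:DateTime>2015-06-12T09:30:00Z</gco:DateTime></gmd:dateStamp>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation><gmd:CI_Citation>
        <gmd:date><gmd:CI_Date>
          <gmd:date><gco:DateTime>2015-06-11T12:00:00Z</gco:DateTime></gmd:date>
        </gmd:CI_Date></gmd:date>
      </gmd:CI_Citation></gmd:citation>
      <gmd:extent><gmd:EX_Extent><gmd:temporalElement><gmd:EX_TemporalExtent><gmd:extent>
        <gml:TimePeriod gml:id="T_01">
          <gml:beginPosition>2015-06-01T00:00:00</gml:beginPosition>
          <gml:endPosition>2015-06-10T00:00:00</gml:endPosition>
        </gml:TimePeriod>
      </gmd:extent></gmd:EX_TemporalExtent></gmd:temporalElement></gmd:EX_Extent></gmd:extent>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def extract_dates(xml_text):
    """Return the metadata date, reference date and temporal extent."""
    root = ET.fromstring(xml_text)
    return {
        # Item 1: when the metadata record itself was written.
        "metadata_date": root.findtext(
            "gmd:dateStamp/gco:DateTime", namespaces=NS),
        # Item 2: reference (citation) date of the resource.
        "reference_date": root.findtext(
            "./gmd:identificationInfo//gmd:CI_Date/gmd:date/gco:DateTime",
            namespaces=NS),
        # Item 3: time period that the data covers.
        "temporal_start": root.findtext(
            ".//gmd:temporalElement//gml:beginPosition", namespaces=NS),
        "temporal_end": root.findtext(
            ".//gmd:temporalElement//gml:endPosition", namespaces=NS),
    }


print(extract_dates(SAMPLE))
```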

Workflow proposal:

This applies to the generated XML use case. For layers that have their own XML uploaded by the user, continue the existing approach where the XML is untouched / as is and the Item values are derived from the uploaded XML.

Thoughts?

capooti commented 9 years ago

+1 @tomkralidis, do we need to keep track of all the subsequent saves or just the last one? As suggested by @pjdufour, we can make these schema changes for the 2.5 milestone.

tomkralidis commented 9 years ago

@capooti just the last save

tomkralidis commented 9 years ago

Deeper inspection shows that the user explicitly controls the proposed workflow via metadata form update (i.e. can force date_type). I think we can leave this as is.

ResourceBase has a csw_insert_date attribute which fits Item 1. The patch below ensures that whenever metadata is saved, gmd:dateStamp is updated:

diff --git a/geonode/base/models.py b/geonode/base/models.py
index 8524eec..51f4b12 100644
--- a/geonode/base/models.py
+++ b/geonode/base/models.py
@@ -715,7 +715,8 @@ def resourcebase_post_save(instance, *args, **kwargs):
     """
     ResourceBase.objects.filter(id=instance.id).update(
         thumbnail_url=instance.get_thumbnail_url(),
-        detail_url=instance.get_absolute_url())
+        detail_url=instance.get_absolute_url(),
+        csw_insert_date=datetime.datetime.now())
     instance.set_missing_info()

     # we need to remove stale links
diff --git a/geonode/catalogue/templates/catalogue/full_metadata.xml b/geonode/catalogue/templates/catalogue/full_metadata.xml
index 02b7e97..517088e 100644
--- a/geonode/catalogue/templates/catalogue/full_metadata.xml
+++ b/geonode/catalogue/templates/catalogue/full_metadata.xml
@@ -79,7 +79,7 @@
      </gmd:CI_ResponsibleParty>
    </gmd:contact> {% endwith %}
    <gmd:dateStamp>
-     <gco:DateTime>{{layer.date|date:"Y-m-d\TH:i:s\Z"}}</gco:DateTime>
+     <gco:DateTime>{{layer.csw_insert_date|date:"Y-m-d\TH:i:s\Z"}}</gco:DateTime>
    </gmd:dateStamp>
    <gmd:metadataStandardName>
      <gco:CharacterString>ISO 19115:2003 - Geographic information - Metadata</gco:CharacterString>

@pjdufour can you test and verify this is acceptable/works for you? We don't need a schema change.

@capooti / @pjdufour minor note: we could modify https://github.com/GeoNode/geonode/blob/master/geonode/base/models.py#L313 to be null=False, which forces this field to always be present. That would be a schema change, though; or is the signal good enough?
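
For readers who want to see what string the template filter in the patch produces, here is a plain-Python sketch of the same format. It does not use Django; the helper name `iso_datestamp` is hypothetical, and it assumes that a naive datetime is already in UTC, since the trailing literal Z denotes UTC.

```python
from datetime import datetime, timedelta, timezone


def iso_datestamp(moment: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:MM:SSZ.

    Assumption: naive datetimes are treated as already being in UTC;
    timezone-aware ones are converted to UTC before formatting.
    """
    if moment.tzinfo is not None:
        moment = moment.astimezone(timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# A naive value is formatted as-is.
print(iso_datestamp(datetime(2015, 6, 11, 12, 0, 0)))
# An aware value two hours ahead of UTC is shifted back to UTC first.
print(iso_datestamp(datetime(2015, 6, 11, 14, 0, 0,
                             tzinfo=timezone(timedelta(hours=2)))))
```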

tomkralidis commented 9 years ago

Applied to master 8a9a879b7dd992d59d2e2c1e0196a154609df05d

pjdufour commented 9 years ago

This seems fine (very similar to my rough hotfix). It'll take me a few days to find time to test.

pjdufour commented 9 years ago

With regard to the third notion of a date, I'm getting the following harvest error on the CKAN side for the ISO XML:

<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0" xmlns:rim="urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dct="http://purl.org/dc/terms/" xmlns:ows="http://www.opengis.net/ows" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" xmlns:gml="http://www.opengis.net/gml" xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:ogc="http://www.opengis.net/ogc" xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm" xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:os="http://a9.com/-/spec/opensearch/1.1/" xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope" xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:wrs="http://www.opengis.net/cat/wrs/1.0" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://www.isotc211.org/2005/gmd/gmd.xsd">

...
<gml:TimePeriod gml:id="T_01">
<gml:beginPosition>2015-06-11T12:00:00</gml:beginPosition>
<gml:endPosition>2015-06-11T12:00:00</gml:endPosition>
</gml:TimePeriod>
...

The error: Element '{http://www.opengis.net/gml}TimePeriod': This element is not expected. Expected is one of ( {http://www.opengis.net/gml/3.2}AbstractTimePrimitive, {http://www.opengis.net/gml/3.2}TimeInstant, {http://www.opengis.net/gml/3.2}TimePeriod, {http://www.opengis.net/gml/3.2}TimeNode, {http://www.opengis.net/gml/3.2}TimeEdge ).

This link might provide some guidance: https://geo-ide.noaa.gov/wiki/index.php?title=Validation_Error_Guidance#Expected_content_from_http:.2F.2Fwww.opengis.net.2Fgml.2F3.2

However, I'm not sure if the root of the error is on the GeoNode or CKAN side. The article says:

invalid:
xmlns:gml="http://www.opengis.net/gml"

valid:
xmlns:gml="http://www.opengis.net/gml/3.2"

I'm not sure of the overall impact of making changes as the article suggests.

tomkralidis commented 9 years ago

@pjdufour GeoNode's ISO XML output is the generic implementation as per http://www.isotc211.org/2005/.

IMHO the best approach would be to patch CKAN to support both gml namespaces one might encounter in an ISO document. I've issued a pull request against ckanext-spatial at https://github.com/ckan/ckanext-spatial/pull/109
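
To illustrate the idea of accepting both gml namespaces on the consumer side, here is a minimal sketch using Python's standard library. It is not the content of the ckanext-spatial pull request; the `find_time_period` helper and the `GML_NAMESPACES` tuple are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# The two gml namespace URIs mentioned in this thread.
GML_NAMESPACES = (
    "http://www.opengis.net/gml",
    "http://www.opengis.net/gml/3.2",
)


def find_time_period(root):
    """Return (begin, end) of the first TimePeriod in either gml namespace.

    Returns None when the document has no TimePeriod in a known namespace.
    """
    for ns in GML_NAMESPACES:
        period = root.find(".//{%s}TimePeriod" % ns)
        if period is not None:
            begin = period.findtext("{%s}beginPosition" % ns)
            end = period.findtext("{%s}endPosition" % ns)
            return begin, end
    return None


# Tiny made-up document; the namespace URI is filled in per test case.
DOC = (
    '<root xmlns:gml="%s"><gml:TimePeriod gml:id="T_01">'
    "<gml:beginPosition>2015-06-11T12:00:00</gml:beginPosition>"
    "<gml:endPosition>2015-06-11T12:00:00</gml:endPosition>"
    "</gml:TimePeriod></root>"
)

for uri in GML_NAMESPACES:
    print(uri, find_time_period(ET.fromstring(DOC % uri)))
```

Trying each namespace in turn keeps the producer's output untouched, which matches the preference stated above for patching the harvester rather than changing GeoNode's ISO XML.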

pjdufour commented 9 years ago

Great!

pjdufour commented 9 years ago

@tomkralidis, PR 109 (https://github.com/ckan/ckanext-spatial/pull/109) was merged, so can this be closed now?