Esri / geoportal-server

Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
https://gptogc.esri.com/geoportal
Apache License 2.0

UPDATEDATE Field in GPT_RESOURCE table not getting updated on Full Synchronization (v 1.2.7) #268

Open: rmbradley opened this issue 7 years ago

rmbradley commented 7 years ago

This may not be a bug but rather a misunderstanding of how the UPDATEDATE field works when harvesting records. When we run a full synchronization on a set of existing records for ReST-based map services, we expect the UPDATEDATE field of every record to show the current date, since each map service is either "new" or "refreshed" (at minimum re-created with no changes). We do see some records with a new date, but we have not made any updates to the underlying map services.

We see the same behavior with XML metadata files harvested from a Web Accessible Folder (WAF). There, we can get the full synchronization to set UPDATEDATE to the current date by modifying the XML content, although we would rather not.

We found that physically modifying an XML metadata file changes the UPDATEDATE field after a re-harvest. (Even a single character or a space works.) I suppose this makes sense, since Geoportal somehow detects that the record content has changed. If we want the current date on all records, we could delete the Geoportal records representing the XML files or map services and effectively trigger a fresh full synchronization, but that causes other problems, such as collection reassignment and database IDs, which we would prefer to avoid.

This raises the question of how Geoportal detects that an underlying data source has been updated, and whether a full synchronization is supposed to record an update for ALL records or only for those that actually changed. For XML files, it seems to rely on the OS "file modified date", but for ReST-based map service records we have not found any change to a map service that triggers a new UPDATEDATE value. How does it detect updates to map services?

Can you confirm that what we are seeing is expected behavior? We believe, but are not certain, that under 1.2.5 a re-harvest did update the UPDATEDATE field after a "refresh" in the map service scenario described above.

Thanks on behalf of the BLM Landscape Approach Data Portal Team

Rick

zguo commented 7 years ago

The full synchronization checks whether the metadata is still there and whether it has been updated. If the metadata is there but has not been updated, the update date does not change; if it has been updated, the update date is set to the new date. For services, I believe (I have to confirm with Piotr) it compares the text used to assemble the metadata to see whether anything is different: if it is the same, UPDATEDATE does not change; if it is different, the update date becomes the new date.
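A minimal sketch of the comparison described above, assuming a full sync that re-reads every source and compares the newly assembled metadata text with what was stored last time. This is not Geoportal's actual code: `Record`, `digest`, and `full_sync` are hypothetical names, and hashing the text is an assumption (comparing the raw text directly would behave the same way).

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical stand-in for a GPT_RESOURCE row: only the fields relevant here.
    source_uri: str
    content_hash: str
    update_date: date


def digest(text: str) -> str:
    """Fingerprint of the assembled metadata text (assumption: SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def full_sync(catalog: dict[str, Record], harvested: dict[str, str], today: date) -> None:
    """Hypothetical full sync: `harvested` maps source URI -> metadata text built on this run."""
    for uri, text in harvested.items():
        h = digest(text)
        rec = catalog.get(uri)
        if rec is None:
            # New source: insert it with today's date.
            catalog[uri] = Record(uri, h, today)
        elif rec.content_hash != h:
            # Text differs (even by one space): store the new text and bump the date.
            rec.content_hash = h
            rec.update_date = today
        # Otherwise the text is identical and update_date is left unchanged.


catalog: dict[str, Record] = {}
full_sync(catalog, {"a": "<md/>", "b": "<md/>"}, date(2017, 11, 1))
full_sync(catalog, {"a": "<md/>", "b": "<md/> "}, date(2017, 11, 27))
print(catalog["a"].update_date, catalog["b"].update_date)  # 2017-11-01 2017-11-27
```

Under this model, record "b" gets a new date only because a trailing space was added, which matches the observation earlier in the thread that a one-character edit to an XML file is enough to change UPDATEDATE.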

DanelleM commented 6 years ago

I think my confusion lies in the apparently different ways the term "Update" is used in the harvest report and in the Manage Records view, and in a difference of opinion on how a Full vs. Incremental synchronization should behave. The harvest/synchronization report states that all records have been updated, using the term "DocsUpdated", but when you review those records in the Manage Records view, only a few of them show the harvest date in the "Date" column. We use full synchronization exclusively.

So the harvest report says 10 records were updated, but a new date (11/27, the harvest date) shows for only 6 records in the Manage Records interface: those whose underlying file content actually changed. I recognize that the Manage Records interface only uses the term "Date", but the underlying database field that populates it is UPDATEDATE. I might expect an incremental sync to reflect only actual changes, but a full sync to grab everything regardless of underlying updates. Perhaps I am the only one confused by this, but it would help if the harvest report used a term that more accurately reflects what is happening (maybe "DocsConfirmed"), or if the DocsUpdated count included only those records that actually had a change captured.
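To make the proposed reporting concrete, here is a small sketch, under stated assumptions, of a harvest report that separates records whose content changed from records that were re-read but unchanged. `SyncReport`, `classify`, the `docs_confirmed` counter (modeled on the "DocsConfirmed" suggestion above), and the `force` flag (modeled on the expectation that a full sync "grabs everything") are all hypothetical; none of them are existing Geoportal settings.

```python
from dataclasses import dataclass


@dataclass
class SyncReport:
    docs_added: int = 0      # source not seen before
    docs_updated: int = 0    # content changed, or update forced
    docs_confirmed: int = 0  # hypothetical: source re-read, content unchanged


def classify(previous: dict[str, str], harvested: dict[str, str], force: bool = False) -> SyncReport:
    """Compare previously stored metadata text with this run's text, per source URI."""
    report = SyncReport()
    for uri, text in harvested.items():
        old = previous.get(uri)
        if old is None:
            report.docs_added += 1
        elif force or old != text:
            report.docs_updated += 1
        else:
            report.docs_confirmed += 1
    return report


# 10 existing records, 6 of which changed since the last harvest.
previous = {f"rec{i}": "v1" for i in range(10)}
harvested = {uri: ("v2" if i < 6 else "v1") for i, uri in enumerate(previous)}
print(classify(previous, harvested))
print(classify(previous, harvested, force=True))
```

With the default, the report counts 6 updated and 4 confirmed, matching the 6 new dates seen in Manage Records; with `force=True`, all 10 would count as updated, which is the full-sync behavior described as expected above.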

My group has asked staff directly where the harvest logic lives in the code, so we can see whether it can be customized to meet our needs. The logic of updating only when changes are detected is preventing us from forcing an update of all our records to correct an index issue we discovered (metadata link, #270). To get records re-harvested, we have to make trivial modifications to our XML files and completely republish our map services.
