ckan / ckanext-spatial

Geospatial extension for CKAN
http://docs.ckan.org/projects/ckanext-spatial

Dataset Loses Harvest Object on WAF file Timestamp Change #324

Open Jin-Sun-tts opened 7 months ago

Jin-Sun-tts commented 7 months ago

Related issue: https://github.com/GSA/data.gov/issues/4505

Summary:

When the timestamp of a WAF source file changes without any actual content modification, the metadata information disappears from the UI.

The root cause is that the dataset's harvest_object_id is not updated to the ID of the new harvest object. This was confirmed through the following API calls: /api/action/package_show?id= and /api/action/package_search?q=id:
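To repeat that check from a script, the following is a minimal sketch using only the Python standard library. It assumes the harvest extras appear in each package dict's `extras` list as `{"key": ..., "value": ...}` pairs (as ckanext-harvest adds them) and that the standard CKAN action API envelope (`success`, `result`) is returned. The helper names and the base URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def harvest_object_id(package: Dict[str, Any]) -> Optional[str]:
    """Return the harvest_object_id extra of a package dict, if present."""
    for extra in package.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            return extra.get("value")
    return None


def fetch_result(base_url: str, action: str, params: Dict[str, str]) -> Any:
    """Call a CKAN action API endpoint and return its 'result' field."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}/api/action/{action}?{query}"
    with urllib.request.urlopen(url) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(f"{action} failed: {body.get('error')}")
    return body["result"]


def compare_ids(show_result: Dict[str, Any],
                search_result: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pair the ID from the database view (package_show) with the indexed one."""
    results = search_result.get("results") or []
    indexed = results[0] if results else {}
    return {
        "package_show": harvest_object_id(show_result),
        "package_search": harvest_object_id(indexed),
    }


def check_package(base_url: str, package_id: str) -> Dict[str, Optional[str]]:
    """Fetch both views of one package and return their harvest_object_ids."""
    show = fetch_result(base_url, "package_show", {"id": package_id})
    search = fetch_result(base_url, "package_search", {"q": f"id:{package_id}"})
    return compare_ids(show, search)
```

For example, `check_package("https://ckan.example.org", "<package_id>")` (hypothetical host) returns both values, which can then be compared with the ID of the newest harvest object for that dataset.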

Additionally, testing on the most recent version of CKAN with only the ckanext-harvest and ckanext-spatial extensions enabled reproduced the problem.

Observations from Testing:

  1. Manually running ckan search-index rebuild <package_id> resolved the issue: afterwards, the API calls above returned the correct harvest_object_id.

  2. Found the code block that is supposed to refresh the Solr index: https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L709C1-L710C70

    Testing the following code changes gave these results:
      - Calling package_update instead of package_index.index_package resolved the issue.
      - Adding model.Session.commit() before calling package_index.index_package also resolved the issue.
      - Calling the search index rebuild instead of package_index.index_package did not resolve the issue unless model.Session.commit() was called before the rebuild.
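The "commit first, then index" ordering from the second and third variants can be sketched as follows. This is a hypothetical helper, not the ckanext-spatial code: the collaborators are injected so the ordering can be exercised without CKAN. In the harvester they would presumably be model.Session, something like functools.partial(logic.get_action('package_show'), context), and PackageSearchIndex().index_package; that mapping is an assumption.

```python
from typing import Any, Callable, Dict

PackageDict = Dict[str, Any]


def reindex_after_commit(
    session: Any,
    package_show: Callable[[PackageDict], PackageDict],
    index_package: Callable[[PackageDict], None],
    package_id: str,
) -> PackageDict:
    """Commit pending changes, then re-read and re-index one package.

    Hypothetical sketch: `session` only needs a commit() method,
    `package_show` takes a data_dict such as {"id": ...}, and
    `index_package` sends a package dict to the search index.
    """
    # 1. Persist the new harvest object (and any package changes) first.
    session.commit()
    # 2. Re-read the package so the dict reflects the committed state.
    package_dict = package_show({"id": package_id})
    # 3. Send that freshly read dict to the search index.
    index_package(package_dict)
    return package_dict
```

Re-reading after the commit, rather than patching a dict that was read earlier, keeps the indexed document consistent with what package_show returns in later requests.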

Based on these tests, the assumption that package_index.index_package can refresh Solr without a prior database commit does not appear to hold.
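The issue does not pin down the exact mechanism inside CKAN. One general database property that may be involved, offered here only as an assumption, is transaction visibility: a reader on a separate connection sees only committed data. The sketch uses Python's built-in sqlite3 module to show that a reader sees the old harvest_object_id until the writer commits. The table and column names are hypothetical and do not match CKAN's schema.

```python
import os
import sqlite3
import tempfile


def read_current_id(db_path: str, package_id: str):
    """Read harvest_object_id through a fresh connection (committed data only)."""
    reader = sqlite3.connect(db_path)
    try:
        # fetchall() runs the statement to completion so the read lock is released.
        rows = reader.execute(
            "SELECT harvest_object_id FROM package_extra WHERE package_id = ?",
            (package_id,),
        ).fetchall()
        return rows[0][0] if rows else None
    finally:
        reader.close()


def demo():
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        writer = sqlite3.connect(db_path)
        writer.execute(
            "CREATE TABLE package_extra "
            "(package_id TEXT PRIMARY KEY, harvest_object_id TEXT)"
        )
        writer.execute("INSERT INTO package_extra VALUES ('pkg-1', 'old-object')")
        writer.commit()

        # Point the package at the new harvest object, but do not commit yet.
        writer.execute(
            "UPDATE package_extra SET harvest_object_id = 'new-object' "
            "WHERE package_id = 'pkg-1'"
        )
        before_commit = read_current_id(db_path, "pkg-1")

        writer.commit()  # loosely analogous to model.Session.commit()
        after_commit = read_current_id(db_path, "pkg-1")
        writer.close()
        return before_commit, after_commit
    finally:
        os.remove(db_path)
```

Here `demo()` returns `("old-object", "new-object")`: any process that builds an index entry from the database between the update and the commit would index the stale value.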

Are there alternative solutions to this issue?