When the timestamp of a WAF source file changes without any actual content modification, the metadata information disappears from the UI.
The root cause is that the harvest_object_id is not replaced with the new harvest_object_id.
This was confirmed through the following API calls:
/api/action/package_show?id=
/api/action/package_search?q=id:
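The check above can be scripted. The following is a minimal sketch, not the exact procedure used in testing: it assumes harvest_object_id is exposed as a key/value entry in the package's extras list, and the helper names (harvest_object_id, call_action, compare_harvest_object_ids) and the base_url parameter are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def harvest_object_id(package_dict):
    """Return the harvest_object_id extra of a package dict, or None.

    Assumption: the value is stored as {"key": ..., "value": ...} in "extras".
    """
    for extra in package_dict.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            return extra.get("value")
    return None


def call_action(base_url, action, **params):
    """GET a CKAN action API endpoint and return its "result" payload."""
    url = f"{base_url.rstrip('/')}/api/action/{action}?{urlencode(params)}"
    with urlopen(url) as response:
        return json.load(response)["result"]


def compare_harvest_object_ids(base_url, package_id):
    """Return the id reported by package_show and by package_search."""
    shown = call_action(base_url, "package_show", id=package_id)
    found = call_action(base_url, "package_search", q=f"id:{package_id}")
    indexed = found["results"][0] if found["results"] else {}
    return harvest_object_id(shown), harvest_object_id(indexed)
```

Calling compare_harvest_object_ids with the site URL and a package id before and after a harvest run shows whether either endpoint still reports the old harvest_object_id.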
Additionally, testing on the most recent version of CKAN with only the ckanext-harvest and ckanext-spatial extensions replicated the problem.
Observations from Testing:
Manually running ckan search-index rebuild <package_id> resolved the issue: afterwards, the above API calls returned the correct value of harvest_object_id.
Testing with the following code changes yielded positive results:
Invoking package_update instead of package_index.index_package resolved the issue.
OR
Adding model.Session.commit() before invoking package_index.index_package also resolved the issue.
OR
Calling the index rebuild instead of package_index.index_package did not solve the issue unless model.Session.commit() was called before invoking the rebuild.
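The second option (commit first, then index) can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: commit_then_index is a hypothetical helper, and the session and index objects are passed in so the ordering can be exercised without a CKAN installation. In CKAN they would be model.Session and a PackageSearchIndex instance.

```python
def commit_then_index(session, package_index, package_dict):
    """Commit pending database changes, then send the package dict to Solr.

    Assumption drawn from the tests described above: without the commit,
    the index kept the old harvest_object_id.
    """
    session.commit()
    package_index.index_package(package_dict)


# Inside CKAN the call would look roughly like this (not executed here):
#
#   from ckan import model
#   from ckan.lib.search.index import PackageSearchIndex
#
#   commit_then_index(model.Session, PackageSearchIndex(), package_dict)
```

Passing the session and index as arguments is only a convenience for testing the call order; the harvester itself could call the two methods directly.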
Based on the tests above, the assumption that package_index.index_package doesn't need a database commit to refresh Solr does not appear to hold.
Related issue: https://github.com/GSA/data.gov/issues/4505
Found the code block that should refresh the Solr index: https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L709C1-L710C70
Are there any alternative solutions to address this issue?