ckan / ckanext-harvest

Remote harvesting extension for CKAN

Refactor code #501

Open tino097 opened 2 years ago


With the recent changes in CKAN core and the move to Solr 8, we should consider reworking parts of this extension. There are a few issues I would point out as concerns:

  1. Creating a HarvestObject for each dataset. This happens at the end of the gather stage, and with a significant number of datasets (e.g. 100k+) it could lead to performance issues. My suggestion is to create an internal method, _create_harvest_object, which could be called on every package_search iteration.
  2. Deleting packages that have been deleted at the source. As mentioned in Ian's comment, we could use the recently_changed_packages_activity_list API to get the packages for re-harvesting.
  3. Adding a Harvesters tab to the CKAN admin page: https://github.com/ckan/ckanext-harvest/pull/500
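A minimal sketch of item 1, assuming the gather stage pages through the remote source with package_search. The names iter_remote_pages, gather, fetch_page and create_object are hypothetical stand-ins (create_object plays the role of the proposed _create_harvest_object helper), not existing ckanext-harvest APIs. Because the pages are produced lazily, harvest objects for one page are created before the next page is requested.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical: returns one page of dataset dicts starting at `start`,
# standing in for a paginated package_search call against the source.
FetchPage = Callable[[int, int], List[Dict]]
# Hypothetical: plays the role of the proposed _create_harvest_object helper
# and returns the ID of the harvest object it created.
CreateObject = Callable[[Dict], str]


def iter_remote_pages(fetch_page: FetchPage, rows: int = 1000) -> Iterator[List[Dict]]:
    """Yield pages of datasets until the source returns a short or empty page."""
    start = 0
    while True:
        page = fetch_page(start, rows)
        if page:
            yield page
        if len(page) < rows:
            return  # a short page means the source has no more results
        start += rows


def gather(fetch_page: FetchPage, create_object: CreateObject, rows: int = 1000) -> List[str]:
    """Create a harvest object for each dataset as soon as its page arrives,
    rather than listing the whole source before creating any objects."""
    object_ids: List[str] = []
    for page in iter_remote_pages(fetch_page, rows):
        for dataset in page:
            object_ids.append(create_object(dataset))
    return object_ids
```

Only the object IDs are accumulated, which fits gather_stage returning a list of HarvestObject IDs; the dataset dicts from earlier pages can be released as the loop moves on.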
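For item 2, one way to turn the output of recently_changed_packages_activity_list into work items is sketched here. It assumes the activities arrive newest first and that each activity dict carries object_id and activity_type keys, with types such as "new package", "changed package" and "deleted package". classify_recent_changes is a hypothetical helper, and fetching the activity list from the remote source is left out.

```python
from typing import Dict, Iterable, Set, Tuple

# Activity types assumed to be reported for package changes.
REHARVEST_TYPES = {"new package", "changed package"}
DELETE_TYPE = "deleted package"


def classify_recent_changes(activities: Iterable[Dict]) -> Tuple[Set[str], Set[str]]:
    """Split recent package activities into IDs to re-harvest and IDs to delete.

    Assumes `activities` is ordered newest first, so the first activity seen
    for a package reflects its current state on the source.
    """
    seen: Set[str] = set()
    to_reharvest: Set[str] = set()
    to_delete: Set[str] = set()
    for activity in activities:
        package_id = activity["object_id"]
        if package_id in seen:
            continue  # older activity for a package already classified
        seen.add(package_id)
        if activity["activity_type"] == DELETE_TYPE:
            to_delete.add(package_id)
        elif activity["activity_type"] in REHARVEST_TYPES:
            to_reharvest.add(package_id)
    return to_reharvest, to_delete
```

Looking only at the newest activity per package means a dataset that was changed after an earlier deletion is re-harvested rather than deleted.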

    @ckan/core