ckan / ckanext-harvest

Remote harvesting extension for CKAN

Remove old entries in harvest_job_table #480

Open: frafra opened this issue 2 years ago

frafra commented 2 years ago

clean-harvest-log removes old logs, but I have not found anything similar for cleaning up harvest_job_table. With automatic harvesting, the job page keeps growing forever and is neither paginated nor searchable, which seems pretty useless to me.

What about keeping only the last 100 jobs by default, and/or providing a clean-up command such as clean-harvest-jobs?
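A minimal sketch of the proposed retention rule ("keep the last N jobs"), assuming the limit applies per harvest source. `Job` and `jobs_to_delete` are hypothetical names for illustration only, not the ckanext-harvest model or API. The sketch also never selects a job whose status is "Running", as a safety choice.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical stand-in for a harvest job row; NOT the ckanext-harvest model.
@dataclass(frozen=True)
class Job:
    id: str
    source_id: str
    created: datetime
    status: str  # e.g. "New", "Running", "Finished"


def jobs_to_delete(jobs, keep=100):
    """Return ids of jobs older than the `keep` most recent ones per source.

    Running jobs are never returned, regardless of age.
    """
    # Group jobs by harvest source (assumption: the limit is per source).
    by_source = {}
    for job in jobs:
        by_source.setdefault(job.source_id, []).append(job)

    doomed = []
    for source_jobs in by_source.values():
        # Newest first; everything after the first `keep` is a candidate.
        source_jobs.sort(key=lambda j: j.created, reverse=True)
        for job in source_jobs[keep:]:
            if job.status != "Running":
                doomed.append(job.id)
    return doomed
```

With five finished jobs for one source and `keep=2`, the three oldest are selected for deletion; jobs from other sources are counted separately.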

seitenbau-govdata commented 2 years ago

I think there is already a command, clear-history https://github.com/ckan/ckanext-harvest/blob/d84d847b09f28ab97bf1ca0baa651fdc05693d03/ckanext/harvest/cli.py#L112 , or rather clearsource_history https://github.com/ckan/ckanext-harvest/blob/d84d847b09f28ab97bf1ca0baa651fdc05693d03/ckanext/harvest/commands/harvester.py#L220 , for deleting old harvest jobs and their related objects. At the moment, however, the command deletes all harvest jobs and harvest objects, both old and current. We are already working on a pull request with an updated version that keeps at least the running harvest jobs and the latest harvest objects.
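To illustrate the "keep the running jobs and the latest harvest objects" behaviour described above, here is a minimal sketch. It assumes each harvest object carries the remote dataset's guid, the id of the job that gathered it, and a gathered timestamp. `HarvestObj` and `objects_to_delete` are hypothetical names, and this is not the code from the pull request.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical stand-in for a harvest object row; NOT the ckanext-harvest model.
@dataclass(frozen=True)
class HarvestObj:
    id: str
    guid: str        # identifier of the remote dataset
    job_id: str      # job that gathered this object
    gathered: datetime


def objects_to_delete(objects, running_job_ids):
    """Return ids of objects that are neither the newest for their guid
    nor attached to a running job."""
    # Find the most recently gathered object for each guid.
    latest = {}
    for obj in objects:
        best = latest.get(obj.guid)
        if best is None or obj.gathered > best.gathered:
            latest[obj.guid] = obj
    keep_ids = {o.id for o in latest.values()}

    # Everything else is deletable unless its job is still running.
    return [
        o.id
        for o in objects
        if o.id not in keep_ids and o.job_id not in running_job_ids
    ]
```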

seitenbau-govdata commented 2 years ago

The pull request is now available. https://github.com/ckan/ckanext-harvest/pull/484