Open jbrown-xentity opened 4 years ago
Let's make sure there's an issue opened upstream for this. I don't see a reason why CKAN wouldn't want the purge action to be more robust.
Good call. Made comment on upstream ticket.
I updated the ticket with a sketch. I don't think we want to use jobs here. I think those are more for the web process handing off long-running work. This task is really about a scheduled task that cleans out deleted datasets each day.
FYI, CKAN 2.3.5 also has the purge command in CLI.
User Story
In order to free up resources from deleted datasets, data.gov team members want a regularly scheduled purge job to remove deleted resources.
Acceptance Criteria
Background
The purge functionality in the administrator UI is not working, because the long-running command times out in gunicorn. A command line option exists to purge a specific dataset, ~but the ckan jobs functionality added in 2.7 will probably need to be utilized to implement this appropriately.~ as well as a CKAN action.
Security Considerations (required)
~Any data removal should first be confirmed by the data managers/owners, as in the department of education case~.
The SSP should be updated to state that deleted data is kept for X days (based on our configuration).
Sketch
Given that the purge command/action already exists on a per-dataset basis, we should use that. This is atomic and will ensure consistency.
The implementation could look like this:
If the script crashes or times out, it will pick up where it left off.
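As a rough illustration of the per-dataset approach, here is a minimal sketch (not the actual implementation). Everything in it is an assumption: `list_deleted` and `purge` are hypothetical callables that the caller supplies (in CKAN, `purge` could wrap the existing `dataset_purge` action), `retention_days` stands in for the "X days" setting, and `metadata_modified` is used as a stand-in for the deletion time.

```python
import logging
from datetime import datetime, timedelta, timezone

log = logging.getLogger(__name__)


def purge_deleted_datasets(list_deleted, purge, retention_days, now=None):
    """Purge deleted datasets older than retention_days, one at a time.

    list_deleted: hypothetical callable returning dicts with "id" and
        "metadata_modified" (ISO 8601, assumed UTC) for deleted datasets.
    purge: hypothetical callable taking a dataset id, e.g. a thin wrapper
        around CKAN's dataset_purge action.
    Returns (purged_ids, failed_ids).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    purged, failed = [], []
    for dataset in list_deleted():
        modified = datetime.fromisoformat(dataset["metadata_modified"])
        if modified.tzinfo is None:
            # Assumption: naive timestamps are UTC.
            modified = modified.replace(tzinfo=timezone.utc)
        if modified > cutoff:
            continue  # still inside the retention window
        try:
            purge(dataset["id"])  # each purge is its own atomic operation
            purged.append(dataset["id"])
        except Exception:
            # Log and move on; the dataset stays deleted and is retried next run.
            log.exception("Failed to purge dataset %s", dataset["id"])
            failed.append(dataset["id"])
    return purged, failed
```

Because the function keeps no state of its own and only looks at datasets that are still in the deleted list, a run that crashes or times out partway through is simply resumed by the next scheduled run.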
So that it can be shared between Catalog and Inventory, I think this should go into a new extension, ckanext-maintenance, which is not data.gov specific. Any other maintenance jobs can go there as well.