datahubio / datahub-v2-pm

Project management (issues only)

[Epic] Purge datasets #123

Open zelima opened 6 years ago

zelima commented 6 years ago

Originally coming from here: https://github.com/datahq/datahub-qa/issues/130

As a Publisher I want to permanently delete (purge) a data package so that it no longer takes up storage space.

Acceptance Criteria

Tasks

Analysis

The Web API should do the least amount of work needed to make the data appear deleted. No data actually has to be removed in the API handler; the dataset just needs to be marked as deleted.

Later, a cron job running every hour/day/week will do the actual deletion of the data from ES/S3.
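Here is a minimal sketch of that two-phase flow. The `datasets` table, its columns, and the `delete_from_s3`/`delete_from_es` helpers are all illustrative assumptions, not datahub's actual schema or API:

```python
# Sketch of the mark-then-purge flow. Table, columns, and storage helpers
# are assumptions for illustration only.
import sqlite3
import time

db = sqlite3.connect("metadata.db")
db.execute("""CREATE TABLE IF NOT EXISTS datasets (
    owner      TEXT,
    name       TEXT,
    state      TEXT DEFAULT 'active',
    deleted_at REAL,
    PRIMARY KEY (owner, name))""")

def purge_handler(owner, name):
    """API handler: do the least work possible -- just flag the dataset."""
    db.execute(
        "UPDATE datasets SET state = 'deleted', deleted_at = ? "
        "WHERE owner = ? AND name = ?",
        (time.time(), owner, name))
    db.commit()
    # From the caller's point of view the dataset is now gone; reads
    # should filter on state = 'active'.
    return {"success": True}

def purge_cron_job():
    """Cron job (hourly/daily/weekly): physically remove flagged data."""
    flagged = db.execute(
        "SELECT owner, name FROM datasets WHERE state = 'deleted'").fetchall()
    for owner, name in flagged:
        delete_from_s3(owner, name)   # drop pkgstore objects
        delete_from_es(owner, name)   # drop search/metadata docs
        db.execute("DELETE FROM datasets WHERE owner = ? AND name = ?",
                   (owner, name))
    db.commit()

def delete_from_s3(owner, name):
    # Placeholder: with boto3 this would delete the objects under the
    # dataset's pkgstore prefix.
    print(f"deleting s3://pkgstore/{owner}/{name}/ ...")

def delete_from_es(owner, name):
    # Placeholder: remove the dataset's documents from the search index.
    print(f"deleting ES docs for {owner}/{name} ...")
```

The benefit: the user-facing request returns immediately, while the slow S3/ES cleanup happens out of band, where it can be batched and retried safely.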

Why?

Questions

Sounds much better, just have a couple of questions [name=irakli]

Q: How exactly does marking files as deleted work?

Q: What about the revisions? How can we mark them as deleted? Do we update all of them? If not, they will still be accessible on the web, right?

Yes, public and unlisted pkgstore links will still work until the data is deleted from S3. I think that's fine. Private datasets shouldn't be accessible (as there's no way to get the private links). Thought: perhaps change the dataset to private before 'deleting'?
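One way to make the "change to private first" idea concrete: hide every revision in the same transaction that flags the dataset, extending the sketch above. The `revisions` table and `findability` column are hypothetical names:

```python
def mark_dataset_deleted(db, owner, name):
    """Flag the dataset AND hide all its revisions atomically, so no
    revision stays reachable via public/unlisted links while the
    purge is pending. Table and column names are illustrative."""
    with db:  # sqlite3 connection as context manager == one transaction
        db.execute(
            "UPDATE datasets SET state = 'deleted' "
            "WHERE owner = ? AND name = ?", (owner, name))
        db.execute(
            "UPDATE revisions SET findability = 'private' "
            "WHERE owner = ? AND name = ?", (owner, name))
```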

Q: What if a user wants to push right after deletion? Will it be pushed as revision 1? E.g. I have 15 revisions of garbage and the final one looks good, so I want to get rid of everything and re-push one perfect revision.

Good question. I think we can return an error in this case ('Dataset is marked for deletion and cannot be updated at this time. Try again once the dataset is fully purged.')
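A sketch of that check in the push handler, again using the hypothetical `datasets` table from above; the 409 status code is my choice, not documented datahub behavior:

```python
PURGE_PENDING_MSG = (
    "Dataset is marked for deletion and cannot be updated at this time. "
    "Try again once the dataset is fully purged.")

def push_handler(db, owner, name, payload):
    """Reject pushes while a purge is pending instead of guessing at
    revision numbering."""
    row = db.execute(
        "SELECT state FROM datasets WHERE owner = ? AND name = ?",
        (owner, name)).fetchone()
    if row is not None and row[0] == 'deleted':
        return {"error": PURGE_PENDING_MSG}, 409  # body, HTTP status
    # ... normal push flow continues here ...
    return {"success": True}, 200
```

This also settles the revision-numbering question: once the cron job has purged the dataset row entirely, the next push naturally starts over at revision 1.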