Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

Add some kind of protection for accidental deletion of KG-COVID-19 builds #385

Closed justaddcoffee closed 3 years ago

justaddcoffee commented 3 years ago

Right now we could in theory nuke existing KG-COVID-19 builds with an errant s3 command, which isn't ideal.

Describe the desired behavior

It'd be desirable to have some options if KG-COVID-19 builds were nuked. Some options:


justaddcoffee commented 3 years ago

@kltm I was thinking for now to address this ticket we could just turn on versioning of that S3 bucket - cost is minimal IIUC
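
For reference, turning versioning on can be sketched with boto3 roughly as follows. This is a minimal sketch, not the project's actual tooling: the helper name `enable_versioning` is hypothetical, and `s3_client` is assumed to be a `boto3.client("s3")` with permission to change the bucket's versioning configuration.

```python
def enable_versioning(s3_client, bucket):
    """Turn on versioning for `bucket` and return the status S3 reports back.

    `s3_client` is assumed to be a boto3 S3 client (boto3.client("s3")).
    `enable_versioning` itself is a hypothetical helper, not part of boto3.
    """
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # get_bucket_versioning omits "Status" when versioning was never configured,
    # so use .get() rather than indexing.
    return s3_client.get_bucket_versioning(Bucket=bucket).get("Status")
```

Usage would look like `enable_versioning(boto3.client("s3"), "<bucket-name>")`. Once enabled, versioning can later be suspended but not fully turned off, which is worth knowing before flipping the switch.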

kltm commented 3 years ago

@justaddcoffee Does S3 versioning actually prevent deletion? If the object is gone, isn't it gone? Also, if one is thinking "backup", more S3 doesn't necessarily cover redundancy. If one is thinking permissions, more buckets with different access.

justaddcoffee commented 3 years ago

> Does S3 versioning actually prevent deletion? If the object is gone, isn't it gone?

@kltm nope, if I understand the docs correctly, it does actually prevent deletion:

> If you delete an object, instead of removing it permanently, Amazon S3 inserts a delete marker, which becomes the current object version. You can always restore the previous version.

justaddcoffee commented 3 years ago

> Also, if one is thinking "backup", more S3 doesn't necessarily cover redundancy. If one is thinking permissions, more buckets with different access.

This is true. I'm avoiding the word "backup" when referring to S3 versioning, since it isn't really a backup.

kltm commented 3 years ago

@justaddcoffee Fair enough on both points. It might be worth actually looking at how a "recovery" would work in practice and having basic tooling for it first (from painful memories of "theoretical" safety systems in the past). I guess I think that this is sort of a neither-this-nor-that solution--not a backup and not a permissions system. It does have the advantage of being trivial to implement (although the costs may accumulate unless you add extra operations to the automated lifecycle), but I might check out exactly how it would work in practice first.
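
On the cost point, one way to keep old versions from piling up is a lifecycle rule that expires noncurrent versions after some number of days. The following is only a sketch under assumptions: the helper names and the retention period are hypothetical, `s3_client` is assumed to be a `boto3.client("s3")`, and nothing in this thread says such a rule was actually configured.

```python
def noncurrent_expiration_rule(days, prefix=""):
    """Build a lifecycle rule that expires noncurrent object versions.

    Hypothetical helper: `days` is how long a replaced or deleted version
    stays recoverable; an empty `prefix` applies the rule to the whole bucket.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ID": f"expire-noncurrent-after-{days}-days",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "NoncurrentVersionExpiration": {"NoncurrentDays": days},
    }


def apply_noncurrent_expiration(s3_client, bucket, days):
    """Install the rule on `bucket`.

    Note: put_bucket_lifecycle_configuration replaces the bucket's entire
    existing lifecycle configuration, so merge with any existing rules first.
    """
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [noncurrent_expiration_rule(days)]},
    )
```

The trade-off is that the retention period also becomes the recovery window: anything deleted longer ago than `days` can no longer be restored.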

justaddcoffee commented 3 years ago

Okay, I've enabled versioning for the kg-hub-public-data bucket, and tested that I can actually delete something and then undelete it (or, more accurately, delete the 'DeleteMarker', in AWS parlance).

This doc is really helpful: https://aws.amazon.com/premiumsupport/knowledge-center/s3-undelete-configuration/
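
As basic recovery tooling, the undelete step described above could be sketched like this. It is an illustration rather than the procedure actually used: `undelete` is a hypothetical helper, `s3_client` is assumed to be a `boto3.client("s3")`, and the sketch assumes the key's versions fit in a single `list_object_versions` page (no pagination handling).

```python
def undelete(s3_client, bucket, key):
    """Remove the current delete marker for `key`, making the previous version current.

    Returns the VersionId of the removed delete marker, or None if the key
    is not currently hidden behind a delete marker.
    """
    resp = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in resp.get("DeleteMarkers", []):
        # Prefix matching can return other keys that merely start with `key`,
        # so compare the key exactly and only touch the latest marker.
        if marker["Key"] == key and marker["IsLatest"]:
            s3_client.delete_object(
                Bucket=bucket, Key=key, VersionId=marker["VersionId"]
            )
            return marker["VersionId"]
    return None
```

Passing `VersionId` is what makes this a restore: deleting the marker by its version ID removes the marker itself, whereas a `delete_object` call without a version ID would just add another delete marker.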