base2Services / shelvery-aws-backups

Automating EBS RDS EC2 backups on lambda
MIT License

Recommendations and best practices #111

Closed andresriancho closed 3 years ago

andresriancho commented 4 years ago

@Guslington first of all thanks for shelvery, we're starting to implement it at wildlifestudios.com and have some questions before investing more time into it:

Deployment

There seem to be two ways to deploy shelvery:

The SAM template seems to be more up to date. It creates SQS and SNS resources. Is this the recommended way to deploy the tool?

Deployment II

We're going to be deploying this tool in multiple AWS accounts, with the objective of sharing and copying snapshots to a centralized AWS account.

Any tips to achieve this? So far, what I see is that I'll need to write different configs for the source accounts and the destination account, and create a script that deploys them to every region and account. Is this the way to go?

Known issues

I reviewed the open and closed issues, and the software seems solid. Is there anything I should be aware of before implementing it? Any odd behavior I'm likely to run into?

Thanks!

andresriancho commented 4 years ago

Also, would it make sense to change deploy-sam-template.sh to take a template parameter? That way it would be possible to specify different templates for source and destination accounts.

Guslington commented 4 years ago

Hi @andresriancho, apologies for the late reply. How did you go with rolling it out?

SAM is definitely the way to go; I left the serverless deployment in there as an alternative. Please feel free to open a PR if you want to add some additional functionality to the deploy-sam-template.sh script.

Backing up to a central account works like this: shelvery runs in every account that has resources, takes the scheduled backups there, and has the ShelveryShareAwsAccountIds parameter set to the central AWS account ID. This shares the snapshots with that account. Then, in your central account, the PullSharedBackupsSchedule parameter sets the cron schedule on which the shared backups are copied into the central account.
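To make the split between source and central accounts concrete, here is a minimal sketch of how the per-account parameters could be generated. Only `ShelveryShareAwsAccountIds` and `PullSharedBackupsSchedule` come from the description above; the account IDs, regions, schedule value, and all function names are hypothetical, and the schedule format shelvery actually expects should be checked against the template.

```python
from itertools import product
from typing import Dict, List


def source_account_params(central_account_id: str) -> Dict[str, str]:
    """Parameters for an account that takes backups and shares them."""
    return {"ShelveryShareAwsAccountIds": central_account_id}


def central_account_params(pull_schedule: str) -> Dict[str, str]:
    """Parameters for the central account that pulls shared backups."""
    return {"PullSharedBackupsSchedule": pull_schedule}


def as_overrides(params: Dict[str, str]) -> List[str]:
    """Render parameters as Key=Value strings, one per parameter."""
    return [f"{key}={value}" for key, value in sorted(params.items())]


def deployment_plan(
    source_accounts: List[str],
    central_account: str,
    regions: List[str],
    pull_schedule: str,
) -> List[Dict[str, object]]:
    """List one deployment per (account, region) pair.

    Assumption: the central account is also deployed once per region.
    """
    plan: List[Dict[str, object]] = []
    for account, region in product(source_accounts, regions):
        plan.append({
            "account": account,
            "region": region,
            "overrides": as_overrides(source_account_params(central_account)),
        })
    for region in regions:
        plan.append({
            "account": central_account,
            "region": region,
            "overrides": as_overrides(central_account_params(pull_schedule)),
        })
    return plan


if __name__ == "__main__":
    # All values below are placeholders, not real accounts or a verified schedule.
    for step in deployment_plan(
        source_accounts=["111111111111", "222222222222"],
        central_account="999999999999",
        regions=["us-east-1", "eu-west-1"],
        pull_schedule="0 2 ? * * *",
    ):
        print(step["account"], step["region"], step["overrides"])
```

Each entry in the plan could then be fed to whatever deploy script or pipeline is used per account and region.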

As for known issues at the moment, the main one to keep an eye on is hitting AWS limits. For example, an AWS account will only allow 5 snapshots to be copied at a time between AWS regions. Shelvery will retry, and there are SNS topics you can subscribe to in order to be notified of some of those errors.
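As a rough illustration of the retry idea (not shelvery's actual implementation), the sketch below retries an operation with exponential backoff when a copy limit is hit. `CopyLimitExceeded` is a hypothetical stand-in for whatever error AWS returns, and `retry_with_backoff` and its parameters are illustrative names only.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CopyLimitExceeded(Exception):
    """Hypothetical stand-in for the error raised when too many copies are in flight."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on CopyLimitExceeded with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except CopyLimitExceeded:
            if attempt == max_attempts:
                # Out of attempts: let the caller (or a notification hook) see the error.
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

The `sleep` argument is injectable so the function can be exercised without real waiting.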

Always happy to hear any feedback.

Thanks

toshke commented 4 years ago

@Guslington it seems the limit on concurrent snapshot copies per destination region has been increased to 20, per https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-ebs-increases-concurrent-snapshot-copy-limits-to-20-snapshots-per-destination-region/

Guslington commented 4 years ago

Excellent @toshke

andresriancho commented 4 years ago

This is crazy :-D Just yesterday we resumed this project and were wondering when you would answer :-)

Thanks for the reply, we're going to take these comments into consideration.

Something we were discussing yesterday was the possibility of modifying shelvery so that it can assumeRole into different accounts to perform the actions. That way the user would only have to deploy one lambda function. Thoughts?

Guslington commented 4 years ago

@andresriancho haha good timing! I definitely need better GitHub notifications :)

I did add support for assume role a while ago for that purpose in #57, but it was never documented and I haven't used that functionality in a while, so I can't guarantee it's bug-free.
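For readers unfamiliar with the pattern, here is a minimal sketch of assuming a role in another account. It is not the code from #57; the role name, session name, and function names are hypothetical. It takes an STS client as an argument (for example `boto3.client("sts")`) and returns keyword arguments that `boto3.Session` accepts.

```python
from typing import Any, Dict, Iterable


def session_kwargs_for_account(
    sts_client: Any,
    account_id: str,
    role_name: str,
    session_name: str = "shelvery-cross-account",  # hypothetical session name
) -> Dict[str, str]:
    """Assume a role in the target account and return boto3.Session kwargs."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def session_kwargs_for_accounts(
    sts_client: Any,
    account_ids: Iterable[str],
    role_name: str,
) -> Dict[str, Dict[str, str]]:
    """Map each account ID to temporary credentials for that account."""
    return {
        account_id: session_kwargs_for_account(sts_client, account_id, role_name)
        for account_id in account_ids
    }


# Usage (requires boto3 and a role that trusts the calling account):
#   import boto3
#   kwargs = session_kwargs_for_account(boto3.client("sts"), "111111111111", "my-backup-role")
#   ec2 = boto3.Session(**kwargs).client("ec2", region_name="us-east-1")
```

With this approach a single function could loop over accounts, at the cost of maintaining a trusted role in every target account.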

We deploy Shelvery in each AWS account and region where we need it and manage the infrastructure through automated pipelines. This works well, but it does get a bit more complex when dealing with lots of AWS accounts.