Backup improvements

stefansundin commented 5 years ago

Hi there. I have used parts of this project and modified it beyond recognition.

But I felt it necessary to contribute back some of my improvements. Mostly because of the pki issue I encountered and filed here: https://github.com/cablespaghetti/kubeadm-aws/issues/13

And also because the S3 storage costs can be lowered dramatically by compressing the backups first, and by using bucket versioning instead of timestamps in the filename to keep older backups. This way, a user who wants to minimize costs can simply leave versioning disabled. There is no need to keep more than the latest etcd snapshot around, and no need to back up the pki data more than once.
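
A minimal sketch of the compress-then-overwrite idea described above. The key name, the temporary directory, and the omitted etcdctl connection flags are assumptions for illustration, not the code from this PR.

```shell
# Sketch only: the key name and paths below are hypothetical.

# A fixed object key, so every upload overwrites the previous snapshot.
# With bucket versioning enabled, S3 keeps the overwritten copies as
# noncurrent versions; with versioning disabled only the latest one remains.
etcd_backup_key() {
  printf 'etcd-snapshot.db.xz\n'
}

# Take a snapshot, compress it with xz, and upload it to the fixed key.
# $1 = bucket name. etcdctl endpoint and certificate flags are omitted here.
backup_etcd() {
  workdir=$(mktemp -d) || return 1
  ETCDCTL_API=3 etcdctl snapshot save "$workdir/snapshot.db" &&
    xz "$workdir/snapshot.db" &&
    aws s3 cp "$workdir/snapshot.db.xz" "s3://$1/$(etcd_backup_key)"
  status=$?
  rm -rf "$workdir"
  return "$status"
}
```

Calling `backup_etcd my-bucket` from a cron job or systemd timer (with `my-bucket` as a placeholder) would then keep exactly one current object in the bucket.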

And not to mention, the restoration code had a bug where an outdated snapshot would be restored if the backup frequency was increased. That is a huge bug, although it won't be encountered by most people using this project. 1000 objects are returned by default from aws s3api list-objects, and about 700 objects would be created when taking backups every 15 minutes and deleting them after 7 days. The most recent backup is at the very end of the list-objects API response, since it's always sorted alphabetically, so once there are more than 1000 objects the newest backup is no longer in the first response.
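
The arithmetic behind this can be sketched as follows. The helper names are hypothetical, and `newest_key` assumes it is given the complete list of keys (all pages), with a sortable timestamp embedded in each key.

```shell
# Number of backups that exist at any one time.
# $1 = backup interval in minutes, $2 = retention in days.
backups_retained() {
  echo $(( $2 * 24 * 60 / $1 ))
}

# Print the newest key from a complete list of keys on stdin (one per line).
# Assumes keys embed a sortable timestamp such as 2019-05-01T12:15:00Z, so
# the alphabetically last key is also the most recent one.
newest_key() {
  LC_ALL=C sort | tail -n 1
}

backups_retained 15 7   # prints 672, which fits in one 1000-object response
backups_retained 5 7    # prints 2016, so the newest key is not in the first 1000
```

With a single fixed key per backup, as proposed in this PR, the listing step is not needed at all.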

In my opinion, the backup-enabled variable should be removed, and then versioned-bucket can be used to save money. The cost of keeping a single backup around should be very low (especially now with compression). Self-healing should be a cornerstone of this project, and is impossible without a backup.

And please test these changes, since I copy-pasted them back from my changed version, and I haven't done a lot of testing on the backported version.

I have made some other improvements that you may want to incorporate, but I didn't include them in this PR:

- Add a DNS name to the API server cert to allow it to be called from a remote computer more easily. Can be accomplished with:
```
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta1
apiServer:
 certSANs:
 - "${kubernetes_hostname}"
```
- Use an EIP to have a consistent public IP to the master, and assign it to the master in the userdata script:
```
aws ec2 associate-address --instance-id $INSTANCE_ID --allocation-id ${eip_id}
```

- Use S3 public access blocking to further secure the bucket, ensuring that public access is never accidentally granted to an object in it:
```
resource "aws_s3_bucket_public_access_block" "s3-bucket" {
 bucket                  = "${aws_s3_bucket.s3-bucket.id}"
 block_public_acls       = true
 block_public_policy     = true
 ignore_public_acls      = true
 restrict_public_buckets = true
}
```

- Use more than a single subnet in order to maximize the chance of getting spot instances at a low price. Although this could have the undesired effect of causing more cross-AZ traffic, which incurs additional costs, if workers are being used. Maybe the README should encourage the user to do some spot price research so they understand how the pricing works. Although the default instance type should always be cheap. I am using a much more powerful instance type.
- This is overkill, but I am taking a daily snapshot as well, and archiving it straight to S3 Glacier Deep Archive, and keeping these snapshots for 180 days (the minimum storage duration billed for Deep Archive). This way I can restore my cluster to any state in the past 180 days. This is mostly a fun experiment and it wouldn't be very useful for many, as restoring the backups out of Deep Archive takes a long time.
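
The daily Deep Archive upload from the last item could look roughly like this. The key layout and function names are hypothetical, and the 180-day expiry is assumed to be handled by a bucket lifecycle rule that is not shown here.

```shell
# Object key for the daily archive: one object per cluster per day.
# $1 = cluster name, $2 = date as YYYY-MM-DD. The layout is hypothetical.
archive_key() {
  printf 'archive/%s/%s.db.xz\n' "$1" "$2"
}

# Upload a compressed snapshot directly into the Deep Archive storage class.
# $1 = local file, $2 = bucket name, $3 = cluster name.
archive_snapshot() {
  aws s3 cp "$1" "s3://$2/$(archive_key "$3" "$(date -u +%Y-%m-%d)")" \
    --storage-class DEEP_ARCHIVE
}
```

Using the UTC date in the key means a second run on the same day overwrites that day's archive instead of adding another object.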

kurtmc commented 5 years ago

@stefansundin I think that I have run into the same flannel issue as you when restoring, and I haven't been able to recover my cluster manually. I like the improvement you have made here and I am probably going to switch to your branch.

You mentioned that you have made some other improvements but have not incorporated them into this PR. Would you be keen to push those improvements to a branch on your repository so that I could have a look? I think that it would benefit a lot of people.

Thanks!

stefansundin commented 5 years ago

"Nice" to see that someone else had the same problem. I thought that I was doing something wrong, and I think that contributed to it taking so long for me to figure out the actual problem. I don't think you can recover your cluster, start over with a proper pki backup. :)

As for my other changes, I used this project for inspiration, and integrated similar Terraform code into an existing codebase that I have. So the code that I use has never actually been compatible with this project. I backported the important changes for the purposes of this PR, but the other enhancements that I made are described in the bullet list above.

I guess some other changes I've made are:

- Use a bucket per region instead of per project. Instead of storing e.g. the pki data at s3://${s3bucket}/pki.tar.xz, I store it at s3://${s3bucket}/pki/${clustername}.tar.xz, and the same with the etcd backups.
- Enable IPv6 in the VPC. To get a shorter IPv6 address (than the automatically assigned one), I also assign an IPv6 address in the userdata for the master.
- In userdata, add the ephemeral device to /etc/fstab with a UUID parameter. I have not tested the m1.medium, but from what I can discern, it automatically mounts the ephemeral storage. On r5d, this does not appear to be the case, and I have to mount it manually. My code is not very portable, and looks for the device based on size, so I should improve this code and share it. On r5d and similar instances, the NVMe devices in /dev/ may change order when you reboot, so it is important to use UUID here.
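
A rough sketch of the UUID-based fstab entry from the last item. This is not the size-based detection code mentioned above. The function names, the ext4 filesystem type, and the mount options are assumptions, and the device is assumed to be formatted already so that blkid can report a UUID.

```shell
# Build an fstab line that refers to the filesystem by UUID instead of by
# device name, so a changed NVMe device order after reboot does not matter.
# $1 = filesystem UUID, $2 = mount point, $3 = filesystem type.
fstab_entry() {
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$1" "$2" "$3"
}

# Look up the UUID of a formatted device, register it in /etc/fstab, mount it.
# $1 = device path, $2 = mount point. Must run as root.
add_ephemeral_mount() {
  uuid=$(blkid -s UUID -o value "$1")
  [ -n "$uuid" ] || return 1
  mkdir -p "$2" &&
    fstab_entry "$uuid" "$2" ext4 >> /etc/fstab &&
    mount "$2"
}
```

In userdata this might be called as `add_ephemeral_mount /dev/nvme1n1 /mnt/ephemeral`, where both the device path and the mount point are placeholders. The `nofail` option keeps the instance booting even if the ephemeral device is missing.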

There are also many things that I have not tested yet. For example, I am not using Helm yet. I am not using --cloud-provider=aws either, since so far I don't rely on EBS volumes for persistence (I will soon though). I am trying to start from scratch and use this project as a guide, in order to learn as much as possible. This project has been very useful and educational, so a big thanks to all the authors and contributors.

cablespaghetti / kubeadm-aws

Backup improvements #14