2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

[Spike: 2hr] Investigate EBS snapshots from AWS and any related terraform config for automated backups #5003

Closed sgibson91 closed 2 weeks ago

sgibson91 commented 1 month ago

Tasks

Definition of Done

sunu commented 2 weeks ago

I found two options to enable automated backups for EBS:

  1. AWS Data Lifecycle Manager (AWS DLM)
  2. AWS Backup Service

The first option, AWS DLM, looks simpler and is designed specifically for EBS volume snapshots, so I suggest we go with it.

Here's what the implementation will look like:

The IAM and lifecycle policy will look roughly like this:

resource "aws_iam_role" "dlm_lifecycle_role" {
  name = "dlm-lifecycle-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "dlm.amazonaws.com"
        }
      }
    ]
  })
}

# Attach required policy to the IAM role
resource "aws_iam_role_policy" "dlm_lifecycle" {
  name = "dlm-lifecycle-policy"
  role = aws_iam_role.dlm_lifecycle_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ec2:CreateSnapshot",
          "ec2:CreateSnapshots",
          "ec2:DeleteSnapshot",
          "ec2:DescribeVolumes",
          "ec2:DescribeInstances",
          "ec2:DescribeSnapshots"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "ec2:CreateTags"
        ]
        Resource = "arn:aws:ec2:*::snapshot/*"
      }
    ]
  })
}

# Create the DLM lifecycle policy
resource "aws_dlm_lifecycle_policy" "nfs_backup" {
  description        = "DLM lifecycle policy for NFS home directories backup"
  execution_role_arn = aws_iam_role.dlm_lifecycle_role.arn
  state              = "ENABLED"

  policy_details {
    resource_types = ["VOLUME"]

    schedule {
      name = "Daily backup"

      create_rule {
        interval      = 24
        interval_unit = "HOURS"
        times         = ["23:45"]
      }

      retain_rule {
        count = 7  # Keep last 7 daily backups
      }

      tags_to_add = {
        SnapshotCreator = "DLM"
        Purpose         = "NFS-Backup"
      }

      copy_tags = true
    }

    target_tags = {
      Backup = "true"  # Tag that identifies the volumes to back up
    }
  }
}

Then the target tag (Backup = "true") needs to be applied to the existing EBS volume so that the policy selects it and starts taking automated snapshots.
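
For the tagging step, here is a minimal sketch, assuming the volume already exists and its tags are not otherwise managed in this Terraform configuration. The variable name `nfs_volume_id` and the resource label `nfs_backup_tag` are hypothetical placeholders, not names taken from the repository:

```hcl
# Hypothetical input: ID of the existing EBS volume that holds the NFS home directories.
variable "nfs_volume_id" {
  type        = string
  description = "ID of the existing EBS volume to include in DLM backups"
}

# Apply the tag that the DLM policy's target_tags block selects on.
resource "aws_ec2_tag" "nfs_backup_tag" {
  resource_id = var.nfs_volume_id
  key         = "Backup"
  value       = "true"
}
```

If the volume is instead defined as an `aws_ebs_volume` resource in the same configuration, adding `Backup = "true"` to that resource's `tags` map is probably the simpler route. The provider documentation advises against combining `aws_ec2_tag` with the parent resource's own `tags` argument, because the two end up in a perpetual diff over the tag set.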

sgibson91 commented 2 weeks ago

I have created https://github.com/2i2c-org/infrastructure/issues/5061 to track deploying this to VEDA staging