binlab / terraform-aws-vault-ha-raft

Hashicorp Vault HA cluster based on Raft Consensus Algorithm
https://github.com/binlab/terraform-aws-vault-ha-raft
MIT License



Vault HA cluster is based on the Raft Storage Backend, announced as a tech preview in Vault 1.2.0 (July 30th, 2019), introduced as a beta in 1.3.0 (November 14th, 2019), and promoted out of beta in 1.4.0 (April 7th, 2020).

The Raft storage backend is used to persist Vault's data. Unlike other storage backends, Raft storage does not operate from a single source of data. Instead, all the nodes in a Vault cluster hold a replicated copy of Vault's data. Data is replicated across all the nodes via the Raft Consensus Algorithm.

  • High Availability – the Raft storage backend supports high availability.
  • HashiCorp Supported – the Raft storage backend is officially supported by HashiCorp.

Key features:

* see the Limitations section below for some limitations regarding AWS provisioning

Why?

Why not use Kubernetes or another existing cluster? I can name a few reasons:

  1. Independence. To create infrastructure as code with Terraform (for example, this same cluster) we need storage for secret input parameters (passwords, IPs, private data) and outputs (tokens, endpoints, passwords). A Vault is very convenient for this.
  2. Stability. A cluster is an additional layer of abstraction on top of EC2 instances. It is much better to use native deployment methods.
  3. Security. Vault may store very secret and sensitive data. Placing this data alongside publicly available services carries a potential risk of leaks. In this case, we can deploy a totally independent cluster, even in a separate AWS account, with access limited to a few people.
  4. Lightweight. Sometimes we need a very lightweight and cheap, yet very stable, Vault, e.g. just to auto-unseal another Vault.

IMPORTANT

AWS Permissions

For deploying, you need a set of AWS permissions. Working out the minimal required permissions can be difficult for beginners, so here is a wildcard list for the main actions. Professionals, or those interested in high-level security and granular permissions, should look at AWS IAM Granular Permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VaultHAProvisioning",
      "Effect": "Allow",
      "Action": [
        "ec2:*",
        "dlm:*",
        "elasticloadbalancing:*",
        "iam:*",
        "kms:*",
        "route53:*",
        "sts:GetCallerIdentity"
      ],
      "Resource": "*"
    }
  ]
}
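If you manage IAM with Terraform itself, the same wildcard policy can be created as a resource. A hedged sketch; the policy name and the deployer user name below are illustrative, not part of the module:

```hcl
# Sketch: create the provisioning policy with Terraform itself.
# Names here ("VaultHAProvisioning", "terraform-deployer") are placeholders.
resource "aws_iam_policy" "vault_ha_provisioning" {
  name = "VaultHAProvisioning"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "VaultHAProvisioning"
      Effect = "Allow"
      Action = [
        "ec2:*",
        "dlm:*",
        "elasticloadbalancing:*",
        "iam:*",
        "kms:*",
        "route53:*",
        "sts:GetCallerIdentity",
      ]
      Resource = "*"
    }]
  })
}

# Attach the policy to the IAM user that runs terraform apply
resource "aws_iam_user_policy_attachment" "deployer" {
  user       = "terraform-deployer" # your CI/deployer IAM user
  policy_arn = aws_iam_policy.vault_ha_provisioning.arn
}
```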

Usage

IMPORTANT: with the latest code from master you might need to temporarily enable the option nat_enabled (access to external resources) for the first initialization since, during creation of the cluster, the instances need to pull a Docker image. An alternative is placing the cluster on a public subnet.

The module can be deployed with almost all variables at their default values. For details of the default values, see the Inputs section below.

provider "aws" {
  region = "us-east-1"
}

module "vault" {
  source = "github.com/binlab/terraform-aws-vault-ha-raft?ref=v0.1.8"

  cluster_name       = "vault-ha"
  node_instance_type = "t3a.small"
  autounseal         = true
  nat_enabled        = true
}

output "cluster_url" {
  value = module.vault.cluster_url
}

Then run:

$ terraform init
$ terraform apply

After the deployment process completes you should see:

...
cluster_url = http://tf-vault-ha-alb-123456789.us-east-1.elb.amazonaws.com:443
$

Then just open the URL in a browser and initialize the cluster.

ATTENTION! Some resources are not covered by the AWS Free Tier and cost money, so after running this example you should destroy all previously created resources with the following command:

$ terraform destroy

HOW TO

  1. Initializing newly created cluster
  2. Raft manual snapshots (Init/Join/Backup/Restore)
  3. AWS IAM Granular Permissions
  4. Change the AMI on a running cluster

Examples

  1. Basic usage (Quick start)
  2. Public SSH access to the instances by OpenSSH private key
  3. Assigning CNAME and Route 53 Alias to Vault HA cluster
  4. Adding public certificate and domain by ACM and Route53
  5. Assigning module's VPC to external resources e.g. Bastion host
  6. VPC Peering different networks e.g. RDS Database
  7. Assigning the Vault cluster to an already created (default) AWS VPC
  8. Assigning external AWS KMS Key for internal Auto Unseal
  9. Development and Debugging Sandbox

Troubleshooting

See the separate page: #troubleshooting

If you encounter a problem that is not described in the documentation, or that you cannot solve yourself, please feel free to open an issue.

TODO

* see the Limitations section below for some limitations regarding AWS provisioning

Limitations

Requirements

Name Version
terraform >= 0.12
aws >= 2.53.0
ignition <= 1.3.0
local >= 1.4.0
tls >= 2.1.1

Providers

Name Version
aws >= 2.53.0
ignition <= 1.3.0
local >= 1.4.0
tls >= 2.1.1

Modules

No modules.

Resources

Name Type
aws_dlm_lifecycle_policy.snapshots resource
aws_ebs_volume.data resource
aws_eip.nat resource
aws_iam_instance_profile.autounseal resource
aws_iam_role.autounseal resource
aws_iam_role.snapshots resource
aws_iam_role_policy.autounseal resource
aws_iam_role_policy.snapshots resource
aws_instance.node resource
aws_internet_gateway.public resource
aws_kms_key.autounseal resource
aws_lb.cluster resource
aws_lb_listener.cluster resource
aws_lb_target_group.cluster resource
aws_lb_target_group_attachment.cluster resource
aws_nat_gateway.private resource
aws_route.private resource
aws_route.public resource
aws_route53_record.alias resource
aws_route53_record.cname resource
aws_route53_record.int resource
aws_route53_zone.int resource
aws_route_table.private resource
aws_route_table.public resource
aws_route_table_association.private resource
aws_route_table_association.public resource
aws_security_group.alb resource
aws_security_group.node resource
aws_security_group_rule.alb_egress_allow_nodes resource
aws_security_group_rule.alb_ingress_allow_clients resource
aws_security_group_rule.alb_ingress_allow_nodes resource
aws_security_group_rule.node_egress_allow_all resource
aws_security_group_rule.node_ingress_allow_alb resource
aws_security_group_rule.node_ingress_allow_peer resource
aws_security_group_rule.node_ingress_allow_public_http resource
aws_security_group_rule.node_ingress_allow_public_ssh resource
aws_security_group_rule.node_ingress_allow_ssh resource
aws_subnet.private resource
aws_subnet.public resource
aws_volume_attachment.node resource
aws_vpc.this resource
local_file.ca_cert resource
local_file.config resource
local_file.node_cert resource
local_file.node_key resource
local_file.ssh_private_key resource
local_file.user_data resource
tls_cert_request.node resource
tls_locally_signed_cert.node resource
tls_private_key.ca resource
tls_private_key.core resource
tls_private_key.node resource
tls_self_signed_cert.ca resource
aws_ami.coreos data source
aws_ami.flatcar data source
aws_availability_zones.current data source
aws_iam_policy_document.autounseal data source
aws_iam_policy_document.autounseal_sts data source
aws_iam_policy_document.snapshots data source
aws_iam_policy_document.snapshots_sts data source
aws_route53_zone.external data source
ignition_config.node data source
ignition_file.auth_principals_admin data source
ignition_file.auth_principals_core data source
ignition_file.ca_ssh_public_keys data source
ignition_file.ca_tls_public_keys data source
ignition_file.config data source
ignition_file.helper data source
ignition_file.node_ca data source
ignition_file.node_cert data source
ignition_file.node_key data source
ignition_file.sshd_config data source
ignition_filesystem.data data source
ignition_systemd_unit.mount data source
ignition_systemd_unit.service data source
ignition_user.admin data source
ignition_user.core data source

Inputs

Name Description Type Default Required
ami_channel AMI filter for OS channel [stable/edge/beta/etc] string "stable" no
ami_image Specific AMI image ID in the current Availability Zone e.g. [ami-123456].
If provided, nodes will run on it; useful for cases when the image is
built by Packer. If set, it disables searching for images by "ami_vendor"
and "ami_channel". Note: the instance OS should support CoreOS Ignition
provisioning. To change the AMI on a running cluster you need a trick; more:
https://github.com/binlab/terraform-aws-vault-ha-raft/blob/master/docs/change-ami-on-worked-cluster.md
string "" no
ami_vendor AMI filter for OS vendor [coreos/flatcar] string "flatcar" no
autounseal Option to enable/disable creating the KMS key, IAM role, policy and
AssumeRole for auto-unseal by AWS. Instead of being created by the
module, external resources can be used for auto-unseal, or it can be
skipped entirely. If set, it will disable "seal_transit" and
"seal_awskms". With the variable "kms_key_arn" an external KMS Key
can be configured instead of the internal one.
bool false no
aws_snapshots Option to enable/disable embedded snapshots by AWS bool false no
aws_snapshots_interval Snapshot Interval. How often this lifecycle policy
should be evaluated. 2,3,4,6,8,12 or 24 are valid values
number 24 no
aws_snapshots_retain How many snapshots to keep. Must be an integer between 1 and 1000 number 7 no
aws_snapshots_time A list of times in 24 hour clock format that sets when the
lifecycle policy should be evaluated. Max of 1 by UTC time
string "23:45" no
ca_ssh_public_keys List of SSH Certificate Authority public keys. Specifies the public
keys of certificate authorities that are trusted to sign
user certificates for authentication. More:
https://man.openbsd.org/sshd_config#TrustedUserCAKeys
list(string) [] no
ca_tls_public_keys List of custom Certificate Authority public keys. Used when Vault
needs to connect to resources with a self-signed certificate
list(string) [] no
certificate_arn ARN of an AWS certificate to assign to the ALB for TLS
termination. It should be a certificate issued for the domain that
will be assigned as a CNAME record to the ALB endpoint. If not set,
TLS will not be activated on the ALB. More:
https://www.terraform.io/docs/providers/aws/r/acm_certificate_validation.html#certificate_arn
string "" no
cluster_allowed_subnets Allowed IPs to connect to a cluster on ALB endpoint list(string)
[
"0.0.0.0/0"
]
no
cluster_count Count of nodes in cluster across all availability zones number 3 no
cluster_description Description for Tags in all resources.
Also used as a prefix for certificates "common_name",
"organizational_unit" and "organization" fields
string "Hashicorp Vault HA Cluster" no
cluster_domain Public cluster domain that will be assigned as CNAME record to
ALB endpoint. If not set ALB endpoint will be used
string "" no
cluster_name Name of a cluster, and tag "Name", can be a project name.
Format of "Name" tag "<cluster_prefix>-<cluster_name>-"
string "vault-ha" no
cluster_port External port on ALB endpoint to a public connection number 443 no
cluster_prefix Prefix of a tag "Name", can be a namespace.
Format of "Name" tag "<cluster_prefix>-<cluster_name>-"
string "tf-" no
data_volume_size Data (Raft) volume block device Size (GB) e.g. [8] number 8 no
data_volume_type Data (Raft) volume block device Type e.g. [gp2] string "gp2" no
debug Option for enabling debug output to plain files. When "true",
Terraform will store certificates, keys, and ignition (user data)
JSON files in the folder "debug_path"
bool false no
debug_path Path to the folder where debug files will be stored.
If empty, the default "${path.root}/.debug" is used;
you can set a custom full path e.g. "/home/user/.debug"
string "" no
disable_mlock Disables the server from executing the "mlock" syscall. Mlock
prevents memory from being swapped to disk. Disabling "mlock" is
not recommended in production, but is fine for local development
and testing
bool false no
docker_repo Vault Docker repository URI string "vault" no
docker_tag Vault Docker image version tag string "1.8.1" no
internal_zone Name for the internal domain zone. Needed for assigning domain
names to each of the nodes for cluster server-to-server communication.
Also used for SSH connections over a Bastion host.
string "vault.int" no
internet_gateway_id_external Provide existing external internet gateway ID for AWS VPC string null no
kms_key_arn ARN of an external AWS KMS Key. Is used for replacing internal
"aws_kms_key". Useful for cluster migration or more stable
configuration with independent KMS key outside of the module.
string null no
kms_key_create Determines whether to create an AWS KMS Key inside the module. If
set to "false", the internal "aws_kms_key" resource will not be
created.
bool true no
nat_enabled Determines whether to create a NAT gateway and assign it to the
VPC Private Subnet. If you intend to use Vault only with internal
resources and an internal network, you can disable this option;
otherwise, you need to enable it. Allowing external routing might be
a potential security vulnerability. Also, enabling this option incurs
additional costs that are not covered by the AWS Free Tier program.
IMPORTANT: since, during creation of the cluster, the instances need
to pull a Docker image, it is necessary to enable nat_enabled for the
first initialization
bool false no
node_allow_public Assign a public network to the nodes (EC2 instances). The instances
will be publicly available on the HTTPS "node_port" and SSH
"ssh_port" ports. For debugging only; don't use in production!
bool false no
node_allowed_subnets If variable "node_allow_public" is set to "true", these IPs will be
allowed to connect to the Vault nodes directly (to the instances)
list(string)
[
"0.0.0.0/32"
]
no
node_cert_hours_valid The number of hours after initial issuing that the certificate
for a Vault node will become invalid. The certificate is used for
internal communication in the cluster by peers and for connections
from the ALB. Setting a small value is not recommended, as there is
no reissuance mechanism without re-applying Terraform
number 43800 no
node_cpu_credits The credit option for CPU usage [unlimited/standard] string "standard" no
node_instance_type Type of instance e.g. [t3.small] string "t3.small" no
node_monitoring CloudWatch detailed monitoring [true/false] bool false no
node_name_tmpl Template of Vault node ID for a Raft cluster. Also used as a
subdomain prefix for internal domains for example:
"node0.vault.int", "node1.vault.int", etc
string "node%d" no
node_port Port on which Vault listens for ALB and health check requests number 8200 no
node_volume_size Node (Root) volume block device Size (GB) e.g. [8] number 8 no
node_volume_type Node (Root) volume block Device Type e.g. [gp2] string "gp2" no
peer_port Port on which Vault listens for server-to-server cluster requests number 8201 no
route53_record_create Determines whether to create a Route53 record for the cluster. If
set true, "route53_zone_id" must also be defined
bool false no
route53_record_name Name for subdomain Route53 record in a zone which determined in
"route53_zone_id"
string "vault" no
route53_record_ttl TTL for subdomain Route53 record in a zone which determined in
"route53_zone_id". Applies only if "route53_record_type" = "cname"
number 300 no
route53_record_type Type for subdomain Route53 record in a zone which determined in
"route53_zone_id". Can be "cname" or "alias", by default is - "cname"
string "cname" no
route53_zone_id External Route53 Zone ID for creating the record inside the module;
to enable it, set "route53_record_create" = true
string "" no
seal_awskms Map of settings for Vault to use AWS KMS as the seal wrapping
mechanism. If set, it will disable "seal_transit".
More: https://www.vaultproject.io/docs/configuration/seal/awskms
map(any) {} no
seal_transit Map of settings for Vault's Transit Secrets Engine to be used as
the auto-seal mechanism.
More: https://www.vaultproject.io/docs/configuration/seal/transit
map(any) {} no
ssh_admin_principals List of SSH authorized principals for user "admin" when SSH login is
configured via a Certificate Authority ("ca_ssh_public_keys" is set)
https://man.openbsd.org/sshd_config#AuthorizedPrincipalsFile
list(string)
[
"vault-ha"
]
no
ssh_allowed_subnets If variable "node_allow_public" is set to "true", these IPs will be
allowed to connect to the Vault nodes by SSH directly (to the
instances)
list(string)
[
"0.0.0.0/32"
]
no
ssh_authorized_keys List of SSH authorized keys assigned to "Core" user (sudo user) list(string) [] no
ssh_core_principals List of SSH authorized principals for user "core" when SSH login is
configured via a Certificate Authority ("ca_ssh_public_keys" is set)
More: https://man.openbsd.org/sshd_config#AuthorizedPrincipalsFile
list(string)
[
"sudo"
]
no
ssh_port Listening SSH port on the instances in public and private networks.
A changed value is used only when "ca_ssh_public_keys" is set;
otherwise it equals the default of 22
number 22 no
tags Map of tags assigned to each of the created resources in AWS.
By default, the predefined map described in the file "locals.tf" is
used. Each entry can be overwritten here separately.
map(string) {} no
vault_ui Enables the built-in Vault web UI bool true no
vpc_cidr VPC CIDR associated with a module. Block sizes must be between a
/16 netmask and /28 netmask for AWS. For example:
10.0.0.0/16-10.0.0.0/28,
172.16.0.0/16-172.16.0.0/28,
192.168.0.0/16-192.168.0.0/28
string "192.168.0.0/16" no
vpc_id_external Provide an existing external AWS VPC ID. If set, configure the
corresponding vpc_public_subnet_cidr and vpc_private_subnet_cidr to
match the external VPC CIDR
string null no
vpc_private_subnet_cidr CIDR block for the private subnet; must be in canonical form, in the
same network as the VPC, and non-overlapping with other subnets. For
example: a /25 subnet (e.g. 172.31.31.0/25) can contain up to 8
subnets with a /28 mask (the subnet mask must be not less than /28
for AWS)
string null no
vpc_private_subnet_mask Size of the private subnet. The subnet mask must be not less than
/28 for AWS. A /28 mask can contain up to 16 IP addresses, but AWS
reserves 5 addresses, so 11 are available for use. More:
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html
number 28 no
vpc_private_subnet_tmpl VPC Private Subnet Template. Created for convenience for users who
are not very familiar with networks and subnetting.
Each index from the list of availability zones will be substituted
for the placeholder %d. Ignored if the variable vpc_private_subnets
is defined.
DEPRECATED: Avoid using this configuration; it might be removed in
future versions. In that case, to avoid re-creation of the cluster,
just describe your existing networks in the vpc_private_subnets
parameter list, for example:
["192.168.101.0/24", "192.168.102.0/24", "192.168.103.0/24", ...]
string "192.168.10%d.0/24" no
vpc_private_subnets List of VPC Private Subnet. Each subnet will be assigned to
availability zone in order.
Mask must be not less than /28 for AWS. Subnets should not overlap
and should be in the same network with vpc_cidr
list(string) [] no
vpc_public_subnet_cidr CIDR block for the public subnet; must be in canonical form, in the
same network as the VPC, and non-overlapping with other subnets. For
example: a /25 subnet (e.g. 172.31.31.0/25) can contain up to 8
subnets with a /28 mask (the subnet mask must be not less than /28
for AWS)
string null no
vpc_public_subnet_mask Size of the public subnet. The subnet mask must be not less than
/28 for AWS. A /28 mask can contain up to 16 IP addresses, but AWS
reserves 5 addresses, so 11 are available for use. More:
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html
number 28 no
vpc_public_subnet_tmpl VPC Public Subnet Template. Created for convenience for users who
are not very familiar with networks and subnetting.
Each index from the list of availability zones will be substituted
for the placeholder %d. Ignored if the variable vpc_public_subnets
is defined.
DEPRECATED: Avoid using this configuration; it might be removed in
future versions. In that case, to avoid re-creation of the cluster,
just describe your existing networks in the vpc_public_subnets
parameter list, for example:
["192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24", ...]
string "192.168.%d.0/24" no
vpc_public_subnets List of VPC Public Subnets. Each subnet will be assigned to
availability zone in order.
Mask must be not less than /28 for AWS. Subnets should not overlap
and should be in the same network with vpc_cidr
list(string) [] no
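As a sketch of how several of these inputs combine, here is a hedged example of a more customized cluster with a public Route53 domain and explicit subnets. The zone ID, certificate ARN, domain, and CIDR blocks are placeholders, not values from this repository:

```hcl
module "vault" {
  source = "github.com/binlab/terraform-aws-vault-ha-raft?ref=v0.1.8"

  cluster_name  = "vault-ha"
  cluster_count = 5
  autounseal    = true
  nat_enabled   = true # required for the first initialization (Docker image pull)

  # Public DNS: CNAME record "vault" in an existing zone (placeholder values)
  route53_record_create = true
  route53_zone_id       = "Z0000000EXAMPLE"
  route53_record_name   = "vault"
  cluster_domain        = "vault.example.com"
  certificate_arn       = "arn:aws:acm:us-east-1:111111111111:certificate/example"

  # Explicit subnets instead of the deprecated *_tmpl variables
  vpc_cidr            = "10.100.0.0/16"
  vpc_public_subnets  = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
  vpc_private_subnets = ["10.100.101.0/24", "10.100.102.0/24", "10.100.103.0/24"]
}
```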

Outputs

Name Description
alb_dns_name ALB external endpoint DNS name. Use it to assign a
"CNAME" record for a public domain
alb_zone_id ALB canonical hosted Zone ID of the load balancer.
Use it to assign a Route 53 "Alias" record (AWS only).
cluster_url Cluster public URL with schema, domain, and port.
All parameters depend on input values and are calculated automatically
for convenient use. Can also be constructed separately outside the module
igw_public_ips List of Internet public IPs. If cluster nodes are placed in the
public subnet (Internet Gateway used), all external network
requests go via public IPs assigned to the nodes. This list
can be used for configuring security groups of related services or
for connecting to the nodes via SSH when debugging
kms_key_arn ARN of the AWS KMS Key. It returns the ARN of the internally created
KMS key, or forwards the ARN of an external key if one is provided by
the "kms_key_arn" variable. It returns "null" if "autounseal=false" or
"kms_key_arn" is not defined.
nat_public_ips NAT public IPs assigned as the external IPs for requests from
each of the nodes. Convenient for restricting applications,
audit logs, some security groups, or other IP-based security
policies. Note: if "node_allow_public" is set, each node gets
its own public IP, which is used for external requests.
If var.nat_enabled is set to false, returns an empty list.
node_security_group Node Security Group ID, which allows connecting to "cluster_port",
"node_port" and "ssh_port". Useful for debugging when a Bastion host
is connected to the same VPC
private_subnets List of Private Subnet IDs created in the module and associated with
it. Under the hood, a "NAT Gateway" is used for external connections
on the "Route 0.0.0.0/0". When variable "node_allow_public" = false,
this network is assigned to the instances. Otherwise, it is useful for
assigning other resources in this VPC, for example a Database, which
can work behind a NAT (or without NAT and external connections at all,
for security reasons) and does not need to be exposed publicly with
its own IP.
public_subnets List of Public Subnet IDs created in the module and associated with
it. Under the hood, an "Internet Gateway" is used for external
connections on the "Route 0.0.0.0/0". When variable
"node_allow_public" = true, this network is assigned to the instances.
Otherwise, it is useful for assigning other resources in this VPC, for
example a Bastion host, which needs to be exposed publicly with its
own IP and not be behind a NAT.
route_table Route Table ID assigned to the current Vault HA cluster subnet.
Depends on whether the Private or Public subnetwork is assigned to
the instances.
ssh_private_key SSH private key generated by the module, whose public key part is
assigned to each of the nodes. Doing this is not recommended, as the
private key will be kept in the open and stored in the state file.
Instead, set the variable "ssh_authorized_keys". Please note: if
"ssh_authorized_keys" is set, "ssh_private_key" returns empty output
vpc_id VPC ID created in the module and associated with it. Exposed for
assigning other resources to the same VPC or for configuring peering
connections. If vpc_id_external is configured, it is returned instead.
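The outputs above can be consumed by other resources in the same configuration. A hedged sketch that lets a hypothetical bastion host reach the Vault nodes over SSH; the bastion security group ID is a placeholder:

```hcl
# Allow SSH from a (hypothetical) bastion security group to the Vault nodes,
# using the module's node_security_group output as the target.
resource "aws_security_group_rule" "bastion_to_vault_ssh" {
  type                     = "ingress"
  from_port                = 22
  to_port                  = 22
  protocol                 = "tcp"
  security_group_id        = module.vault.node_security_group
  source_security_group_id = "sg-0123456789abcdef0" # your bastion SG
}

# Re-export the cluster endpoint for other configurations
output "vault_endpoint" {
  value = module.vault.cluster_url
}
```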