department-of-veterans-affairs / abd-vro

To get Veterans benefits in minutes, VRO software uses health evidence data to help fast track disability claims.

Draft a tech spec on secret management/synchronization across k8s and HashiCorp Vault #2241

Open msnwatson opened 7 months ago

msnwatson commented 7 months ago

Background

Historically, we have used a GitHub action (https://github.com/department-of-veterans-affairs/abd-vro-internal/actions/workflows/deploy-secrets.yml) to synchronize secrets between HashiCorp Vault and our LHDI K8s environments.

During a recent attempt to fix this GH action, we found several issues that corrupted secrets in LHDI environments and brought down an overwhelming majority of pods in the dev environment. Moreover, the access token mechanism that links k8s and HashiCorp Vault now seems to enforce a stricter expiration window for access tokens, which would render the automation almost useless.

As we continue to prepare a secrets management solution, we should also look into what options are presented by LHDI. VRO's original integration with LHDI seems to have preceded an officially recognized way of managing secrets. Regardless of the reason, ArgoCD appears to be another option that we should be considering.

User Story

As a VRO engineer, I want to have a single source of truth for all of our secrets so that I can spend less time doing secret management and manually synchronizing secrets across different environments (e.g. k8s secrets, HashiCorp).

Acceptance Criteria

  1. Gain an understanding of the current automation and validate the hypothesis that it is no longer useful as an automation (understanding how long access tokens live, dependencies on CLI tools and versioning that are beyond our control)
  2. If hypothesis from 1 is validated, propose solution(s) to use only HashiCorp vault as a single source of secrets or automatically sync secrets between HashiCorp vault and K8s in as stable a way as possible (likely a tradeoff between reducing manual steps and brittleness of the solution here)
  3. If hypothesis from 1 is proven untrue, unbreak the solution and implement more failure control to make sure secrets are changed in the way we expect.
  4. The proposed solution will include a section that specifies LHDI's recommendations (note: we are not required to follow said recommendations, but we will expect to capture justification for any deviations). ArgoCD has the ability to interact with Vault.
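
One possible form of the "failure control" mentioned in criterion 3 is a pre-flight check that refuses to write a value that looks like a failed lookup. The sketch below is only an illustration: the function name and the rejection rules are hypothetical, and nothing here reflects what the existing GitHub action does.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; the name and the rules are assumptions.

# Succeeds only if the value looks safe to write to a cluster.
validate_secret_value() {
  local key="$1" value="$2"
  if [ -z "$value" ]; then
    echo "refusing to sync ${key}: value is empty" >&2
    return 1
  fi
  case "$value" in
    null|undefined|*"<no value>"*)
      # Typical residue of a lookup or template that found nothing.
      echo "refusing to sync ${key}: value looks like a failed lookup" >&2
      return 1
      ;;
  esac
  return 0
}

# Possible usage (not executed here; the secret name is a placeholder):
#   validate_secret_value DB_PASSWORD "$value" \
#     && kubectl create secret generic example-secret \
#          --from-literal=DB_PASSWORD="$value" \
#          --dry-run=client --output yaml \
#        | kubectl apply --filename -
```

Rendering with `--dry-run=client` and piping into `kubectl apply` keeps the operation idempotent, so a rerun updates the secret instead of failing on an existing one.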

The following YouTube video is a presentation from HashiCorp on their current recommendations for using Vault as a secret store in Kubernetes environments. In the first half of the video, Senior Product Manager David Yu presents the relevant features and potential integration points between Vault and K8s, and mentions some of the dilemmas developers will encounter when attempting this. David then turns the presentation over to Kyle Schochenmaier to talk through their recommendations for resolving these dilemmas: https://youtube.com/watch?v=_zo8qn44Ulg&t=557

The solution that makes the most sense from my perspective is that we should use the Vault Agent Sidecar solution to populate a shared volume.

Here is the example they walk through in the video:

  1. Launch a Vault Agent Sidecar pod, and use the Vault Agent Injector Webhook to retrieve secrets from Vault:

    65edd58b2d89d8f93545b65addf8c83a_MD5.jpeg

  2. Tell the Vault Agent Sidecar to create an init-container volume. This volume will be attached to any other pods that require secrets (i.e. most likely all the pods interacting with resources that require some type of auth or identity validation)

    b238ec848df7c9b9f79d309820c81f99_MD5.jpeg
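
As a rough illustration of the steps above, the Vault Agent Injector is driven by pod annotations. The sketch below prints a patch that adds three of them to a Deployment's pod template. The role name `vro-app`, the secret path `secret/data/vro/db`, and the file name `db-creds` are hypothetical placeholders, not values from our environments.

```shell
#!/usr/bin/env bash
# Prints a patch that adds Vault Agent Injector annotations to a pod template.
# All three arguments are hypothetical placeholders.
render_injector_patch() {
  local role="$1" secret_path="$2" file_name="$3"
  cat <<EOF
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "${role}"
        vault.hashicorp.com/agent-inject-secret-${file_name}: "${secret_path}"
EOF
}

render_injector_patch "vro-app" "secret/data/vro/db" "db-creds"

# Possible usage against a cluster (not executed here):
#   kubectl patch deployment <deployment-name> \
#     --patch "$(render_injector_patch vro-app secret/data/vro/db db-creds)"
```

With annotations like these, the injector would be expected to render the secret into the shared volume at `/vault/secrets/db-creds`, where the application container reads it as a file.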

Later, when it comes time to rotate secrets, this approach also includes the use of a Credential Rotation sidecar, which uses a TTL defined on the secrets to check for updates.

d2be63455992092f68485c0e9e0275e9_MD5.jpeg
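
Rotation only helps if the application notices that the file on the shared volume has changed. One way a consumer could detect this, sketched here with hypothetical helper names and not taken from the video, is to compare a checksum of the rendered file against the last one it saw.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for noticing that a rendered secret file was rotated.

# Prints a fingerprint (CRC and byte count) of the file's current contents.
secret_fingerprint() {
  cksum < "$1"
}

# Succeeds if the file's contents differ from the previously seen fingerprint.
secret_changed() {
  local file="$1" last_seen="$2"
  [ "$(secret_fingerprint "$file")" != "$last_seen" ]
}

# Possible usage (the path is a placeholder):
#   last=$(secret_fingerprint /vault/secrets/db-creds)
#   ...later...
#   if secret_changed /vault/secrets/db-creds "$last"; then
#     echo "secret rotated, reloading credentials"
#   fi
```

A checksum comparison avoids depending on file modification times, which can change even when a re-rendered secret has the same contents.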

For more information about this vault-agent-injector pattern, HashiCorp's documentation provides an example of a Kubernetes environment accessing an external Vault: https://developer.hashicorp.com/vault/tutorials/kubernetes/kubernetes-external-vault

And here is a walkthrough of setting up a vault on a local minikube: https://developer.hashicorp.com/vault/tutorials/kubernetes/kubernetes-minikube-raft

Why Vault? Why Go Through all this?

Because managing secrets is one of those things that can easily get out of hand and become a real pain in any complex microservice environment. Using Kubernetes secrets by themselves with no other management tools is simple enough on a case-by-case basis, but there are just too many manual steps, and human error becomes extremely likely. Vault offers a low-maintenance store and can itself act as a certificate authority. Let Vault handle creation, security, storage, and value rotation. Allowing Vault to manage your Kubernetes Secrets will save future headaches and uncertainty.

Another Possibility is the External Secrets Operator

Manage Kubernetes Secrets With External Secrets Operator:

#########
# Demo Script #
# Source: https://gist.github.com/ab6782dd6f865b3ffa913d9e1e578e1b #
#########

git clone https://github.com/vfarcic/external-secrets-demo

cd external-secrets-demo

helm repo add external-secrets \
    https://charts.external-secrets.io

helm repo update

helm upgrade --install \
    external-secrets \
    external-secrets/external-secrets \
    --namespace external-secrets \
    --create-namespace

# Generate a unique Google Cloud Project ID (or set PROJECT_ID to an ID of your choosing)
export PROJECT_ID=dot-$(date +%Y%m%d%H%M%S)

gcloud projects create $PROJECT_ID

echo "https://console.cloud.google.com/marketplace/product/google/secretmanager.googleapis.com?project=$PROJECT_ID"

# Open the URL and *ENABLE* the API

gcloud iam service-accounts \
    --project $PROJECT_ID \
    create external-secrets

echo -ne '{
"name": "my-fancy-db",
"endpoint": "127.0.0.1:8200",
"username": "jdoe",
"password": "YouWillNeverFindOut",
"port": 8200
}' | gcloud secrets \
    --project $PROJECT_ID \
    create a-team-postgresql --data-file=-

gcloud secrets \
    --project $PROJECT_ID \
    add-iam-policy-binding a-team-postgresql \
    --member "serviceAccount:external-secrets@$PROJECT_ID.iam.gserviceaccount.com" \
    --role "roles/secretmanager.secretAccessor"

gcloud iam service-accounts \
    --project $PROJECT_ID \
    keys create account.json \
    --iam-account=external-secrets@$PROJECT_ID.iam.gserviceaccount.com

kubectl create namespace a-team

kubectl --namespace external-secrets \
    create secret generic gcp \
    --from-file=credentials=account.json

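# Write to a temporary file first; reading and writing the same file
# in one pipeline can truncate it before it has been read.
sed -e "s@projectID: .*@projectID: $PROJECT_ID@" secret-store.yaml \
    > secret-store.yaml.tmp \
    && mv secret-store.yaml.tmp secret-store.yaml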

#############################
# Managing External Secrets #
#############################

echo "https://console.cloud.google.com/security/secret-manager?project=$PROJECT_ID"

cat secret-store.yaml

kubectl apply --filename secret-store.yaml

cat external-secret.yaml

kubectl --namespace a-team apply \
    --filename external-secret.yaml

kubectl --namespace a-team get secrets

kubectl --namespace a-team \
    get secret postgresql \
    --output yaml

kubectl --namespace a-team \
    get secret postgresql \
    --output jsonpath="{.data.password}" \
    | base64 --decode

kubectl --namespace a-team \
    get secret postgresql \
    --output jsonpath="{.data.password}" \
    | base64 --decode

# https://external-secrets.io > Providers

###########
# Destroy #
###########

gcloud projects delete $PROJECT_ID
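
The demo applies `external-secret.yaml` from the cloned repo without showing its contents. For orientation, an ExternalSecret of roughly the following shape would produce the `postgresql` secret that the script reads back. This is a sketch and not the file from the demo repo: the store name `gcp`, the `ClusterSecretStore` kind, the refresh interval, and the `v1beta1` API version are assumptions that may differ from the repo and from the installed operator version.

```shell
#!/usr/bin/env bash
# Prints an illustrative ExternalSecret manifest.
# The store name and kind are assumptions, not values from the demo repo.
render_external_secret() {
  local store="$1"
  cat <<EOF
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgresql
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: ${store}
  target:
    name: postgresql
  data:
    - secretKey: password
      remoteRef:
        key: a-team-postgresql
        property: password
EOF
}

render_external_secret "gcp"

# Possible usage against a cluster (not executed here):
#   render_external_secret gcp | kubectl --namespace a-team apply --filename -
```

The `remoteRef` points at the `a-team-postgresql` secret created earlier in the script, and `property` selects the `password` field from its JSON payload.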

GitOps - Encrypt Everything and Commit it to Git

In this final recommendation, we would abandon Vault and remove some of the learning curve required for using it. This approach would have us create a secrets repo that is only accessible to team members. Bitnami has some tools that enable this possibility. For this approach we would want to understand more about Bitnami Sealed Secrets, which we would use to handle the encryption and the related certificate responsibilities. Once everything is encrypted, storage is no different than any other git repo.
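
A minimal sketch of the Sealed Secrets workflow, assuming the Sealed Secrets controller is already installed in the cluster and the `kubeseal` CLI is available locally. The function name, secret name, and namespace are hypothetical.

```shell
#!/usr/bin/env bash
# Renders a Secret locally and encrypts it into a SealedSecret manifest.
# Names are hypothetical; nothing is written to the cluster by this function.
seal_secret() {
  local name="$1" namespace="$2" literal="$3"
  kubectl create secret generic "$name" \
    --namespace "$namespace" \
    --from-literal="$literal" \
    --dry-run=client --output yaml \
    | kubeseal --format yaml
}

# Possible usage (not executed here):
#   seal_secret db-creds vro password=example > sealed-db-creds.yaml
#   git add sealed-db-creds.yaml
```

Only the sealed output would be committed. The plain Secret exists only in the pipe between the two commands, and only the controller in the cluster holds the private key needed to decrypt it.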

meganhicks commented 3 months ago

waiting on @nelsestu to finish the ArgoCD tech spec and recommend an approach to resolving this issue

meganhicks commented 1 week ago

Mason can we dump this ticket- @mason