Panfactum / stack

[question]: Getting decrypt permission error for production module not needed for development plan #172

Open wesbragagt opened 4 hours ago

wesbragagt commented 4 hours ago

What is your question?

I'm puzzled by this error. I'm attempting to build a wf_tf_plan workflow that I can run on pull requests for targeted modules in my stack, but when I run a plan for a single module in the development environment I get this error, as if it needed a KMS key from the production stack.

Release: edge.24-10-23

Error

00:47:22.420 ERROR  Error: Error in function call

00:47:22.420 ERROR    on ./environments/production/global/aws_account/terragrunt.hcl line 11, in locals:

00:47:22.420 ERROR    11:   secrets = yamldecode(sops_decrypt_file("${get_terragrunt_dir()}/secrets.yaml"))

00:47:22.420 ERROR  

00:47:22.420 ERROR  Call to function "sops_decrypt_file" failed: 1 error occurred:
        * error decrypting key: [error decrypting key
arn:aws:kms:us-west-2:590183845935:key/mrk-427637b8b90547b0ba77ff60d00c8858:
[failed to decrypt sops data key with AWS KMS: operation error KMS: Decrypt,
https response error StatusCode: 400, RequestID:
3ce7e2fd-0fe8-4497-afdd-4748655e93a0, api error AccessDeniedException: User:
arn:aws:sts::730335560480:assumed-role/plan-implentio-workflows-a2232943228f-20241115004610428300000003/ci-runner
is not authorized to perform: kms:Decrypt on the resource associated with this
ciphertext because the resource does not exist in this Region, no resource-based
policies allow access, or a resource-based policy explicitly denies access]
error decrypting key
arn:aws:kms:us-east-2:590183845935:key/mrk-427637b8b90547b0ba77ff60d00c8858:
[failed to decrypt sops data key with AWS KMS: operation error KMS: Decrypt,
https response error StatusCode: 400, RequestID:
393a26e2-3d2c-4ed4-ac1f-25d6bdb25b12, api error AccessDeniedException: User:
arn:aws:sts::730335560480:assumed-role/plan-implentio-workflows-a2232943228f-20241115004610428300000003/ci-runner
is not authorized to perform: kms:Decrypt on the resource associated with this
ciphertext because the resource does not exist in this Region, no resource-based
policies allow access, or a resource-based policy explicitly denies access]]

00:47:22.420 INFO   Shutting down Terragrunt Cache server...
00:47:22.420 INFO   Terragrunt Cache server stopped
00:47:22.421 ERROR  Error processing module at './environments/production/global/aws_account/terragrunt.hcl'. How this module was found: Terragrunt config file found in a subdirectory of ../repo. Underlying error: ./environments/production/global/aws_account/terragrunt.hcl:11,24-42: Error in function call; Call to function "sops_decrypt_file" failed: 1 error occurred:
        * error decrypting key: [error decrypting key arn:aws:kms:us-west-2:590183845935:key/mrk-427637b8b90547b0ba77ff60d00c8858: [failed to decrypt sops data key with AWS KMS: operation error KMS: Decrypt, https response error StatusCode: 400, RequestID: 3ce7e2fd-0fe8-4497-afdd-4748655e93a0, api error AccessDeniedException: User: arn:aws:sts::730335560480:assumed-role/plan-implentio-workflows-a2232943228f-20241115004610428300000003/ci-runner is not authorized to perform: kms:Decrypt on the resource associated with this ciphertext because the resource does not exist in this Region, no resource-based policies allow access, or a resource-based policy explicitly denies access] error decrypting key arn:aws:kms:us-east-2:590183845935:key/mrk-427637b8b90547b0ba77ff60d00c8858: [failed to decrypt sops data key with AWS KMS: operation error KMS: Decrypt, https response error StatusCode: 400, RequestID: 393a26e2-3d2c-4ed4-ac1f-25d6bdb25b12, api error AccessDeniedException: User: arn:aws:sts::730335560480:assumed-role/plan-implentio-workflows-a2232943228f-20241115004610428300000003/ci-runner is not authorized to perform: kms:Decrypt on the resource associated with this ciphertext because the resource does not exist in this Region, no resource-based policies allow access, or a resource-based policy explicitly denies access]]

.
00:47:22.421 ERROR  Unable to determine underlying exit code, so Terragrunt will exit with error code 1
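
The last ERROR line above is the relevant diagnostic: terragrunt reports that the production module was included because a "Terragrunt config file [was] found in a subdirectory of ../repo", i.e. run-all discovered it under the working directory, and sops_decrypt_file executes while each discovered terragrunt.hcl is parsed. Note also that the assumed role lives in account 730335560480 while the KMS key lives in account 590183845935, so a cross-account decrypt would need both a key-policy grant and IAM permissions. A quick way to check what the runner can actually do (a debugging sketch using standard AWS CLI commands, not part of the original script; the "ci" profile comes from Step 2 of the script below):

# Confirm which identity the runner actually assumes:
aws sts get-caller-identity --profile ci

# Check whether that identity can even see the production key from the log.
# kms:DescribeKey is a weaker check than kms:Decrypt, but an
# AccessDeniedException here points at the same cross-account gap:
aws kms describe-key \
  --key-id arn:aws:kms:us-west-2:590183845935:key/mrk-427637b8b90547b0ba77ff60d00c8858 \
  --profile ci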

Script being run

#!/usr/bin/env bash

set -eo pipefail

#####################################################
# Step 1: Checkout the repo
#####################################################
cd /code
pf-wf-git-checkout \
  -r "$REPO" \
  -c "$GIT_REF" \
  -u "$GIT_USERNAME" \
  -p "$GIT_PASSWORD"

git config --global user.email "wbraga@implentio.com"
git config --global user.name "$GIT_USERNAME"
git config --global url."https://${GIT_PASSWORD}@github.com/".insteadOf "https://github.com/"

#####################################################
# Step 2: Setup AWS profile
#####################################################
cat >"$AWS_CONFIG_FILE" <<EOF
[profile ci]
role_arn = $AWS_ROLE_ARN
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
role_session_name = ci-runner
EOF

#####################################################
# Step 3: Setup the kubeconfig context
#####################################################
kubectl config set-cluster ci \
  --server="https://$KUBERNETES_SERVICE_HOST" \
  --certificate-authority /var/run/secrets/kubernetes.io/serviceaccount/ca.crt --embed-certs
kubectl config set-credentials ci --token="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
kubectl config set-context ci --cluster=ci --user=ci --namespace=default

#####################################################
# Step 4: Setup vault
#####################################################
VAULT_TOKEN=$(vault write auth/kubernetes/login role="$VAULT_ROLE" jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" -format=json | jq -r '.auth.client_token')
export VAULT_TOKEN

#####################################################
# Step 5: Update sops-encrypted files so the runner can decrypt them
#####################################################
pf-sops-set-profile --directory . --profile ci
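# NOTE: pf-sops-set-profile runs against "." (which is /code at this point),
# so it appears to rewrite the sops metadata of every encrypted file in the
# checkout -- production included -- to use the "ci" profile from Step 2.
# (An assumption based only on the command's name and flags.)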

#####################################################
# Step 6: Use terragrunt to plan the IaC
#####################################################
cd /code/repo/
mkdir -p "$TF_PLUGIN_CACHE_DIR"
terragrunt run-all plan \
  --terragrunt-ignore-external-dependencies \
  --terragrunt-download-dir /tmp/.terragrunt \
  --terragrunt-non-interactive \
  --terragrunt-fetch-dependency-output-from-state \
  --terragrunt-provider-cache \
  --terragrunt-provider-cache-dir "$TF_PLUGIN_CACHE_DIR" \
  --terragrunt-parallelism 5 \
  --terragrunt-working-dir="$TF_APPLY_DIR"
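
One thing that stands out: the script passes --terragrunt-working-dir="$TF_APPLY_DIR", but the workflow spec posted in the comment below only defines TF_PLAN_DIR in common_env. Unless wf_spec injects TF_APPLY_DIR elsewhere, the variable expands empty (the script sets -eo pipefail but not -u), the working dir presumably falls back to /code/repo, and run-all discovers every terragrunt.hcl in the repository, including environments/production/global/aws_account, failing on its sops_decrypt_file call. A sketch of the difference (the development path is hypothetical):

# Unscoped: discovers every module in the checkout, production included
terragrunt run-all plan --terragrunt-working-dir /code/repo

# Scoped: only discovers modules under the development environment
terragrunt run-all plan --terragrunt-working-dir /code/repo/environments/development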

What primary components of the stack does this relate to?

terraform

wesbragagt commented 4 hours ago
# wf_tf_plan/main.tf

terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.27.0"
    }
    kubectl = {
      source  = "alekc/kubectl"
      version = "2.0.4"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "5.70.0"
    }
    pf = {
      source  = "panfactum/pf"
      version = "0.0.3"
    }
  }
}

locals {
  # Use this as a consistent hostname so that force-unlock can work between
  # workflow runs
  hostname = md5("${var.repo}${var.tf_plan_dir}")

  entrypoint = "entry"
}

module "pull_through" {
  source                     = "${var.pf_module_source}aws_ecr_pull_through_cache_addresses${var.pf_module_ref}"
  pull_through_cache_enabled = var.pull_through_cache_enabled
}

module "constants" {
  source = "${var.pf_module_source}kube_constants${var.pf_module_ref}"
}

#############################################################
# AWS Permissions
#
# Should have full access to the AWS account as it must be able to make
# arbitrary changes.
#############################################################

data "aws_iam_policy_document" "tf_plan_ecr" {
  statement {
    sid       = "CIAdmin"
    effect    = "Allow"
    actions   = ["*"]
    resources = ["*"]
  }
}

#############################################################
# Kubernetes Permissions
#
# Should have full access to Kubernetes as it must be able to make
# arbitrary changes.
#############################################################

resource "kubernetes_cluster_role_binding" "tf_plan" {
  metadata {
    generate_name = var.name
    labels        = module.tf_plan_workflow.labels
  }
  subject {
    kind      = "ServiceAccount"
    name      = module.tf_plan_workflow.service_account_name
    namespace = var.namespace
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
}

#############################################################
# Vault Permissions
#
# Should have full access to Vault as it must be able to make
# arbitrary changes.
#############################################################

data "vault_policy_document" "tf_plan_vault_permissions" {
  rule {
    path         = "*"
    capabilities = ["sudo", "create", "read", "update", "patch", "delete", "list"]
    description  = "allow all"
  }
}

module "tf_plan_vault_role" {
  source = "${var.pf_module_source}kube_sa_auth_vault${var.pf_module_ref}"

  service_account           = module.tf_plan_workflow.service_account_name
  service_account_namespace = var.namespace
  vault_policy_hcl          = data.vault_policy_document.tf_plan_vault_permissions.hcl
  token_ttl_seconds         = 60 * 60
}

#############################################################
# Workflow
#############################################################

# These define our workflow scripts
resource "kubernetes_config_map" "tf_plan_scripts" {
  metadata {
    name      = "${var.name}-scripts"
    labels    = module.tf_plan_workflow.labels
    namespace = var.namespace
  }
  data = {
    "plan.sh" = file("${path.module}/scripts/plan.sh")
  }
}

module "tf_plan_workflow" {
  source = "${var.pf_module_source}wf_spec${var.pf_module_ref}"

  name                    = var.name
  namespace               = var.namespace
  eks_cluster_name        = var.eks_cluster_name
  burstable_nodes_enabled = true
  active_deadline_seconds = 60 * 60

  # Typically not advised but required for terragrunt to operate
  read_only   = false
  run_as_root = true
  privileged  = true

  entrypoint = local.entrypoint
  passthrough_parameters = [
    {
      name        = "git_ref"
      description = "Which commit to check out and plan in the ${var.repo} repository"
      default     = var.git_ref
    },
    {
      name        = "tf_plan_dir"
      description = "Which directory to run 'terragrunt run-all plan' in inside the ${var.repo} repository"
      default     = var.tf_plan_dir
    }
  ]
  common_env = {
    REPO         = var.repo
    GIT_REF      = "{{inputs.parameters.git_ref}}"
    GIT_USERNAME = var.git_username
    TF_PLAN_DIR  = "{{inputs.parameters.tf_plan_dir}}"

    # Needed for Vault authentication
    VAULT_ROLE = module.tf_plan_vault_role.role_name
    VAULT_ADDR = "http://vault-active.vault.svc.cluster.local:8200"

    # Set up cache and config directories
    TF_PLUGIN_CACHE_DIR   = "/tmp/.terraform"
    AWS_CONFIG_FILE       = "/.aws/config"
    KUBE_CONFIG_PATH      = "/.kube/config"
    KUBECONFIG            = "/.kube/config"
    HELM_REPOSITORY_CACHE = "/tmp/.helm"
    HELM_CACHE_HOME       = "/tmp/.helm"
    HELM_DATA_HOME        = "/tmp/.helm"

    CI = "true" # Required to run the Panfactum terragrunt setup in CI mode
  }
  common_secrets = merge(
    var.secrets,
    {
      GIT_PASSWORD = var.git_password
    }
  )
  extra_aws_permissions = data.aws_iam_policy_document.tf_plan_ecr.json
  default_resources = {
    requests = {
      memory = "${var.memory_mb}Mi"
      cpu    = "${var.cpu_millicores}m"
    }
    limits = {
      memory = "${var.memory_mb}Mi"
    }
  }
  default_container_image = "${module.pull_through.ecr_public_registry}/${module.constants.panfactum_image_repository}:${module.constants.panfactum_image_tag}"
  templates = [
    {
      name = local.entrypoint
      dag = {
        tasks = [
          {
            name     = "plan"
            template = "plan"
          }
        ]
      }
    },
    {
      name = "plan"
      podSpecPatch = yamlencode({
        hostname = local.hostname
      })
      volumes = module.tf_plan_workflow.volumes
      container = {
        command = ["/scripts/plan.sh"]
      }

      retryStrategy = { limit = "0" }
    }
  ]
  tmp_directories = {
    code = {
      mount_path = "/code"
      size_mb    = 2000
    }
    aws = {
      mount_path = "/.aws"
      size_mb    = 10
      node_local = true
    }
    kube = {
      mount_path = "/.kube"
      size_mb    = 10
      node_local = true
    }
    tmp = {
      mount_path = "/tmp"
      size_mb    = 3000
    }
    # # TODO: I do not think that terragrunt should be utilizing this directory,
    # # but due to a bug it is, so we must provide it. Revisit in a future release.
    # cache = {
    #   mount_path = "/.cache"
    #   size_mb    = 1000
    # }
  }
  config_map_mounts = {
    "${kubernetes_config_map.tf_plan_scripts.metadata[0].name}" = {
      mount_path = "/scripts"
    }
  }
}

resource "kubectl_manifest" "tf_plan_workflow_template" {
  yaml_body = yamlencode({
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "WorkflowTemplate"
    metadata = {
      name      = var.name
      namespace = var.namespace
      labels    = module.tf_plan_workflow.labels
    }
    spec = module.tf_plan_workflow.workflow_spec
  })

  server_side_apply = true
  force_conflicts   = true
}
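
For completeness, since git_ref and tf_plan_dir are exposed as passthrough parameters, a run scoped to a single development module would presumably be submitted like this (a sketch: the template and namespace names come from var.name and var.namespace, and the path is an example only):

# Hypothetical submission via the Argo CLI; the parameter names match the
# passthrough_parameters above, everything else is illustrative:
argo submit --from workflowtemplate/wf-tf-plan -n ci \
  -p git_ref=my-feature-branch \
  -p tf_plan_dir=environments/development/us-east-2/aws_eks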