DoD-Platform-One / vault

https://repo1.dso.mil/big-bang/product/packages/vault

Prometheus pod stuck at init #2

Open p1-bot-repo-sync[bot] opened 3 weeks ago

p1-bot-repo-sync[bot] commented 3 weeks ago

More of a question than an issue. I've followed the instructions in the docs directory to get a production-ready Vault deployment. I'm running into a problem where the Prometheus pod is stuck at Init:0/3. It appears that the vault-agent-init container is waiting for something, but I don't know what. I see three required mounts, but all three are empty directories. I'm guessing that I'm missing some sort of token that allows Prometheus to access Vault, but I don't know where to look. This is my config for Vault:

  vault:
    enabled: true
    sourceType: "helmRepo"
    ingress:
      gateway: "admin"
      key: |
        -----BEGIN PRIVATE KEY-----
        <...>
        -----END PRIVATE KEY-----
      cert: |
        -----BEGIN CERTIFICATE-----
        <...>
        -----END CERTIFICATE-----
    values:
      global:
        tlsDisable: false
      autoInit:
        enabled: false
      monitoring:
        enabled: true
      server:
        dataStorage:
          enabled: true
          size: 50Gi
          mountPath: "/vault/data"
          accessMode: ReadWriteOnce
        extraEnvironmentVars:
          VAULT_SKIP_VERIFY: "true"
          VAULT_LOG_FORMAT: "json"
        ha:
          enabled: true
          replicas: 3
          apiAddr: "https://vault.dsop-swf-admin.dee.ds.local"
          raft:
            enabled: true
            config: |
              ui = true

              listener "tcp" {
                tls_disable = false
                address = "[::]:8200"
                cluster_address = "[::]:8201"
                tls_cert_file = "/vault/tls/tls.crt"
                tls_key_file  = "/vault/tls/tls.key"
                telemetry {
                  unauthenticated_metrics_access = true
                }
              }

              storage "raft" {
                path = "/vault/data"

                retry_join {
                  leader_api_addr = "https://vault-vault-0.vault-vault-internal:8200"
                  leader_client_cert_file = "/vault/tls/tls.crt"
                  leader_client_key_file = "/vault/tls/tls.key"
                  leader_tls_servername = "vault.{{ values.domain }}"
                }

                retry_join {
                  leader_api_addr = "https://vault-vault-1.vault-vault-internal:8200"
                  leader_client_cert_file = "/vault/tls/tls.crt"
                  leader_client_key_file = "/vault/tls/tls.key"
                  leader_tls_servername = "vault.{{ values.domain }}"
                }

                retry_join {
                  leader_api_addr = "https://vault-vault-2.vault-vault-internal:8200"
                  leader_client_cert_file = "/vault/tls/tls.crt"
                  leader_client_key_file = "/vault/tls/tls.key"
                  leader_tls_servername = "vault.{{ values.domain }}"
                }
              }

              seal "awskms" {
                region = "us-gov-east-1"
                kms_key_id = "<KMS_KEY>"
                endpoint = "https://<ENDPOINT>.kms.us-gov-east-1.vpce.amazonaws.com"
              }

              telemetry {
                unauthenticated_metrics_access = true
                prometheus_retention_time = "24h"
                disable_hostname = true
              }

              service_registration "kubernetes" {}
            setNodeId: true
        ingress:
          enabled: false
        resources:
          requests:
            cpu: "2"
            memory: 8Gi
          limits:
            cpu: "2"
            memory: 16Gi
        volumeMounts:
        - name: tls
          mountPath: "/vault/tls"
          readOnly: true
        - name: ca-bundle
          mountPath: /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
          subPath: tls-ca-bundle.pem
        volumes:
        - name: tls
          secret:
            secretName: vault-tls
        - name: ca-bundle
          configMap:
            name: ca-bundle

The ca-bundle is needed to add our private CA certs into the container for authentication.
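Since the listener's telemetry block in the config above sets unauthenticated_metrics_access = true, Vault's /v1/sys/metrics endpoint should answer without a token. That gives a way to check the scrape target on its own, separately from the stuck init container. Below is a minimal Python sketch of that check. It assumes the pod address from the first retry_join entry is reachable from wherever the script runs and that a CA bundle file is available locally. The function names and the ca_file path are illustrative and not part of the chart.

```python
import ssl
import urllib.request
from typing import Optional
from urllib.parse import urlencode, urlunsplit


def metrics_url(host: str, port: int = 8200) -> str:
    """Build the URL of Vault's metrics endpoint in Prometheus text format."""
    query = urlencode({"format": "prometheus"})
    return urlunsplit(("https", f"{host}:{port}", "/v1/sys/metrics", query, ""))


def fetch_metrics(url: str, ca_file: Optional[str] = None, timeout: float = 5.0) -> str:
    """Fetch the metrics page without a Vault token.

    ca_file is an assumed path to a PEM bundle containing the private CA;
    when it is None, the system trust store is used instead.
    """
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (needs network access to the Vault pod, so it is left commented out):
# url = metrics_url("vault-vault-0.vault-vault-internal")
# print(fetch_metrics(url, ca_file="/path/to/ca-bundle.pem")[:500])
```

If this returns metrics text, the endpoint itself is fine and the problem is more likely on the Prometheus side, for example in whatever vault-agent-init is waiting for.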

p1-bot-repo-sync[bot] commented 3 weeks ago

justinguidry11 commented:

@ppryde @jmillage We investigated further but only managed to reproduce the same errors, and the same fixes for them, that Andrew had already found. If you can verify that the items listed above still cause errors and provide logs, we can continue looking for a path forward on this.
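One way to gather that kind of evidence is to summarize the state of each init container from the pod's JSON, as printed by `kubectl get pod <pod-name> -n <namespace> -o json`. The following is only a sketch: the field names follow the Kubernetes pod status schema, but the sample pod is hand-written for illustration, and the init container names other than vault-agent-init are made up.

```python
from typing import Dict, List


def summarize_init_containers(pod: Dict) -> List[str]:
    """Return one line per init container describing its current state."""
    lines = []
    for status in pod.get("status", {}).get("initContainerStatuses", []):
        state = status.get("state", {})
        if "terminated" in state:
            detail = f"terminated (exit code {state['terminated'].get('exitCode')})"
        elif "running" in state:
            detail = "running (not finished yet)"
        elif "waiting" in state:
            detail = f"waiting ({state['waiting'].get('reason', 'no reason given')})"
        else:
            detail = "unknown"
        lines.append(f"{status['name']}: {detail}")
    return lines


# Hand-written sample, NOT real output from the cluster in this issue.
# "example-init-b" and "example-init-c" are hypothetical container names.
SAMPLE_POD = {
    "status": {
        "initContainerStatuses": [
            {"name": "vault-agent-init", "state": {"running": {}}},
            {"name": "example-init-b", "state": {"waiting": {"reason": "PodInitializing"}}},
            {"name": "example-init-c", "state": {"waiting": {"reason": "PodInitializing"}}},
        ]
    }
}

for line in summarize_init_containers(SAMPLE_POD):
    print(line)
```

With real data, the pod JSON can be loaded with `json.load(sys.stdin)` and passed to the same function. The logs of whichever init container is still running can then be pulled with `kubectl logs <pod-name> -n <namespace> -c <container-name>`.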

p1-bot-repo-sync[bot] commented 1 week ago

akesterson commented:

We're pulling this out of our immediate set of priorities and will see if we can get someone to build a reliable, reproducible test case. Until then, this will remain lower priority.