bank-vaults / vault-operator

Kubernetes operator for HashiCorp Vault
https://bank-vaults.dev/docs/operator/
Apache License 2.0

Vault broken when all nodes go offline, e.g. in case of a power failure #480

Open Elyytscha opened 1 month ago

Elyytscha commented 1 month ago


Operator Version

1.22.1

Installation Type

Official Helm chart

Bank-Vaults Version

No response

Kubernetes Version

1.28

Kubernetes Distribution/Provisioner

OKD

Expected Behavior

Vault should come back online successfully.

Actual Behavior

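Vault does not recover: vault-0 keeps reporting "security barrier not initialized" and fails to join the Raft cluster via https://vault:8200 (connection refused), so Vault stays broken.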

Steps To Reproduce

  1. Provision a Vault with the configuration below.
  2. Turn off all Kubernetes compute nodes (simulating a power failure).
  3. Turn the nodes back on.
  4. Vault does not recover.

Configuration

apiVersion: vault.banzaicloud.com/v1alpha1
kind: Vault
metadata:
  name: vault
  namespace: openshift-bank-vault
  labels:
    app.kubernetes.io/name: vault
    vault_cr: vault
spec:
  existingTlsSecretName: selfsigned-cert-vault-tls
  veleroEnabled: false
  size: 3
  serviceMonitorEnabled: true
  unsealConfig:
    kubernetes:
      secretNamespace: openshift-bank-vault
  externalConfig:
    policies:
      - name: allow_secrets
        rules: path "secret/*" { capabilities = ["create", "read", "update", "delete", "list"] }
          path "auth/token/create" { capabilities = [ "update" ] }
    auth:
      - type: kubernetes
        roles:
          # Allow the listed service accounts in every namespace to use the secret kv store
          - name: default
            bound_service_account_names: 
            - default
            - vault-mutating-webhook-vault-secrets-webhook 
            - vault
            bound_service_account_namespaces: 
              - "*"
            policies: 
              - allow_secrets
            ttl: 1h
          - name: secretsmutation
            bound_service_account_names:
              - vault-mutating-webhook-vault-secrets-webhook
              - default
            bound_service_account_namespaces:
              - openshift-bank-vault
            policies:
              - allow_secrets
            ttl: 1h
    secrets:
      - path: secret
        type: kv
        description: General secrets.
        options:
          version: 2
    # Allows writing some secrets to Vault (useful for development purposes).
    # See https://www.vaultproject.io/docs/secrets/kv/index.html for more information.
    startupSecrets:
      - type: kv
        path: secret/data/example/account
        data:
          data:
            USER: secretId
            PASS: s3cr3t
  ingress:
    annotations:
      #nginx.ingress.kubernetes.io/backend-protocol: HTTPS
      route.openshift.io/termination: reencrypt
      route.openshift.io/destination-ca-certificate-secret: selfsigned-cert-vault-tls
    spec:
      ingressClassName: openshift-default
      rules:
      - host: secrets.example.com
        http:
          paths:
          - backend:
              service:
                name: vault
                port:
                  number: 8200
            path: /
            pathType: Prefix
  # In some cases, you have to set permissions for the raft directory,
  # for example when using a local kind cluster; the init container below does this.
  vaultInitContainers:
    - name: raft-permission
      image: busybox
      command:
        - /bin/sh
        - -c
        - |
          chown -R 100:1000 /vault/file
      volumeMounts:
        - name: vault-raft
          mountPath: /vault/file
  caNamespaces:
    - "*"
  image: hashicorp/vault:1.14.8

  # Vault Pods, Services and TLS Secret annotations
  vaultAnnotations:
    type/instance: vault

  # Vault Configurer Pods and Services annotations
  vaultConfigurerAnnotations:
    type/instance: vaultconfigurer

  # Specify the ServiceAccount where the Vault Pod and the Bank-Vaults configurer/unsealer is running
  serviceAccount: vault

  # Specify the Service's type where the Vault Service is exposed
  # Please note that some Ingress controllers like https://github.com/kubernetes/ingress-gce
  # force you to expose your Service on a NodePort
  serviceType: ClusterIP

  # Use local disk to store Vault raft data, see config section.
  volumeClaimTemplates:
    - metadata:
        name: vault-raft
      spec:
        # https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
        # storageClassName: ""
        accessModes:
          - ReadWriteOnce
        volumeMode: Filesystem
        resources:
          requests:
            storage: 1Gi

  config:
    storage:
      raft:
        path: "/vault/file"
    listener:
      tcp:
        address: "0.0.0.0:8200"
        tls_cert_file: /vault/tls/server.crt
        tls_key_file: /vault/tls/server.key
    api_addr: https://vault.openshift-bank-vault.svc:8200
    cluster_addr: "https://${.Env.POD_NAME}:8201"
    ui: true

  statsdDisabled: true

  serviceRegistrationEnabled: true

  resources:
    # A YAML representation of resource ResourceRequirements for vault container
    # Detail can reference: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container
    vault:
      limits: {}
      requests:
        memory: "256Mi"
        cpu: "100m"

Logs

operator logs:
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"cmd","msg":"no watched namespace found, watching the entire cluster"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"cmd","msg":"registering manager checks"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"cmd","msg":"bootstrapping manager"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"cmd","msg":"starting manager"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8383","secure":false}
{"level":"info","ts":"2024-05-24T19:50:36Z","msg":"starting server","kind":"health probe","addr":"[::]:8080"}
I0524 19:50:36.204688       1 leaderelection.go:250] attempting to acquire leader lease openshift-bank-vault/vault-operator-lock...
I0524 19:50:53.462609       1 leaderelection.go:260] successfully acquired lease openshift-bank-vault/vault-operator-lock
{"level":"info","ts":"2024-05-24T19:50:53Z","msg":"Starting EventSource","controller":"vault-controller","source":"kind source: *v1alpha1.Vault"}
{"level":"info","ts":"2024-05-24T19:50:53Z","msg":"Starting Controller","controller":"vault-controller"}
{"level":"info","ts":"2024-05-24T19:50:53Z","msg":"Starting workers","controller":"vault-controller","worker count":1}
{"level":"info","ts":"2024-05-24T19:50:53Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:51:04Z","logger":"KubeAPIWarningLogger","msg":"would violate PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (containers \"config-templating\", \"raft-permission\", \"vault\", \"bank-vaults\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \"config-templating\", \"raft-permission\", \"vault\", \"bank-vaults\" must set securityContext.capabilities.drop=[\"ALL\"]; container \"vault\" must not include \"IPC_LOCK\", \"SETFCAP\" in securityContext.capabilities.add), runAsNonRoot != true (pod or containers \"config-templating\", \"raft-permission\", \"vault\", \"bank-vaults\" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers \"config-templating\", \"raft-permission\", \"vault\", \"bank-vaults\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")"}
{"level":"info","ts":"2024-05-24T19:51:12Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:51:39Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:51:53Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:52:52Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:53:51Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:54:51Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:55:50Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:56:01Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:56:12Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}

Vault logs:
==> Vault server configuration:

Administrative Namespace: 
             Api Address: https://vault.openshift-bank-vault.svc:8200
                     Cgo: disabled
         Cluster Address: https://vault-0:8201
   Environment Variables: GODEBUG, GOTRACEBACK, HOME, HOSTNAME, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, NSS_SDB_USE_CACHE, PATH, PWD, SHLVL, TERM, VAULT_0_PORT, VAULT_0_PORT_8200_TCP, VAULT_0_PORT_8200_TCP_ADDR, VAULT_0_PORT_8200_TCP_PORT, VAULT_0_PORT_8200_TCP_PROTO, VAULT_0_PORT_8201_TCP, VAULT_0_PORT_8201_TCP_ADDR, VAULT_0_PORT_8201_TCP_PORT, VAULT_0_PORT_8201_TCP_PROTO, VAULT_0_PORT_9091_TCP, VAULT_0_PORT_9091_TCP_ADDR, VAULT_0_PORT_9091_TCP_PORT, VAULT_0_PORT_9091_TCP_PROTO, VAULT_0_SERVICE_HOST, VAULT_0_SERVICE_PORT, VAULT_0_SERVICE_PORT_API_PORT, VAULT_0_SERVICE_PORT_CLUSTER_PORT, VAULT_0_SERVICE_PORT_METRICS, VAULT_1_PORT, VAULT_1_PORT_8200_TCP, VAULT_1_PORT_8200_TCP_ADDR, VAULT_1_PORT_8200_TCP_PORT, VAULT_1_PORT_8200_TCP_PROTO, VAULT_1_PORT_8201_TCP, VAULT_1_PORT_8201_TCP_ADDR, VAULT_1_PORT_8201_TCP_PORT, VAULT_1_PORT_8201_TCP_PROTO, VAULT_1_PORT_9091_TCP, VAULT_1_PORT_9091_TCP_ADDR, VAULT_1_PORT_9091_TCP_PORT, VAULT_1_PORT_9091_TCP_PROTO, VAULT_1_SERVICE_HOST, VAULT_1_SERVICE_PORT, VAULT_1_SERVICE_PORT_API_PORT, VAULT_1_SERVICE_PORT_CLUSTER_PORT, VAULT_1_SERVICE_PORT_METRICS, VAULT_2_PORT, VAULT_2_PORT_8200_TCP, VAULT_2_PORT_8200_TCP_ADDR, VAULT_2_PORT_8200_TCP_PORT, VAULT_2_PORT_8200_TCP_PROTO, VAULT_2_PORT_8201_TCP, VAULT_2_PORT_8201_TCP_ADDR, VAULT_2_PORT_8201_TCP_PORT, VAULT_2_PORT_8201_TCP_PROTO, VAULT_2_PORT_9091_TCP, VAULT_2_PORT_9091_TCP_ADDR, VAULT_2_PORT_9091_TCP_PORT, VAULT_2_PORT_9091_TCP_PROTO, VAULT_2_SERVICE_HOST, VAULT_2_SERVICE_PORT, VAULT_2_SERVICE_PORT_API_PORT, VAULT_2_SERVICE_PORT_CLUSTER_PORT, VAULT_2_SERVICE_PORT_METRICS, VAULT_CONFIGURER_PORT, VAULT_CONFIGURER_PORT_9091_TCP, VAULT_CONFIGURER_PORT_9091_TCP_ADDR, VAULT_CONFIGURER_PORT_9091_TCP_PORT, VAULT_CONFIGURER_PORT_9091_TCP_PROTO, VAULT_CONFIGURER_SERVICE_HOST, VAULT_CONFIGURER_SERVICE_PORT, 
VAULT_CONFIGURER_SERVICE_PORT_METRICS, VAULT_K8S_POD_NAME, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT_443_TCP, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT_443_TCP_ADDR, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT_443_TCP_PORT, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT_443_TCP_PROTO, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_SERVICE_HOST, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_SERVICE_PORT, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_SERVICE_PORT_VAULT_SECRETS_WEBHOOK, VAULT_OPERATOR_PORT, VAULT_OPERATOR_PORT_80_TCP, VAULT_OPERATOR_PORT_80_TCP_ADDR, VAULT_OPERATOR_PORT_80_TCP_PORT, VAULT_OPERATOR_PORT_80_TCP_PROTO, VAULT_OPERATOR_PORT_8383_TCP, VAULT_OPERATOR_PORT_8383_TCP_ADDR, VAULT_OPERATOR_PORT_8383_TCP_PORT, VAULT_OPERATOR_PORT_8383_TCP_PROTO, VAULT_OPERATOR_SERVICE_HOST, VAULT_OPERATOR_SERVICE_PORT, VAULT_OPERATOR_SERVICE_PORT_HTTP, VAULT_OPERATOR_SERVICE_PORT_HTTP_METRICS, VAULT_PORT, VAULT_PORT_8200_TCP, VAULT_PORT_8200_TCP_ADDR, VAULT_PORT_8200_TCP_PORT, VAULT_PORT_8200_TCP_PROTO, VAULT_PORT_8201_TCP, VAULT_PORT_8201_TCP_ADDR, VAULT_PORT_8201_TCP_PORT, VAULT_PORT_8201_TCP_PROTO, VAULT_PORT_9091_TCP, VAULT_PORT_9091_TCP_ADDR, VAULT_PORT_9091_TCP_PORT, VAULT_PORT_9091_TCP_PROTO, VAULT_PORT_9102_TCP, VAULT_PORT_9102_TCP_ADDR, VAULT_PORT_9102_TCP_PORT, VAULT_PORT_9102_TCP_PROTO, VAULT_SERVICE_HOST, VAULT_SERVICE_PORT, VAULT_SERVICE_PORT_API_PORT, VAULT_SERVICE_PORT_CLUSTER_PORT, VAULT_SERVICE_PORT_METRICS, VAULT_SERVICE_PORT_STATSD, VERSION
              Go Version: go1.20.11
              Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level: 
                   Mlock: supported: true, enabled: true
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.14.8, built 2023-12-04T17:45:23Z
             Version Sha: 446f213c47cabf47d52d065647ef666ce4bf8692

==> Vault server started! Log data will stream in below:

2024-05-24T19:55:13.932Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2024-05-24T19:55:14.300Z [INFO]  core: Initializing version history cache for core
2024-05-24T19:55:14.868Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:14.870Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:14.870Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:14.870Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:14.873Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:14.873Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:16.220Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:16.220Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:16.220Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:16.222Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:16.222Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:18.580Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:18.580Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:18.580Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:18.581Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:18.581Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:19.279Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:19.280Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:19.280Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:19.280Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:19.282Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:19.282Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:20.451Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:20.451Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:20.451Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:20.453Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:20.453Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:22.590Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:22.590Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:22.590Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:22.592Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:22.592Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:22.627Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:23.069Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:24.073Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:27.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:32.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:32.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:37.631Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.146Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.369Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.370Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.370Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.370Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:39.374Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:39.374Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:40.137Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:40.603Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:40.603Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:40.603Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:40.605Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:40.605Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:42.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:42.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:43.278Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:43.279Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:43.279Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:43.280Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:43.280Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:44.152Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:47.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:52.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:52.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:57.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:58.006Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:58.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:02.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:02.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:07.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:09.308Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.319Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.320Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.320Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.320Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:56:10.323Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:56:10.323Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:56:11.226Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:11.457Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:11.457Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:11.457Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:56:11.459Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:56:11.459Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:56:12.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:12.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:14.188Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:14.188Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:14.188Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:56:14.189Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:56:14.189Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:56:14.236Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:17.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:20.113Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:22.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:22.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:25.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:27.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:32.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:32.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:37.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:40.144Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:42.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:42.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:47.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:52.627Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:52.628Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:53.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:57.406Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:57.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:02.630Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:02.630Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.371Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.403Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.404Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.404Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.404Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:57:06.406Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:57:06.406Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:57:07.419Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:07.419Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:07.419Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:57:07.420Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:57:07.420Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:57:07.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:09.526Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:09.526Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:09.526Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:57:09.528Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:57:09.528Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:57:10.382Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:12.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:12.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:17.625Z [INFO]  core: security barrier not initialized
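The Vault log above is dominated by two repeating patterns: "security barrier not initialized" and a failed Raft join against https://vault:8200. A small Python sketch like the following can collapse such output into a few counters, which makes it easier to compare runs. It only relies on the message texts quoted in this issue; it is not a Vault or Bank-Vaults tool.

```python
import re
from collections import Counter

# Patterns based on the Vault log lines quoted in this issue.
JOIN_ATTEMPT = re.compile(r"attempting to join possible raft leader node: leader_addr=(\S+)")
DIAL_ERROR = re.compile(r'dial tcp (\S+): (connect: [^"]+)')


def summarize_vault_log(lines):
    """Collapse repetitive Vault startup log lines into counters."""
    counts = Counter()
    leaders = Counter()
    dial_errors = Counter()
    for line in lines:
        if "security barrier not initialized" in line:
            counts["barrier_not_initialized"] += 1
        if "failed to join raft cluster" in line:
            counts["failed_joins"] += 1
        match = JOIN_ATTEMPT.search(line)
        if match:
            counts["join_attempts"] += 1
            leaders[match.group(1)] += 1
        match = DIAL_ERROR.search(line)
        if match:
            # Key is "<ip:port> <reason>", e.g. the refused connection to the leader.
            dial_errors[f"{match.group(1)} {match.group(2)}"] += 1
    return {
        "counts": dict(counts),
        "leaders": dict(leaders),
        "dial_errors": dict(dial_errors),
    }
```

For example, after saving the output of kubectl logs vault-0 -c vault to a file (vault-0.log is a placeholder name), summarize_vault_log(open("vault-0.log")) returns the number of barrier messages and join attempts, the leader addresses that were tried, and the distinct dial errors.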

Additional Information

No response

Elyytscha commented 1 month ago

As far as I have investigated, this does not seem to be a problem in Bank-Vaults itself.

I think it is the lost-quorum scenario described in https://support.hashicorp.com/hc/en-us/articles/360050756393-How-to-recover-from-permanently-lost-quorum-while-using-Raft-integrated-storage-with-Vault
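The linked article recovers from lost quorum by placing a peers.json file in the raft directory of each remaining node before restarting Vault (the manual recovery procedure for integrated storage). Assuming the storage path /vault/file from the CR above, the file would presumably go to /vault/file/raft/peers.json. The sketch below only builds that JSON. The node IDs are hypothetical placeholders that must be replaced with the real IDs of the surviving nodes, and the host:port addresses assume that the cluster_addr template https://${.Env.POD_NAME}:8201 renders to vault-0:8201 and so on, as the "Cluster Address: https://vault-0:8201" line in the Vault log suggests. This can only help if the Raft data on the volumes survived the outage.

```python
import json


def build_peers(nodes, cluster_port=8201):
    """Build the peers.json list for (node_id, pod_name) pairs.

    Each entry uses the host:port form of the cluster address (no scheme),
    and every listed node is a voter.
    """
    ids = [node_id for node_id, _ in nodes]
    if len(ids) != len(set(ids)):
        raise ValueError("node IDs must be unique")
    return [
        {"id": node_id, "address": f"{pod}:{cluster_port}", "non_voter": False}
        for node_id, pod in nodes
    ]


if __name__ == "__main__":
    # Hypothetical IDs: replace with the real node IDs of vault-0..vault-2.
    peers = build_peers([
        ("NODE-ID-OF-VAULT-0", "vault-0"),
        ("NODE-ID-OF-VAULT-1", "vault-1"),
        ("NODE-ID-OF-VAULT-2", "vault-2"),
    ])
    print(json.dumps(peers, indent=2))
```

Keeping the generator separate from any file writing is deliberate: the resulting JSON can be reviewed before it is copied into the volumes.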

Remaining questions:

  1. Can we work around or automate this with the Vault operator?
  2. Is Velero (veleroEnabled) the intended way to handle these situations?
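Any automated workaround would first need to tell a node whose storage looks uninitialized (vault-0 keeps logging "security barrier not initialized") apart from one that is merely sealed. Vault's unauthenticated /v1/sys/seal-status endpoint reports both an initialized and a sealed flag, and the sketch below classifies that response. It is an illustration only, not something the operator is known to do; the mapping of the log message to initialized=false is an assumption.

```python
import json
import ssl
import urllib.request


def classify_seal_status(status):
    """Map a /v1/sys/seal-status response body to a coarse recovery state."""
    if not status.get("initialized", False):
        # Presumably the state vault-0 is in, given its log messages.
        return "uninitialized"
    if status.get("sealed", True):
        return "sealed"
    return "unsealed"


def fetch_seal_status(addr, cafile=None):
    """GET <addr>/v1/sys/seal-status, verifying TLS against an optional CA file."""
    context = ssl.create_default_context(cafile=cafile)
    url = f"{addr}/v1/sys/seal-status"
    with urllib.request.urlopen(url, timeout=5, context=context) as resp:
        return json.load(resp)
```

Usage would be classify_seal_status(fetch_seal_status(addr, cafile)), with addr pointing at a single pod and cafile at the CA of selfsigned-cert-vault-tls; both are placeholders here, and the certificate must cover the host name used.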

Elyytscha commented 1 month ago

As I found out, this happens when unsealConfig uses the kubernetes backend:

  unsealConfig:
    options:
      preFlightChecks: true
      storeRootToken: true
      secretShares: 5
      secretThreshold: 3
    kubernetes:
      secretNamespace: vault
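With secretShares: 5 and secretThreshold: 3, the unseal key is split into five shares and any three of them reconstruct it (Shamir's secret sharing); with the kubernetes backend these are presumably stored in a Secret in secretNamespace. The toy below illustrates that threshold property over a large prime field. It is not Vault's code: Vault's implementation works byte-wise over GF(2^8), and the prime and function names here are illustrative choices.

```python
import random

PRIME = 2**127 - 1  # Mersenne prime used as a toy field modulus


def split(secret, shares=5, threshold=3, rng=random.SystemRandom()):
    """Split an integer secret into points on a random degree-(threshold-1) polynomial."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, shares + 1)]


def combine(points):
    """Recover the secret via Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any three of the five points returned by split(secret) reconstruct the secret, while fewer than three reveal nothing about it; this is why losing access to the stored shares, rather than to individual pods, is what blocks unsealing.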

With this config, Vault does not survive an outage: if you kill all Vault pods, Vault does not come back up by itself.

With a different unseal config, for example the Google Cloud KMS one below, and the same storage backend (Raft), Vault does survive an outage of all Vault nodes and comes back online without any manual intervention:

    google:
      kmsKeyRing: ${kms_keyring}
      kmsCryptoKey: ${kms_crypto_key}
      kmsLocation: ${region}
      kmsProject: ${project}
      storageBucket: ${storage_bucket}

  1. Why does this work with the google unsealConfig but not with the kubernetes one?
  2. How is a Vault managed by the Bank-Vaults operator intended to run in clusters where no cloud provider is used or available?