bank-vaults / bank-vaults

A Vault swiss-army knife: a CLI tool to init, unseal and configure Vault (auth methods, secret engines).
https://bank-vaults.dev
Apache License 2.0

[operator] Setting "servicePorts" doesn't propagate api-port name to pod probes #609

Closed chrisob closed 4 months ago

chrisob commented 5 years ago

Describe the bug: When setting servicePorts in a Vault CR, the operator is able to rename all containerPorts and service ports correctly, but Vault's pod probes still refer to a port named api-port.

For example, setting servicePorts to:

  servicePorts:
    http-api: 8200
    https-cluster: 8201
    http-metrics: 9091

...results in the following container spec:

    name: vault
    ports:
    - containerPort: 8200
      # containerPort correctly renamed
      name: http-api
      protocol: TCP
    - containerPort: 9091
      name: http-metrics
      protocol: TCP
    - containerPort: 8201
      name: https-cluster
      protocol: TCP
    readinessProbe:
      failureThreshold: 2
      httpGet:
        path: /v1/sys/health?standbyok&perfstandbyok
        # probe port hardcoded to api-port
        port: api-port
        scheme: HTTP
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/sys/init
        # probe port hardcoded to api-port
        port: api-port
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

Expected behaviour: When the api-port name is changed via servicePorts, I would expect the port that the probes reference to be updated to the new name as well.
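
For example, with the servicePorts mapping above, the probes would be expected to come out like this (a sketch of the desired output, reusing the names from the example):

    readinessProbe:
      httpGet:
        path: /v1/sys/health?standbyok&perfstandbyok
        # expected: the probe follows the renamed api port
        port: http-api
        scheme: HTTP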

Steps to reproduce the bug: See above.

Additional context: It seems that the api-port name is hardcoded here and here.

As for why one would want to rename the ports like this: Istio compatibility :) Istio requires that service port names are prefixed with the protocol so it can gather telemetry data on traffic in the service mesh.
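
For reference, an Istio-compliant Service would end up with ports named like this (mirroring the servicePorts mapping above):

  ports:
  - name: http-api
    port: 8200
    protocol: TCP
    targetPort: 8200
  - name: https-cluster
    port: 8201
    protocol: TCP
    targetPort: 8201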

Environment details:

See above for config snippets

/kind bug

primeroz commented 5 years ago

Some changes to port naming (among other things) are being worked on in https://github.com/banzaicloud/bank-vaults/pull/569

Do you mind reviewing that PR and seeing how it would affect your requirement for Istio compatibility?

Yeah, at the moment api-port and cluster-port are hardcoded.

chrisob commented 5 years ago

Apologies, my understanding of Golang isn't good enough to fully grasp what that PR is doing :/

However, I do see that it dynamically sets the prober ports, so it may fix this issue. That said, I haven't finished integrating vault-operator/secrets-webhook into my k8s stack, so there might be other gotchas I run into.

FWIW, my use case is terminating TLS at the ingress, then using Istio's mTLS to encrypt traffic between the ingress controller and Vault. Thus, Vault is configured to listen on plaintext HTTP, and in-cluster workloads will also see HTTP, with Istio transparently encrypting traffic.
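
A rough sketch of the listener block in the Vault CR for this setup (tls_disable is Vault's standard option for a plaintext TCP listener; other fields depend on your CR):

  config:
    listener:
      tcp:
        address: 0.0.0.0:8200
        tls_disable: true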

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

supernomad commented 4 years ago

I just slammed into this issue, also due to Istio compatibility. Just throwing out a +1 in terms of needing this done.

audrey-brightloom commented 4 years ago

ditto - running into this too

matyix commented 4 years ago

Apologies, this went a bit off the radar - we will be looking into the issue this week (part of the work has been done in #569).

bonifaido commented 4 years ago

Have you tried the changes in https://github.com/banzaicloud/bank-vaults/pull/768/files? This is already released in version 0.6.3 of the operator.
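
The change on the CR side is a single field (a minimal excerpt; the flag name is the one referenced in the follow-up below):

spec:
  istioEnabled: true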

audrey-brightloom commented 4 years ago

What's the proper upgrade path here? I pulled the latest bank-vaults release and set istioEnabled: true in cr.yaml -

kubectl apply -f operator.yaml
customresourcedefinition.apiextensions.k8s.io/vaults.vault.banzaicloud.com unchanged
deployment.apps/vault-operator configured

kubectl apply -f cr.yaml
vault.vault.banzaicloud.com/vault configured
persistentvolumeclaim/vault-file unchanged

Restarted the pod but still see -

    name: vault
    ports:
    - containerPort: 8200
      name: api-port
      protocol: TCP
    - containerPort: 8201
      name: cluster-port
      protocol: TCP

The service didn't seem to change either -

  ports:
  - name: api-port
    port: 8200
    protocol: TCP
    targetPort: 8200
  - name: cluster-port
    port: 8201
    protocol: TCP
    targetPort: 8201

audrey-brightloom commented 4 years ago

This could be why:

{"level":"error","ts":1576177453.2391808,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"vault-controller","request":"default/vault","error":"failed to create/update StatefulSet: StatefulSet.apps \"vault\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden"

bonifaido commented 4 years ago

In this case please delete the Vault StatefulSet with k delete statefulsets.apps --cascade=false vault; after that, the operator will recreate it in a few seconds with the new settings (this will cause a few seconds of Vault downtime for a non-HA Vault cluster, and none for an HA cluster).
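
Spelled out in full, k being a common alias for kubectl (the watch command is optional, just to observe the recreation):

# delete only the StatefulSet object; --cascade=false leaves the pods running
kubectl delete statefulsets.apps --cascade=false vault
# watch the operator recreate the StatefulSet with the new port names
kubectl get statefulset vault -w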

audrey-brightloom commented 4 years ago

Bumping the operator version to 0.7.1 and then deleting the stateful set as above now produces this error:

"Reconciler error","controller":"vault-controller","request":"default/vault","error":"failed to create/update configurer deployment: Deployment.apps \"vault-configurer\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"vault-configurator\", \"vault_cr\":\"vault\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable"

bonifaido commented 4 years ago

Do you see this error continuously or only once (which is okay if you see it once)?

bonifaido commented 4 years ago

If you have deleted the stateful set, you shouldn't see this error again unless you change the servicePorts again.

audrey-brightloom commented 4 years ago

It's still happening every 30 seconds. I did delete the stateful set, and it did rename the ports as expected.

bonifaido commented 4 years ago

What does your Vault CR look like in this case?

audrey-brightloom commented 4 years ago

spec:
  annotations:
    common/annotation: "true"
  bankVaultsImage: banzaicloud/bank-vaults:0.7.1
  caNamespaces:
  - vswh
  config:
    listener:
      tcp:
        address: 0.0.0.0:8200
        tls_cert_file: /vault/tls/server.crt
        tls_key_file: /vault/tls/server.key
    storage:
      file:
        path: ${ .Env.VAULT_STORAGE_FILE }
    telemetry:
      statsd_address: localhost:9125
    ui: true
  credentialsConfig:
    env: ""
    path: ""
    secretName: ""
  envsConfig: null
  etcdSize: 0
  etcdVersion: ""
  externalConfig:
    auth:
    - roles:
      - bound_service_account_names:
        - default
        - vault-secrets-webhook
        - vault
        bound_service_account_namespaces:
        - default
        - vswh
        - command
        name: default
        policies:
        - allow_secrets
        - allow_pki
        ttl: 1h
      type: kubernetes
    policies:
    - name: allow_secrets
      rules: path "secret/*" { capabilities = ["create", "read", "update", "delete",
        "list"] }
    - name: allow_pki
      rules: path "pki/*" { capabilities = ["create", "read", "update", "delete",
        "list"] }
    secrets:
    - description: General secrets.
      options:
        version: 2
      path: secret
      type: kv
    - config:
        default_lease_ttl: 168h
        max_lease_ttl: 720h
      configuration:
        config:
        - crl_distribution_points: https://vault.default:8200/v1/pki/crl
          issuing_certificates: https://vault.default:8200/v1/pki/ca
          name: urls
        roles:
        - allow_subdomains: true
          allowed_domains: localhost,pod,svc,default
          generate_lease: true
          name: default
          ttl: 1m
        root/generate:
        - common_name: vault.default
          name: internal
      description: Vault PKI Backend
      type: pki
  fluentdConfig: ""
  fluentdEnabled: false
  fluentdImage: ""
  image: vault:1.2.3
  ingress:
    spec: {}
  istioEnabled: true
  nodeAffinity: {}
  nodeSelector: null
  podAntiAffinity: ""
  securityContext: {}
  serviceAccount: vault
  servicePorts: null
  serviceType: ClusterIP
  size: 1
  statsdDisabled: false
  statsdImage: ""
  tolerations: null
  unsealConfig:
    kubernetes:
      secretName: ""
      secretNamespace: default
    options:
      preFlightChecks: true
  vaultAnnotations:
    type/instance: vault
  vaultConfigurerAnnotations:
    type/instance: vaultconfigurer
  vaultConfigurerLabels:
    example.com/log-format: string
  vaultConfigurerPodSpec:
    containers: null
  vaultEnvsConfig:
  - name: VAULT_LOG_LEVEL
    value: debug
  - name: VAULT_STORAGE_FILE
    value: /vault/file
  vaultLabels:
    example.com/log-format: json
  vaultPodSpec:
    containers: null
  volumeMounts:
  - mountPath: /vault/file
    name: vault-file
  volumes:
  - name: vault-file
    persistentVolumeClaim:
      claimName: vault-file
  watchedSecretsLabels: null
status:
  leader: vault-0
  nodes:
  - vault-0

bonifaido commented 4 years ago

Oh I see, please also delete the vault-configurer deployment; the operator will recreate it with the right port naming, and you should be okay (the error message should be gone). I hope this helps.
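
A minimal sketch of that step, using the deployment name from the error above:

# the Deployment's selector is immutable, so delete it and let the operator recreate it
kubectl delete deployment vault-configurer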