goauthentik / authentik

The authentication glue you need.
https://goauthentik.io

startup probe failed: stat /dev/shm/authentik-worker: no such file or directory #9919

Open volker-raschek opened 1 month ago

volker-raschek commented 1 month ago

Describe the bug

The startup probes of the authentik-server and authentik-worker pods always fail, which causes Kubernetes to restart the pods.

An excerpt from the authentik-worker logs:

{"event":"checking health","level":"debug","mode":"worker","timestamp":"2024-05-30T11:24:53Z"}
{"error":"stat /dev/shm/authentik-worker: no such file or directory","event":"failed to check worker heartbeat file","level":"warning","timestamp":"2024-05-30T11:24:53Z"}

Additionally, the authentik-server reports the following error messages:

{"event": "Loaded app settings", "level": "debug", "logger": "authentik.lib.config", "timestamp": 1717068978.7989888, "path": "authentik.events.settings"}
{"error":"authentik starting","event":"failed to proxy to backend","level":"warning","logger":"authentik.router","timestamp":"2024-05-30T11:36:19Z"}
{"error":"authentik starting","event":"failed to proxy to backend","level":"warning","logger":"authentik.router","timestamp":"2024-05-30T11:36:29Z"}
{"error":"Get \"http://localhost:8000/-/metrics/\": dial unix /dev/shm/authentik-core.sock: connect: no such file or directory","event":"failed to get upstream metrics","level":"warning","logger":"authentik.router.metrics","timestamp":"2024-05-30T11:36:34Z"}
{"error":"Get \"http://localhost:8000/-/metrics/\": dial unix /dev/shm/authentik-core.sock: connect: no such file or directory","event":"failed to get upstream metrics","level":"warning","logger":"authentik.router.metrics","timestamp":"2024-05-30T11:36:36Z"}
{"error":"authentik starting","event":"failed to proxy to backend","level":"warning","logger":"authentik.router","timestamp":"2024-05-30T11:36:39Z"}
/ak-root/venv/lib/python3.12/site-packages/opencontainers/distribution/reggie/defaults.py:17: SyntaxWarning: invalid escape sequence '\('
  "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
{"error":"Get \"http://localhost:8000/-/metrics/\": dial unix /dev/shm/authentik-core.sock: connect: no such file or directory","event":"failed to get upstream metrics","level":"warning","logger":"authentik.router.metrics","timestamp":"2024-05-30T11:36:40Z"}
{"domain_url": null, "event": "Loaded MMDB database", "file": "/geoip/GeoLite2-ASN.mmdb", "last_write": 1715093837.0, "level": "info", "logger": "authentik.events.context_processors.mmdb", "pid": 17, "schema_name": "public", "timestamp": "2024-05-30T11:36:44.611828"}
{"domain_url": null, "event": "Loaded MMDB database", "file": "/geoip/GeoLite2-City.mmdb", "last_write": 1715093836.0, "level": "info", "logger": "authentik.events.context_processors.mmdb", "pid": 17, "schema_name": "public", "timestamp": "2024-05-30T11:36:44.700453"}
{"error":"authentik starting","event":"failed to proxy to backend","level":"warning","logger":"authentik.router","timestamp":"2024-05-30T11:36:49Z"}
{"error":"authentik starting","event":"failed to proxy to backend","level":"warning","logger":"authentik.router","timestamp":"2024-05-30T11:36:59Z"}
{"error":"Get \"http://localhost:8000/-/metrics/\": dial unix /dev/shm/authentik-core.sock: connect: no such file or directory","event":"failed to get upstream metrics","level":"warning","logger":"authentik.router.metrics","timestamp":"2024-05-30T11:37:04Z"}
{"error":"Get \"http://localhost:8000/-/metrics/\": dial unix /dev/shm/authentik-core.sock: connect: no such file or directory","event":"failed to get upstream metrics","level":"warning","logger":"authentik.router.metrics","timestamp":"2024-05-30T11:37:06Z"}
{"error":"authentik starting","event":"failed to proxy to backend","level":"warning","logger":"authentik.router","timestamp":"2024-05-30T11:37:09Z"}
{"error":"Get \"http://localhost:8000/-/metrics/\": dial unix /dev/shm/authentik-core.sock: connect: no such file or directory","event":"failed to get upstream metrics","level":"warning","logger":"authentik.router.metrics","timestamp":"2024-05-30T11:37:10Z"}

Expected behavior

The startup probe does not return an error. Both applications, authentik-server and authentik-worker, are healthy.

Version and Deployment (please complete the following information):

- authentik version: 2024.4.2
- Deployment: helm (chart authentik-2024.4.2)

Additional context

Authentik is deployed on aarch64 and x86_64 systems.

authentik-server

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: authentik
    meta.helm.sh/release-namespace: authentik
  creationTimestamp: "2024-05-18T11:28:35Z"
  generation: 4
  labels:
    app.kubernetes.io/component: server
    app.kubernetes.io/instance: authentik
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: authentik
    app.kubernetes.io/part-of: authentik
    app.kubernetes.io/version: 2024.4.2
    helm.sh/chart: authentik-2024.4.2
  name: authentik-server
  namespace: authentik
  resourceVersion: "81965339"
  uid: a290dbc6-957c-4d07-a501-99e20e95ba01
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app.kubernetes.io/component: server
      app.kubernetes.io/instance: authentik
      app.kubernetes.io/name: authentik
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/secret: [marked]
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: server
        app.kubernetes.io/instance: authentik
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: authentik
        app.kubernetes.io/part-of: authentik
        app.kubernetes.io/version: 2024.4.2
        helm.sh/chart: authentik-2024.4.2
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: server
                  app.kubernetes.io/instance: authentik
                  app.kubernetes.io/name: authentik
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - server
        env:
        - name: AUTHENTIK_LISTEN__HTTP
          value: 0.0.0.0:9000
        - name: AUTHENTIK_LISTEN__HTTPS
          value: 0.0.0.0:9443
        - name: AUTHENTIK_LISTEN__METRICS
          value: 0.0.0.0:9300
        envFrom:
        - secretRef:
            name: authentik
        image: ghcr.io/goauthentik/server:2024.4.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/health/live/
            port: http
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: server
        ports:
        - containerPort: 9000
          name: http
          protocol: TCP
        - containerPort: 9443
          name: https
          protocol: TCP
        - containerPort: 9300
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/health/ready/
            port: http
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 150m
            memory: 512Mi
          requests:
            cpu: 150m
            memory: 512Mi
        startupProbe:
          failureThreshold: 60
          httpGet:
            path: /-/health/live/
            port: http
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      priorityClassName: authentik
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-05-18T11:28:35Z"
    lastUpdateTime: "2024-05-18T11:45:58Z"
    message: ReplicaSet "authentik-server-5795f56456" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2024-05-30T11:23:11Z"
    lastUpdateTime: "2024-05-30T11:23:11Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  observedGeneration: 4
  readyReplicas: 1
  replicas: 2
  unavailableReplicas: 1
  updatedReplicas: 2

authentik-worker

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: authentik
    meta.helm.sh/release-namespace: authentik
  creationTimestamp: "2024-05-18T11:28:35Z"
  generation: 4
  labels:
    app.kubernetes.io/component: worker
    app.kubernetes.io/instance: authentik
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: authentik
    app.kubernetes.io/part-of: authentik
    app.kubernetes.io/version: 2024.4.2
    helm.sh/chart: authentik-2024.4.2
  name: authentik-worker
  namespace: authentik
  resourceVersion: "81964479"
  uid: cb164273-b6ad-45b0-983a-e1187ad6f227
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app.kubernetes.io/component: worker
      app.kubernetes.io/instance: authentik
      app.kubernetes.io/name: authentik
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/secret: [marked]
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: worker
        app.kubernetes.io/instance: authentik
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: authentik
        app.kubernetes.io/part-of: authentik
        app.kubernetes.io/version: 2024.4.2
        helm.sh/chart: authentik-2024.4.2
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: worker
                  app.kubernetes.io/instance: authentik
                  app.kubernetes.io/name: authentik
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - worker
        envFrom:
        - secretRef:
            name: authentik
        image: ghcr.io/goauthentik/server:2024.4.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - ak
            - healthcheck
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: worker
        readinessProbe:
          exec:
            command:
            - ak
            - healthcheck
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 150m
            memory: 512Mi
          requests:
            cpu: 150m
            memory: 512Mi
        startupProbe:
          exec:
            command:
            - ak
            - healthcheck
          failureThreshold: 2000
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      priorityClassName: authentik
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: authentik
      serviceAccountName: authentik
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2024-05-18T11:28:35Z"
    lastUpdateTime: "2024-05-18T11:39:07Z"
    message: ReplicaSet "authentik-worker-56c57f6cdd" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2024-05-30T11:35:41Z"
    lastUpdateTime: "2024-05-30T11:35:41Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 4
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2
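One more variable visible in both manifests is the fairly tight 150m CPU limit on the containers, which is worth ruling out while the startup failures are being isolated. A hypothetical values.yaml override to test with more headroom (the key names are assumed from the goauthentik chart layout and should be verified against the chart version in use):

# values.yaml sketch -- key names are an assumption, not verified against the chart
server:
  resources:
    requests:
      cpu: 150m
      memory: 512Mi
    limits:
      cpu: "1"        # temporary headroom while the issue is being isolated
      memory: 512Mi
worker:
  resources:
    requests:
      cpu: 150m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 512Mi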
BeeTwenty commented 3 weeks ago

I have the same issue.