Unleash / unleash

Open-source feature management solution built for developers.
https://getunleash.io
Apache License 2.0

Frontend returns 404 after running a few days #7337

Open TapTap21 opened 1 month ago

TapTap21 commented 1 month ago

Describe the bug

I run Unleash in EKS with an ALB ingress in front of the frontend. After it has been running for a few days, the frontend returns 404 and simply loads a grey screen. Restarting the pods fixes it immediately. The health check never fails during this issue.

Steps to reproduce the bug

  1. Run Unleash in Kubernetes with an ALB ingress
  2. Wait a few days
  3. The frontend returns 404
  4. Restart the pod
  5. It works again

Expected behavior

The frontend does not return 404.

Logs, error output, etc.

I don't see anything weird in the logs
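
One way to check whether the 404 comes from the pod itself or only appears through the ALB is to bypass the ingress and hit the container directly. A minimal sketch, using the deployment name, namespace, and port from the manifest below:

# Forward a local port straight to the deployment, bypassing the ALB ingress
kubectl port-forward -n unleash deploy/unleash 4242:4242 &

# The health endpoint, which keeps passing during the issue
curl -i http://localhost:4242/health

# The frontend root, which is what returns the 404 and the grey screen
curl -i http://localhost:4242/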

Screenshots

[Screenshot: the grey screen shown when the frontend returns 404]

Additional context

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    reloader.stakater.com/auto: 'true'
  name: unleash
  namespace: unleash
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: unleash-prod
      app.kubernetes.io/name: unleash
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        admission.datadoghq.com/enabled: 'true'
        app.kubernetes.io/instance: unleash-prod
        app.kubernetes.io/name: unleash
    spec:
      containers:
        - envFrom:
            - secretRef:
                name: unleash
            - configMapRef:
                name: unleash
          image: 'unleashorg/unleash-server:latest'
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - sleep 60
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
          name: unleash
          ports:
            - containerPort: 4242
              name: http
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 1
            periodSeconds: 2
          resources:
            limits:
              memory: 256Mi
            requests:
              cpu: 125m
              memory: 256Mi
      nodeSelector:
        provisioner: karpenter
      restartPolicy: Always
      terminationGracePeriodSeconds: 90
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/instance: unleash-prod
              app.kubernetes.io/name: unleash
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
        - labelSelector:
            matchLabels:
              app.kubernetes.io/instance: unleash-prod
              app.kubernetes.io/name: unleash
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule

Unleash version

6.0.0

Subscription type

None

Hosting type

Self-hosted

SDK information (language and version)

No response

chriswk commented 1 month ago

Hi. Two things I've spotted: one is that you're using the latest tag with IfNotPresent, which means you have no real control over which version you're running in your cluster. I recommend changing to actual released tags; these are stable.
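
For example, a minimal sketch of pinning the image to a released tag (deployment, container, and namespace names are taken from the manifest above; 6.0.0 is simply the version mentioned in this issue, so use whichever release you prefer):

# Point the deployment at a fixed release instead of latest
kubectl set image -n unleash deployment/unleash unleash=unleashorg/unleash-server:6.0.0

The same change can also be made by editing the image field in the Deployment manifest directly.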

However, how EKS loses the index file I do not know. It's almost as if the static hosting is failing. We've not seen this before.

TapTap21 commented 1 month ago

Thanks for the swift response!

I've had this issue for a while now; I changed to the latest tag hoping that it would fix the issue, in case there was a bug I wasn't aware of. I've changed it to 6.0.0 now.

I'm not sure what's happening here, as I host many other apps, frontend and backend, in this cluster with no such issue. Since this file is part of the container and not in a hosted volume, I would assume that the issue is in the container.

I have two deployments of Unleash running, one in dev and one in prod. Are there any settings or configs I can set in dev to help find the cause of this issue?

TapTap21 commented 1 month ago

Since this "bug" takes a few days to manifest, I'll keep a close eye on it and report back if setting to 6.0.0 helped.

chriswk commented 1 month ago

Thank you. We haven't seen this in our EKS clusters, so if it happens again, could you print out the folder structure? That would at least let us see whether it's the hosting in Express or the file system that's breaking.
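
If it helps, a rough sketch of what that could look like from outside the pod; the /unleash working directory is an assumption about the unleashorg/unleash-server image rather than a confirmed path:

# List the app directory inside the running container
kubectl exec -n unleash deploy/unleash -- ls -la /unleash

# Look for the built frontend index.html, wherever it actually lives
kubectl exec -n unleash deploy/unleash -- find /unleash -name index.html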