8gears / n8n-helm-chart

A Kubernetes Helm chart for n8n, a Workflow Automation Tool. Easily automate tasks across different services.
https://artifacthub.io/packages/helm/open-8gears/n8n
Apache License 2.0

Losing sessions when deployed on GKE #16

Closed Ā· KirioXX closed this issue 3 years ago

KirioXX commented 3 years ago

First things first: thank you very much for this chart. It made my life quite a lot easier; I'm fairly new to Kubernetes, and it helped a lot to understand everything a bit better.

My problem is that n8n seems to lose its session quite often, with error messages on the client like:

The connection to https://n8n.xxx.xxx/rest/push?sessionId=0jrmptauh7e was interrupted while the page was loading.

and the server logs show:

The session "0jrmptauh7e" is not registered.

I'm not sure whether it's related to my Terraform/K8s setup or to n8n itself. This is my Terraform config:

resource "helm_release" "n8n" {
  count           = 1
  depends_on      = [kubernetes_namespace.n8n, google_sql_database.n8n, google_sql_user.n8n]
  repository      = "https://8gears.container-registry.com/chartrepo/library"
  chart           = "n8n"
  version         = var.helm_version
  name            = var.release_name
  namespace       = var.namespace
  recreate_pods   = true
  values = [
    "${file("n8n_values.yaml")}"
  ]
  set_sensitive {
    name  = "n8n.encryption_key"
    value = var.n8n_encryption_key
  }
  set {
    name  = "config.database.postgresdb.host"
    value = data.terraform_remote_state.cluster.outputs.database_connection
  }
  set {
    name  = "config.database.postgresdb.user"
    value = var.db_username
  }
  set_sensitive {
    name  = "secret.database.postgresdb.password"
    value = var.db_password
  }
  set {
    name  = "config.security.basicAuth.user"
    value = var.username
  }
  set_sensitive {
    name  = "config.security.basicAuth.password"
    value = var.password
  }
}

resource "google_compute_managed_ssl_certificate" "n8n_ssl" {
  name = "${var.release_name}-ssl"
  managed {
    domains = ["n8n.xxx.xxx"]
  }
}

resource "kubernetes_ingress" "n8n_ingress" {
  depends_on = [google_compute_managed_ssl_certificate.n8n_ssl]
  metadata {
    name = "${var.release_name}-ingress"
    namespace = helm_release.n8n[0].namespace
    annotations = {
      "ingress.kubernetes.io/compress-enable"       = "false",
      "ingress.gcp.kubernetes.io/pre-shared-cert"   = google_compute_managed_ssl_certificate.n8n_ssl.name
    }
  }
  spec {
    backend {
      service_name = helm_release.n8n[0].name
      service_port = 80
    }
    rule {
      http {
        path {
          backend {
            service_name = helm_release.n8n[0].name
            service_port = 80
          }
        }
      }
    }
  }
}

and this is my values file:

# The n8n related part of the config

config: # Dict with all n8n config options
  protocol: https
  port: 8080
  database:
    type: postgresdb
  security:
    basicAuth:
      active: true
secret: # Dict with all n8n config options; unlike config, the values here will end up in a Secret.
  database:
    postgresdb:
      password: ""
##
##
## Common Kubernetes Config Settings
persistence:
  ## If true, use a Persistent Volume Claim; if false, use emptyDir
  ##
  enabled: false
  type: emptyDir # which volume type to use; possible options are [existing, emptyDir, dynamic]: dynamic for Dynamic Volume Provisioning, existing for using an existing Claim
  ## Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  ##
  # storageClass: "-"
  ## PVC annotations
  #
  # If you need this annotation, include it in your values.yaml file and the pvc.yaml template will add it.
  # This is not maintained in Helm v3 anymore.
  # https://github.com/8gears/n8n-helm-chart/issues/8
  #
  # annotations:
  #   helm.sh/resource-policy: keep
  ## Persistent Volume Access Mode
  ##
  accessModes:
    - ReadWriteOnce
  ## Persistent Volume size
  ##
  size: 1Gi
  ## Use an existing PVC
  ##
  # existingClaim:

# Set additional environment variables on the Deployment
extraEnv:
  VUE_APP_URL_BASE_API: https://n8n.xxx.xxx/
  WEBHOOK_TUNNEL_URL: https://n8n.xxx.xxx/
# Set this if running behind a reverse proxy and the external port is different from the port n8n runs on
#   WEBHOOK_TUNNEL_URL: "https://n8n.myhost.com/"

replicaCount: 1

image:
  repository: n8nio/n8n
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""

podAnnotations: {}

podSecurityContext: {}
# fsGroup: 2000

securityContext: {}
  # capabilities:
  #   drop:
  #   - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
  # runAsUser: 1000

service:
  type: NodePort
  port: 80

ingress:
  enabled: false
  annotations: {}
  # kubernetes.io/ingress.class: nginx
  # kubernetes.io/tls-acme: "true"
  hosts: []
  # - host: chart-example.local
  #   paths: []
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity: {}

Thank you in advance for any help. šŸ™Œ

KirioXX commented 3 years ago

It looks like I found a solution. Every time the ingress closes the connection to the pod, n8n drops the session, which causes the workflows to fail. I have now increased the timeout so the connection stays open longer, which seems to solve the problem. šŸ¤ž that no one builds workflows that run for longer than 5 minutes.

Vad1mo commented 3 years ago

Can you open a PR so we have a decent initial timeout? I'll make an entry in the readme mentioning that.

KirioXX commented 3 years ago

I'm not sure if the timeout fix is worth including in the Helm chart, because I used a BackendConfig, which is a GKE-specific feature of their load balancer.
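
For reference, a minimal sketch of the BackendConfig (the resource name and namespace are placeholders, and timeoutSec: 300 simply matches the five-minute limit mentioned above):

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: n8n-backend-config # placeholder name
  namespace: n8n           # assumed to match the release namespace
spec:
  # Raise the backend timeout from the GKE default (30s) so the long-lived
  # /rest/push connection is not cut off by the load balancer.
  timeoutSec: 300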

But I'll open a PR for the change I made to add annotations to the service, which makes it possible to attach the backend config. I would also be happy to help document the challenges I have had so far with GKE.
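
As a sketch of how that would look in the values file (assuming the chart exposes annotations under service.annotations, which is what the PR adds; the BackendConfig name is the placeholder from the sketch above):

service:
  type: NodePort
  port: 80
  annotations:
    # Standard GKE annotation that attaches the BackendConfig, and with it
    # the increased timeout, to the Service behind the ingress.
    cloud.google.com/backend-config: '{"default": "n8n-backend-config"}'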

I still have some trouble with the workflows: somehow n8n can't initiate a trigger and just logs over and over that it is initiating the trigger.