getredash / contrib-helm-chart

Community maintained Redash Helm Chart
Apache License 2.0

Redash won't start up: connecting to redash-redis-master:6379. Connection refused & WORKER TIMEOUT #127

Open Rusiecki opened 2 years ago

Rusiecki commented 2 years ago

Hello, I am trying to install Redash with the Helm chart via Terraform on our corporate cluster. I've read that the initial deployment takes a while, so I let it run for a few hours, but still no luck.

Locally my setup runs fine. Only on the corporate cluster it fails somehow.

1.) I suspect some kind of proxy hiccup, which is why I disabled most of the query runners and destinations. This at least reduced the number of WORKER TIMEOUT errors inside Redash itself, but in the end it didn't solve my problem.

2.) What is also weird is that I get a connection refused for redash-redis-master, although it is reachable within the k8s network.

I'm happy for any help.

What I've already tried:

  1. Deactivating the liveness/readiness probes: the pods are green, but nothing is reachable, so the setup is effectively dead.
  2. Giving the setup http_proxy/https_proxy/no_proxy via the env.
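
A raw TCP probe run from a pod in the same namespace could help tell a refused connection apart from a timeout or a DNS problem. The following is a minimal sketch, assuming a Python 3 interpreter is available wherever it runs (for example in a debug pod); the `probe` helper is hypothetical and not part of Redash or the chart.

```python
import errno
import socket


def probe(host, port, timeout=5.0):
    """Try a single TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The service name could not be resolved at all.
        return "dns-failure"
    except socket.timeout:
        # No reply within `timeout` seconds.
        return "timeout"
    except ConnectionRefusedError:
        # Something answered and actively rejected the connection.
        return "refused"
    except OSError as exc:
        # Kernel-level timeout reported as errno ETIMEDOUT on older Python versions.
        if exc.errno == errno.ETIMEDOUT:
            return "timeout"
        return "error: {}".format(exc)
```

Calling `probe("redash-redis-master", 6379)` from several pods (server, workers, scheduler) and comparing the results may show whether only some pods are affected.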

Here are the logs of the pods if it helps:

Logs(dev/redash-adhocworker-7b59b97db6-ldggj:redash-adhocworker)[1m]

Using Database: postgresql://redash:******@redash-postgresql:5432/redash
Using Redis: redis://:******@redash-redis-master:6379/0
Starting 2 workers for queues: queries...
[2022-06-30 08:39:29,473][PID:6][DEBUG][redash.query_runner] Registering PostgreSQL (pg) query runner.
[2022-06-30 08:39:29,473][PID:6][DEBUG][redash.query_runner] Registering Redshift (redshift) query runner.
[2022-06-30 08:39:29,474][PID:6][DEBUG][redash.query_runner] Registering CockroachDB (cockroach) query runner.
[2022-06-30 08:39:29,475][PID:6][DEBUG][redash.destinations] Registering Mattermost (mattermost) destinations.
[2022-06-30 08:39:29,682][PID:6][DEBUG][passlib.utils.compat] loaded lazy attr 'SafeConfigParser': <class ConfigParser.SafeConfigParser at 0x7fe971def460>
[2022-06-30 08:39:29,683][PID:6][DEBUG][passlib.utils.compat] loaded lazy attr 'NativeStringIO': <built-in function StringIO>
[2022-06-30 08:39:29,683][PID:6][DEBUG][passlib.utils.compat] loaded lazy attr 'BytesIO': <built-in function StringIO>

 -------------- celery@redash-adhocworker-7b59b97db6-ldggj v4.3.0 (rhubarb)
---- **** ----- 
--- * ***  * -- Linux-5.15.37-051537-generic-x86_64-with-debian-10.0 2022-06-30 08:39:30
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         redash:0x7fe96dc78e50
- ** ---------- .> transport:   redis://:**@redash-redis-master:6379/0
- ** ---------- .> results:     redis://:**@redash-redis-master:6379/0
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> queries          exchange=queries(direct) key=queries

[tasks]
  . redash.tasks.check_alerts_for_query
  . redash.tasks.cleanup_query_results
  . redash.tasks.empty_schedules
  . redash.tasks.execute_query
  . redash.tasks.record_event
  . redash.tasks.refresh_queries
  . redash.tasks.refresh_schema
  . redash.tasks.refresh_schemas
  . redash.tasks.send_aggregated_errors
  . redash.tasks.send_mail
  . redash.tasks.subscribe
  . redash.tasks.sync_user_details
  . redash.tasks.version_check

[2022-06-30 08:39:32,086][PID:6][ERROR][MainProcess] consumer: Cannot connect to redis://:**@redash-redis-master:6379/0: Error 111 connecting to redash-redis-master:6379. Connection refused..
Trying again in 2.00 seconds...

[2022-06-30 08:39:35,093][PID:6][ERROR][MainProcess] consumer: Cannot connect to redis://:**@redash-redis-master:6379/0: Error 111 connecting to redash-redis-master:6379. Connection refused..
Trying again in 4.00 seconds...

Logs(dev/redash-c5bbb5cc4-d9gj9:redash-server)[1m]


Using Database: postgresql://redash:******@redash-postgresql:5432/redash
Using Redis: redis://:******@redash-redis-master:6379/0
[2022-06-30 08:39:28 +0000] [6] [INFO] Starting gunicorn 19.7.1
[2022-06-30 08:39:28 +0000] [6] [INFO] Listening at: http://0.0.0.0:5000 (6)
[2022-06-30 08:39:28 +0000] [6] [INFO] Using worker: sync
[2022-06-30 08:39:28 +0000] [10] [INFO] Booting worker with pid: 10
[2022-06-30 08:39:28 +0000] [12] [INFO] Booting worker with pid: 12
[2022-06-30 08:39:28 +0000] [14] [INFO] Booting worker with pid: 14
[2022-06-30 08:39:28 +0000] [16] [INFO] Booting worker with pid: 16
[2022-06-30 08:39:29,387][PID:10][DEBUG][redash.query_runner] Registering PostgreSQL (pg) query runner.
[2022-06-30 08:39:29,387][PID:10][DEBUG][redash.query_runner] Registering Redshift (redshift) query runner.
[2022-06-30 08:39:29,388][PID:10][DEBUG][redash.query_runner] Registering CockroachDB (cockroach) query runner.
[2022-06-30 08:39:29,390][PID:10][DEBUG][redash.destinations] Registering Mattermost (mattermost) destinations.
[2022-06-30 08:39:29,394][PID:12][DEBUG][redash.query_runner] Registering PostgreSQL (pg) query runner.
[2022-06-30 08:39:29,395][PID:12][DEBUG][redash.query_runner] Registering Redshift (redshift) query runner.
[2022-06-30 08:39:29,395][PID:12][DEBUG][redash.query_runner] Registering CockroachDB (cockroach) query runner.
[2022-06-30 08:39:29,395][PID:12][DEBUG][redash.destinations] Registering Mattermost (mattermost) destinations.
[2022-06-30 08:39:29,424][PID:16][DEBUG][redash.query_runner] Registering PostgreSQL (pg) query runner.
[2022-06-30 08:39:29,424][PID:16][DEBUG][redash.query_runner] Registering Redshift (redshift) query runner.
[2022-06-30 08:39:29,424][PID:16][DEBUG][redash.query_runner] Registering CockroachDB (cockroach) query runner.
[2022-06-30 08:39:29,424][PID:16][DEBUG][redash.destinations] Registering Mattermost (mattermost) destinations.
[2022-06-30 08:39:29,461][PID:16][DEBUG][passlib.utils.compat] loaded lazy attr 'SafeConfigParser': <class ConfigParser.SafeConfigParser at 0x7f56f5af5600>
[2022-06-30 08:39:29,461][PID:16][DEBUG][passlib.utils.compat] loaded lazy attr 'NativeStringIO': <built-in function StringIO>
[2022-06-30 08:39:29,461][PID:16][DEBUG][passlib.utils.compat] loaded lazy attr 'BytesIO': <built-in function StringIO>
[2022-06-30 08:39:29,465][PID:10][DEBUG][passlib.utils.compat] loaded lazy attr 'SafeConfigParser': <class ConfigParser.SafeConfigParser at 0x7f56f5af4600>
[2022-06-30 08:39:29,465][PID:10][DEBUG][passlib.utils.compat] loaded lazy attr 'NativeStringIO': <built-in function StringIO>
[2022-06-30 08:39:29,466][PID:10][DEBUG][passlib.utils.compat] loaded lazy attr 'BytesIO': <built-in function StringIO>
[2022-06-30 08:39:29,482][PID:14][DEBUG][redash.query_runner] Registering PostgreSQL (pg) query runner.
[2022-06-30 08:39:29,482][PID:14][DEBUG][redash.query_runner] Registering Redshift (redshift) query runner.
[2022-06-30 08:39:29,482][PID:14][DEBUG][redash.query_runner] Registering CockroachDB (cockroach) query runner.
[2022-06-30 08:39:29,482][PID:14][DEBUG][redash.destinations] Registering Mattermost (mattermost) destinations.
[2022-06-30 08:39:29,495][PID:12][DEBUG][passlib.utils.compat] loaded lazy attr 'SafeConfigParser': <class ConfigParser.SafeConfigParser at 0x7f56f5af4600>
[2022-06-30 08:39:29,496][PID:12][DEBUG][passlib.utils.compat] loaded lazy attr 'NativeStringIO': <built-in function StringIO>
[2022-06-30 08:39:29,496][PID:12][DEBUG][passlib.utils.compat] loaded lazy attr 'BytesIO': <built-in function StringIO>
[2022-06-30 08:39:29,535][PID:14][DEBUG][passlib.utils.compat] loaded lazy attr 'SafeConfigParser': <class ConfigParser.SafeConfigParser at 0x7f56f5af5600>
[2022-06-30 08:39:29,535][PID:14][DEBUG][passlib.utils.compat] loaded lazy attr 'NativeStringIO': <built-in function StringIO>
[2022-06-30 08:39:29,536][PID:14][DEBUG][passlib.utils.compat] loaded lazy attr 'BytesIO': <built-in function StringIO>
[2022-06-30 08:40:16 +0000] [6] [CRITICAL] WORKER TIMEOUT (pid:12)
[2022-06-30 08:40:16 +0000] [12] [INFO] Worker exiting (pid: 12)
[2022-06-30 08:40:16 +0000] [22] [INFO] Booting worker with pid: 22
[2022-06-30 08:40:17,368][PID:22][DEBUG][redash.query_runner] Registering PostgreSQL (pg) query runner.
[2022-06-30 08:40:17,369][PID:22][DEBUG][redash.query_runner] Registering Redshift (redshift) query runner.
[2022-06-30 08:40:17,369][PID:22][DEBUG][redash.query_runner] Registering CockroachDB (cockroach) query runner.
[2022-06-30 08:40:17,369][PID:22][DEBUG][redash.destinations] Registering Mattermost (mattermost) destinations.
[2022-06-30 08:40:17,399][PID:22][DEBUG][passlib.utils.compat] loaded lazy attr 'SafeConfigParser': <class ConfigParser.SafeConfigParser at 0x7f56f5af5600>
[2022-06-30 08:40:17,399][PID:22][DEBUG][passlib.utils.compat] loaded lazy attr 'NativeStringIO': <built-in function StringIO>
[2022-06-30 08:40:17,399][PID:22][DEBUG][passlib.utils.compat] loaded lazy attr 'BytesIO': <built-in function StringIO>
[2022-06-30 08:40:25 +0000] [6] [CRITICAL] WORKER TIMEOUT (pid:16)
[2022-06-30 08:40:25 +0000] [16] [INFO] Worker exiting (pid: 16)
[2022-06-30 08:40:26 +0000] [25] [INFO] Booting worker with pid: 25

Logs(dev/redash-genericworker-65bb9df79d-tt8kw:redash-genericworker)[1m]

Using Redis: redis://:******@redash-redis-master:6379/0
Starting 1 workers for queues: periodic,emails,default...
[2022-06-30 08:39:29,513][PID:6][DEBUG][redash.query_runner] Registering PostgreSQL (pg) query runner.
[2022-06-30 08:39:29,513][PID:6][DEBUG][redash.query_runner] Registering Redshift (redshift) query runner.
[2022-06-30 08:39:29,514][PID:6][DEBUG][redash.query_runner] Registering CockroachDB (cockroach) query runner.
[2022-06-30 08:39:29,515][PID:6][DEBUG][redash.destinations] Registering Mattermost (mattermost) destinations.
[2022-06-30 08:39:29,714][PID:6][DEBUG][passlib.utils.compat] loaded lazy attr 'SafeConfigParser': <class ConfigParser.SafeConfigParser at 0x7f7f08980460>
[2022-06-30 08:39:29,714][PID:6][DEBUG][passlib.utils.compat] loaded lazy attr 'NativeStringIO': <built-in function StringIO>
[2022-06-30 08:39:29,714][PID:6][DEBUG][passlib.utils.compat] loaded lazy attr 'BytesIO': <built-in function StringIO>

 -------------- celery@redash-genericworker-65bb9df79d-tt8kw v4.3.0 (rhubarb)
---- **** ----- 
--- * ***  * -- Linux-5.15.37-051537-generic-x86_64-with-debian-10.0 2022-06-30 08:39:30
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         redash:0x7f7f04809f10
- ** ---------- .> transport:   redis://:**@redash-redis-master:6379/0
- ** ---------- .> results:     redis://:**@redash-redis-master:6379/0
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> default          exchange=default(direct) key=default
                .> emails           exchange=emails(direct) key=emails
                .> periodic         exchange=periodic(direct) key=periodic

[tasks]
  . redash.tasks.check_alerts_for_query
  . redash.tasks.cleanup_query_results
  . redash.tasks.empty_schedules
  . redash.tasks.execute_query
  . redash.tasks.record_event
  . redash.tasks.refresh_queries
  . redash.tasks.refresh_schema
  . redash.tasks.refresh_schemas
  . redash.tasks.send_aggregated_errors
  . redash.tasks.send_mail
  . redash.tasks.subscribe
  . redash.tasks.sync_user_details
  . redash.tasks.version_check

[2022-06-30 08:39:31,798][PID:6][ERROR][MainProcess] consumer: Cannot connect to redis://:**@redash-redis-master:6379/0: Error 111 connecting to redash-redis-master:6379. Connection refused..
Trying again in 2.00 seconds...

[2022-06-30 08:39:34,805][PID:6][ERROR][MainProcess] consumer: Cannot connect to redis://:**@redash-redis-master:6379/0: Error 111 connecting to redash-redis-master:6379. Connection refused..
Trying again in 4.00 seconds...

Logs(dev/redash-redis-master-0:redis)[1m]

redis 08:39:31.47 INFO  ==> ** Starting Redis **
1:C 30 Jun 2022 08:39:31.491 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 30 Jun 2022 08:39:31.491 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 30 Jun 2022 08:39:31.491 # Configuration loaded
1:M 30 Jun 2022 08:39:31.493 * Running mode=standalone, port=6379.
1:M 30 Jun 2022 08:39:31.493 # Server initialized
1:M 30 Jun 2022 08:39:31.494 * Ready to accept connections

Logs(dev/redash-scheduledworker-5dc5c9c89f-mm8qf:redash-scheduledworker)[1m]

[2022-06-30 08:41:48,921][PID:6][ERROR][MainProcess] consumer: Cannot connect to redis://:**@redash-redis-master:6379/0: Error 110 connecting to redash-redis-master:6379. Connection timed out..
Trying again in 6.00 seconds...

Logs(dev/redash-scheduler-7b5865c4-sbsfj:redash-scheduler)[1m]

[2022-06-30 08:41:48,918][PID:6][ERROR][MainProcess] consumer: Cannot connect to redis://:**@redash-redis-master:6379/0: Error 110 connecting to redash-redis-master:6379. Connection timed out..
Trying again in 6.00 seconds...

[2022-06-30 08:42:13,493][PID:11][ERROR][Beat] beat: Connection error: Error 110 connecting to redash-redis-master:6379. Connection timed out.. Trying again in 2.0 seconds...
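
The logs above report two different error numbers for the same Redis endpoint: 111 on the adhoc and generic workers, and 110 on the scheduled worker and scheduler. On Linux these correspond to `ECONNREFUSED` and `ETIMEDOUT`. A minimal sketch for looking the codes up, assuming it runs on a Linux host with Python 3 (the `describe` helper is hypothetical, and the numeric values differ on other platforms):

```python
import errno
import os


def describe(code):
    """Return (symbolic name, OS message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Codes taken from the worker and scheduler logs.
for code in (110, 111):
    print(code, *describe(code))
```

A refusal generally means something answered and rejected the connection, while a timeout generally means packets were dropped without any reply, so the two codes may point at different layers of the network path.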

Here are my settings.

main.tf

resource "helm_release" "redash" {
  name       = "redash"
  repository = "https://getredash.github.io/contrib-helm-chart/"
  chart      = "redash"
  version    = "3.0.0"
  namespace  = "dev"

  set {
    name  = "postgresql.postgresqlPassword"
    value = "password123verystrong"
  }

  set {
    name  = "redash.cookieSecret"
    value = "secret"
  }

  set {
    name  = "redash.secretKey"
    value = "key"
  }

  set {
    name  = "image.tag"
    value = "latest"
  }

  set {
    name  = "postgresql.image.repository"
    value = "bitnami/postgresql"
  }

  set {
    name  = "postgresql.image.tag"
    value = "9.6.17-debian-10-r3"
  }

  set {
    name  = "redis.image.repository"
    value = "bitnami/redis"
  }

  set {
    name  = "redis.image.tag"
    value = "6.0.8-debian-10-r0"
  }

  set {
    name  = "server.podSecurityContext.runAsNonRoot"
    value = true
  }

  set {
    name  = "server.podSecurityContext.runAsUser"
    value = 1000
  }

  set {
    name  = "adhocWorker.podSecurityContext.runAsNonRoot"
    value = true
  }

  set {
    name  = "adhocWorker.podSecurityContext.runAsUser"
    value = 1000
  }

  set {
    name  = "scheduledWorker.podSecurityContext.runAsNonRoot"
    value = true
  }

  set {
    name  = "scheduledWorker.podSecurityContext.runAsUser"
    value = 1000
  }

  set {
    name  = "scheduler.podSecurityContext.runAsNonRoot"
    value = true
  }

  set {
    name  = "scheduler.podSecurityContext.runAsUser"
    value = 1000
  }

  set {
    name  = "genericWorker.podSecurityContext.runAsNonRoot"
    value = true
  }

  set {
    name  = "genericWorker.podSecurityContext.runAsUser"
    value = 1000
  }
  set {
    name  = "hookInstallJob.podSecurityContext.runAsNonRoot"
    value = true
  }

  set {
    name  = "hookInstallJob.podSecurityContext.runAsUser"
    value = 1000
  }

  set {
    name  = "hookUpgradeJob.podSecurityContext.runAsNonRoot"
    value = true
  }

  set {
    name  = "hookUpgradeJob.podSecurityContext.runAsUser"
    value = 1000
  }

  set {
    name  = "ingress.enabled"
    value = "true"
  }

  set {
    name  = "ingress.hosts[0].host"
    value = "someurl"
  }

  set {
    name  = "ingress.hosts[0].paths[0]"
    value = "/"
  }

  set {
    name  = "ingress.annotations.traefik\\.ingress\\.kubernetes\\.io\\/router\\.tls"
    value = "true"
    type  = "string"
  }

  set {
    name  = "redash.enabledQueryRunners"
    value = "redash.query_runner.pg"
  }

  set {
    name  = "redash.enabledDestinations"
    value = "redash.destinations.mattermost"
  }
}

Rusiecki commented 2 years ago

@grugnog any idea?

grugnog commented 2 years ago

@Rusiecki nothing really from the above - it seems like a cluster network issue of some kind, but I can't come up with any ideas from the information above, particularly if the same config works locally. My guess is that there is some additional network layer or policy enforcement on the corporate cluster that is denying the connection?

manvindar commented 2 years ago

Can we bump the Redis Helm chart version from 10 to 17? There are breaking changes if we do.

grugnog commented 2 years ago

@manvindar PR would be welcome - not sure if that is related to this issue though.

jversolatocreditas commented 1 year ago

Any idea?