getredash / contrib-helm-chart

Community maintained Redash Helm Chart
Apache License 2.0

Helm defaults not even starting up - [CRITICAL] WORKER TIMEOUT #146

Open JustinGuese opened 1 year ago

JustinGuese commented 1 year ago

error

redash main pod

redash-859c5f57c5-jlxjs [2023-06-07 08:48:40 +0000] [40] [INFO] Booting worker with pid: 40
redash-859c5f57c5-jlxjs [2023-06-07 08:48:49 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:13)
redash-859c5f57c5-jlxjs [2023-06-07 08:48:49 +0000] [13] [INFO] Worker exiting (pid: 13)
redash-859c5f57c5-jlxjs [2023-06-07 08:48:50 +0000] [45] [INFO] Booting worker with pid: 45
redash-859c5f57c5-jlxjs [2023-06-07 08:48:59 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:30)
redash-859c5f57c5-jlxjs [2023-06-07 08:48:59 +0000] [30] [INFO] Worker exiting (pid: 30)
redash-859c5f57c5-jlxjs [2023-06-07 08:48:59 +0000] [50] [INFO] Booting worker with pid: 50
redash-859c5f57c5-jlxjs [2023-06-07 08:49:04 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:35)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:04 +0000] [35] [INFO] Worker exiting (pid: 35)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:04 +0000] [55] [INFO] Booting worker with pid: 55
redash-859c5f57c5-jlxjs [2023-06-07 08:49:14 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:40)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:14 +0000] [40] [INFO] Worker exiting (pid: 40)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:15 +0000] [60] [INFO] Booting worker with pid: 60
redash-859c5f57c5-jlxjs [2023-06-07 08:49:24 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:45)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:24 +0000] [45] [INFO] Worker exiting (pid: 45)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:24 +0000] [65] [INFO] Booting worker with pid: 65
redash-859c5f57c5-jlxjs [2023-06-07 08:49:33 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:50)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:33 +0000] [50] [INFO] Worker exiting (pid: 50)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:34 +0000] [70] [INFO] Booting worker with pid: 70
redash-859c5f57c5-jlxjs [2023-06-07 08:49:39 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:55)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:39 +0000] [55] [INFO] Worker exiting (pid: 55)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:39 +0000] [75] [INFO] Booting worker with pid: 75
redash-859c5f57c5-jlxjs [2023-06-07 08:49:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:60)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:48 +0000] [60] [INFO] Worker exiting (pid: 60)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:49 +0000] [80] [INFO] Booting worker with pid: 80
redash-859c5f57c5-jlxjs [2023-06-07 08:49:58 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:65)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:58 +0000] [65] [INFO] Worker exiting (pid: 65)
redash-859c5f57c5-jlxjs [2023-06-07 08:49:58 +0000] [85] [INFO] Booting worker with pid: 85

genericworker

redash-genericworker-79c9547cff-4n76m     if self.connection.exists(self.key) and \
redash-genericworker-79c9547cff-4n76m   File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 1581, in exists
redash-genericworker-79c9547cff-4n76m     return self.execute_command('EXISTS', *names)
redash-genericworker-79c9547cff-4n76m   File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 898, in execute_command
redash-genericworker-79c9547cff-4n76m     conn = self.connection or pool.get_connection(command_name, **options)
redash-genericworker-79c9547cff-4n76m   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 1182, in get_connection
redash-genericworker-79c9547cff-4n76m     connection.connect()
redash-genericworker-79c9547cff-4n76m   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 554, in connect
redash-genericworker-79c9547cff-4n76m     raise ConnectionError(self._error_message(e))
redash-genericworker-79c9547cff-4n76m redis.exceptions.ConnectionError: Error 110 connecting to redash-redis-master:6379. Connection timed out.
redash-genericworker-79c9547cff-4n76m 2023-06-07 08:50:07,947 INFO exited: worker-0 (exit status 1; not expected)
redash-genericworker-79c9547cff-4n76m 2023-06-07 08:50:08,953 INFO spawned: 'worker-0' with pid 24

I guess redis does not deploy?

recreating

just helm install it like in the basic example, with all default values (commands sketched below)

-> nothing shows up when port-forwarding
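
For reference, a minimal sketch of what the basic install boils down to. The repo URL and secret values follow the chart README, while the release name and the service name/port are placeholders, so adjust to your cluster:

helm repo add redash https://getredash.github.io/contrib-helm-chart/
helm repo update
helm upgrade --install redash redash/redash \
  --set redash.cookieSecret="$(openssl rand -base64 32)" \
  --set redash.secretKey="$(openssl rand -base64 32)"

# expect server, scheduler, worker, postgresql and redis pods
kubectl get pods
# check `kubectl get svc` for the actual service name and port before forwarding
kubectl port-forward svc/redash 8080:80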

Samuel29 commented 1 year ago

+1 for me. More context: I override the postgresql version (14).

my values.yaml:

  postgresql:
    image:
      tag: "14"
    persistence:
      # custom settings for PVC

  image:
    # SL 6/6/23: latest version of Redash docker image
    tag: 10.1.0.b50633

redash pod logs:

Using Database: postgresql://redash:******@redash-dev-postgresql:5432/redash
Using Redis: redis://:******@redash-dev-redis-master:6379/0
[2023-06-07 14:55:46 +0000] [7] [INFO] Starting gunicorn 20.0.4
[2023-06-07 14:55:46 +0000] [7] [INFO] Listening at: http://0.0.0.0:5000 (7)
[2023-06-07 14:55:46 +0000] [7] [INFO] Using worker: sync
[2023-06-07 14:55:46 +0000] [10] [INFO] Booting worker with pid: 10
[2023-06-07 14:55:46 +0000] [11] [INFO] Booting worker with pid: 11
[2023-06-07 14:55:46 +0000] [12] [INFO] Booting worker with pid: 12
[2023-06-07 14:55:46 +0000] [13] [INFO] Booting worker with pid: 13
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:10)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:11)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:12)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:13)
[2023-06-07 14:56:17 +0000] [11] [INFO] Worker exiting (pid: 11)
[2023-06-07 14:56:17 +0000] [12] [INFO] Worker exiting (pid: 12)
[2023-06-07 14:56:17 +0000] [10] [INFO] Worker exiting (pid: 10)
[2023-06-07 14:56:17 +0000] [13] [INFO] Worker exiting (pid: 13)
[2023-06-07 14:56:18 +0000] [30] [INFO] Booting worker with pid: 30
[2023-06-07 14:56:18 +0000] [31] [INFO] Booting worker with pid: 31
[2023-06-07 14:56:18 +0000] [32] [INFO] Booting worker with pid: 32
[2023-06-07 14:56:18 +0000] [33] [INFO] Booting worker with pid: 33
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:30)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:31)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:32)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:33)
[2023-06-07 14:56:48 +0000] [33] [INFO] Worker exiting (pid: 33)
[2023-06-07 14:56:49 +0000] [32] [INFO] Worker exiting (pid: 32)
[2023-06-07 14:56:49 +0000] [30] [INFO] Worker exiting (pid: 30)
[2023-06-07 14:56:49 +0000] [31] [INFO] Worker exiting (pid: 31)
[2023-06-07 14:56:50 +0000] [50] [INFO] Booting worker with pid: 50
[2023-06-07 14:56:50 +0000] [51] [INFO] Booting worker with pid: 51
[2023-06-07 14:56:50 +0000] [52] [INFO] Booting worker with pid: 52
[2023-06-07 14:56:50 +0000] [53] [INFO] Booting worker with pid: 53
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:50)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:51)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:52)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:53)
[2023-06-07 14:57:20 +0000] [51] [INFO] Worker exiting (pid: 51)
[2023-06-07 14:57:20 +0000] [50] [INFO] Worker exiting (pid: 50)
[2023-06-07 14:57:20 +0000] [53] [INFO] Worker exiting (pid: 53)
[2023-06-07 14:57:21 +0000] [52] [INFO] Worker exiting (pid: 52)
[2023-06-07 14:57:22 +0000] [70] [INFO] Booting worker with pid: 70
[2023-06-07 14:57:22 +0000] [71] [INFO] Booting worker with pid: 71
[2023-06-07 14:57:22 +0000] [72] [INFO] Booting worker with pid: 72
[2023-06-07 14:57:22 +0000] [73] [INFO] Booting worker with pid: 73
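
A side note on the symptom: the timestamps above show each batch of workers being killed roughly 30 seconds after it boots, which matches gunicorn's default 30-second worker timeout. If you only wanted to rule the timeout itself out, gunicorn reads extra flags from the GUNICORN_CMD_ARGS environment variable; whether (and where) the chart exposes an env map for the server pod is an assumption to verify against its values.yaml:

# hedged sketch -- assumes the chart lets you pass extra environment variables
# to the server pod via an `env` map (check the chart's values.yaml);
# GUNICORN_CMD_ARGS is read by gunicorn itself and raises the worker timeout
# from its 30s default
env:
  GUNICORN_CMD_ARGS: "--timeout 120"
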
Samuel29 commented 1 year ago

Update: attached are the pod's logs with LOG_LEVEL=DEBUG. The bad news: I can't find any smoking gun :-(

redash-server-debug.log

Samuel29 commented 1 year ago

Update 2: reproduced with the default version of Postgres, as well as with v14 and v15. Also reproduced on my M1 MacBook within Docker Desktop. Interestingly, the redash_server container is consuming a lot of CPU, but there's no relevant debug info in the logs.

(screenshot of the redash_server container's CPU usage)

Samuel29 commented 1 year ago

Resource limits were the culprit! Once I removed them, it worked a lot better! @JustinGuese these were the resource limits I had been using. I'm still digging around to figure out the best fit (I can't let Redash use all my cluster resources).

    resources: {}
      # limits:
      #   cpu: 500m
      #   memory: 3Gi
      # requests:
      #   cpu: 100m
      #   memory: 500Mi
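
For anyone landing here: before removing limits entirely, a quick way to check whether the server pod is actually being CPU-throttled or OOM-killed looks roughly like this. It is a generic sketch, not chart-specific; `kubectl top` assumes metrics-server is installed, and the pod name is a placeholder:

kubectl top pods                                                       # live CPU/memory usage vs. the configured limits
kubectl describe pod <redash-server-pod> | grep -i -A3 'last state'    # OOMKilled and exit codes show up here
kubectl get events --sort-by=.lastTimestamp | tail -n 20               # recent kill/backoff events, if any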
JustinGuese commented 1 year ago

Alright, I'll have a look on Friday. Exceeding 3Gi of memory sounds way too much though 😅 thanks already!


JustinGuese commented 1 year ago

Nope, still nothing. The workers throw the following error:

Using Database: postgresql://redash:******@redash-postgresql:5432/redash
Using Redis: redis://:******@redash-redis-master:6379/0
Starting RQ worker...
2023-06-12 14:16:43,020 INFO RPC interface 'supervisor' initialized
2023-06-12 14:16:43,020 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2023-06-12 14:16:43,021 INFO supervisord started with pid 6
2023-06-12 14:16:44,025 INFO spawned: 'worker_healthcheck' with pid 9
2023-06-12 14:16:44,029 INFO spawned: 'worker-0' with pid 10
2023-06-12 14:16:45,032 INFO success: worker_healthcheck entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
READY
2023/06/12 14:16:57 [worker_healthcheck] Starting the health check for worker process Checks config: [(<class 'redash.cli.rq.WorkerHealthcheck'>, {})]
2023/06/12 14:16:57 [worker_healthcheck] Installing signal handlers.
2023/06/12 14:17:01 [worker_healthcheck] Received TICK_60 event from supervisor
RESULT 2
OKREADY
2023/06/12 14:17:01 [worker_healthcheck] No processes in state RUNNING found for process worker
2023/06/12 14:18:01 [worker_healthcheck] Received TICK_60 event from supervisor
RESULT 2
OKREADY
2023/06/12 14:18:01 [worker_healthcheck] No processes in state RUNNING found for process worker
2023/06/12 14:19:01 [worker_healthcheck] Received TICK_60 event from supervisor
2023/06/12 14:19:01 [worker_healthcheck] No processes in state RUNNING found for process worker
RESULT 2
OKREADY
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 550, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 606, in _connect
    raise err
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 594, in _connect
    sock.connect(socket_address)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./manage.py", line 9, in <module>
    manager()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 586, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 426, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/app/redash/cli/rq.py", line 49, in worker
    w.work()
  File "/usr/local/lib/python3.7/site-packages/rq/worker.py", line 511, in work
    self.register_birth()
  File "/usr/local/lib/python3.7/site-packages/rq/worker.py", line 273, in register_birth
    if self.connection.exists(self.key) and \
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 1581, in exists
    return self.execute_command('EXISTS', *names)
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 898, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 1182, in get_connection
    connection.connect()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 554, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 110 connecting to redash-redis-master:6379. Connection timed out.
2023-06-12 14:19:08,615 INFO exited: worker-0 (exit status 1; not expected)
2023-06-12 14:19:09,617 INFO spawned: 'worker-0' with pid 23
JustinGuese commented 1 year ago

So I would say Redis doesn't work... I also can't see any Redis pod, so I guess the Redis pod isn't created?
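
If the Redis subchart really isn't coming up, something like this should make it obvious (the names below are taken from the logs above and the default release naming, so they may differ in your install):

kubectl get pods | grep -i redis               # expect something like redash-redis-master-0 in Running state
kubectl get svc | grep -i redis                # the service the workers try to reach: redash-redis-master:6379
kubectl get pvc | grep -i redis                # a Pending PVC is a common reason the Redis pod never starts

# optional: test connectivity from inside the cluster
kubectl run redis-test --rm -it --image=redis:alpine -- \
  redis-cli -h redash-redis-master -p 6379 ping    # expect PONG (add -a <password> if auth is enabled)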

JustinGuese commented 1 year ago

Also, this repo is 3 years old, so I guess they don't offer support anymore, and therefore I won't use it anyway. Thanks for your help though!

grugnog commented 1 year ago

The chart is working with the default values, so I am guessing this must be something with your local setup?

JustinGuese commented 1 year ago

Hm, at least it's not only me, but also @Samuel29. I'm using K3s; maybe there is a problem with that.

Samuel29 commented 1 year ago

Oh, you make a good point. I'm using a managed Kubernetes cluster (v1.25) + ArgoCD + Helm.

Samuel29 commented 1 year ago

For the record, here is my setup: the cluster is made of 10+ nodes with 4 cores / 15GB RAM each. My values (I'm using Redash as a Helm dependency):

redash:
  postgresql:
    image:
      # use postgres v15 instead of 9.6 (!)
      tag: "15"
    persistence:
      # OVH managed K8s
      storageClass: csi-cinder-high-speed

  image:
    # SL 6/6/23: latest version of Redash docker image
    tag: 10.1.0.b50633
    # (... some ingress values omitted here)
  server:
    # server.resources -- Server resource requests and limits [ref](http://kubernetes.io/docs/user-guide/compute-resources/)
    resources: 
      limits:
        cpu: 1000m
        memory: 4Gi
      requests:
        cpu: 100m
        memory: 500Mi
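
(For context, "Redash as a Helm dependency" means wrapping the chart in an umbrella chart along these lines; the repository URL should match the chart's install docs, and the version constraint is a placeholder, so check `helm search repo redash` for the current chart version.)

# Chart.yaml of the wrapping chart (sketch)
apiVersion: v2
name: my-redash
version: 0.1.0
dependencies:
  - name: redash
    version: ">=3.0.0"     # placeholder -- pin to the actual chart version
    repository: https://getredash.github.io/contrib-helm-chart/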