JustinGuese opened this issue 1 year ago: Helm defaults not even starting up - [CRITICAL] WORKER TIMEOUT (#146)
+1 for me. More context: I override the PostgreSQL version (14).
My values.yaml:
postgresql:
  image:
    tag: "14"
  persistence:
    # custom settings for PVC
image:
  # SL 6/6/23: latest version of Redash docker image
  tag: 10.1.0.b50633
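For reference, a minimal sketch of how overrides like these are typically applied to this chart (the repo URL is the one from the chart's README; the release name redash-dev is an assumption inferred from the log output below):

helm repo add redash https://getredash.github.io/contrib-helm-chart/
helm upgrade --install redash-dev redash/redash -f values.yaml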
redash pod logs:
Using Database: postgresql://redash:******@redash-dev-postgresql:5432/redash
Using Redis: redis://:******@redash-dev-redis-master:6379/0
[2023-06-07 14:55:46 +0000] [7] [INFO] Starting gunicorn 20.0.4
[2023-06-07 14:55:46 +0000] [7] [INFO] Listening at: http://0.0.0.0:5000 (7)
[2023-06-07 14:55:46 +0000] [7] [INFO] Using worker: sync
[2023-06-07 14:55:46 +0000] [10] [INFO] Booting worker with pid: 10
[2023-06-07 14:55:46 +0000] [11] [INFO] Booting worker with pid: 11
[2023-06-07 14:55:46 +0000] [12] [INFO] Booting worker with pid: 12
[2023-06-07 14:55:46 +0000] [13] [INFO] Booting worker with pid: 13
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:10)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:11)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:12)
[2023-06-07 14:56:17 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:13)
[2023-06-07 14:56:17 +0000] [11] [INFO] Worker exiting (pid: 11)
[2023-06-07 14:56:17 +0000] [12] [INFO] Worker exiting (pid: 12)
[2023-06-07 14:56:17 +0000] [10] [INFO] Worker exiting (pid: 10)
[2023-06-07 14:56:17 +0000] [13] [INFO] Worker exiting (pid: 13)
[2023-06-07 14:56:18 +0000] [30] [INFO] Booting worker with pid: 30
[2023-06-07 14:56:18 +0000] [31] [INFO] Booting worker with pid: 31
[2023-06-07 14:56:18 +0000] [32] [INFO] Booting worker with pid: 32
[2023-06-07 14:56:18 +0000] [33] [INFO] Booting worker with pid: 33
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:30)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:31)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:32)
[2023-06-07 14:56:48 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:33)
[2023-06-07 14:56:48 +0000] [33] [INFO] Worker exiting (pid: 33)
[2023-06-07 14:56:49 +0000] [32] [INFO] Worker exiting (pid: 32)
[2023-06-07 14:56:49 +0000] [30] [INFO] Worker exiting (pid: 30)
[2023-06-07 14:56:49 +0000] [31] [INFO] Worker exiting (pid: 31)
[2023-06-07 14:56:50 +0000] [50] [INFO] Booting worker with pid: 50
[2023-06-07 14:56:50 +0000] [51] [INFO] Booting worker with pid: 51
[2023-06-07 14:56:50 +0000] [52] [INFO] Booting worker with pid: 52
[2023-06-07 14:56:50 +0000] [53] [INFO] Booting worker with pid: 53
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:50)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:51)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:52)
[2023-06-07 14:57:20 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:53)
[2023-06-07 14:57:20 +0000] [51] [INFO] Worker exiting (pid: 51)
[2023-06-07 14:57:20 +0000] [50] [INFO] Worker exiting (pid: 50)
[2023-06-07 14:57:20 +0000] [53] [INFO] Worker exiting (pid: 53)
[2023-06-07 14:57:21 +0000] [52] [INFO] Worker exiting (pid: 52)
[2023-06-07 14:57:22 +0000] [70] [INFO] Booting worker with pid: 70
[2023-06-07 14:57:22 +0000] [71] [INFO] Booting worker with pid: 71
[2023-06-07 14:57:22 +0000] [72] [INFO] Booting worker with pid: 72
[2023-06-07 14:57:22 +0000] [73] [INFO] Booting worker with pid: 73
Update: attached are the pod's logs with LOG_LEVEL=DEBUG. The bad news: I can't find any smoking gun :-(
Update 2: reproduced with the default version of Postgres as well as with v14 and v15. Also reproduced on my M1 MacBook within Docker Desktop. Interestingly, the redash_server container is consuming a lot of CPU, but there's no relevant debug info in the logs.
Resource limits were the culprit! Once I removed them, it worked a lot better! @JustinGuese these are the resource limits that I had been using. I'm still digging around to figure out the best fit (I can't let Redash use all my cluster resources):
resources: {}
# limits:
#   cpu: 500m
#   memory: 3Gi
# requests:
#   cpu: 100m
#   memory: 500Mi
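A minimal sketch of how one might measure actual consumption before settling on limits (requires metrics-server; the label selector is an assumption based on common Helm chart labels, so adjust it to what kubectl get pods --show-labels reports):

kubectl top pods -l app.kubernetes.io/name=redash          # actual CPU / memory per Redash pod
kubectl describe nodes | grep -A 5 "Allocated resources"   # remaining headroom on the nodes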
Alright, I'll have a look on Friday. Exceeding 3Gi of memory sounds way too much though 😅 thanks already!
Nope, still nothing. The workers throw the following error:
Using Database: postgresql://redash:******@redash-postgresql:5432/redash
Using Redis: redis://:******@redash-redis-master:6379/0
Starting RQ worker...
2023-06-12 14:16:43,020 INFO RPC interface 'supervisor' initialized
2023-06-12 14:16:43,020 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2023-06-12 14:16:43,021 INFO supervisord started with pid 6
2023-06-12 14:16:44,025 INFO spawned: 'worker_healthcheck' with pid 9
2023-06-12 14:16:44,029 INFO spawned: 'worker-0' with pid 10
2023-06-12 14:16:45,032 INFO success: worker_healthcheck entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
READY
2023/06/12 14:16:57 [worker_healthcheck] Starting the health check for worker process Checks config: [(<class 'redash.cli.rq.WorkerHealthcheck'>, {})]
2023/06/12 14:16:57 [worker_healthcheck] Installing signal handlers.
2023/06/12 14:17:01 [worker_healthcheck] Received TICK_60 event from supervisor
RESULT 2
OKREADY
2023/06/12 14:17:01 [worker_healthcheck] No processes in state RUNNING found for process worker
2023/06/12 14:18:01 [worker_healthcheck] Received TICK_60 event from supervisor
RESULT 2
OKREADY
2023/06/12 14:18:01 [worker_healthcheck] No processes in state RUNNING found for process worker
2023/06/12 14:19:01 [worker_healthcheck] Received TICK_60 event from supervisor
2023/06/12 14:19:01 [worker_healthcheck] No processes in state RUNNING found for process worker
RESULT 2
OKREADY
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 550, in connect
sock = self._connect()
File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 606, in _connect
raise err
File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 594, in _connect
sock.connect(socket_address)
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./manage.py", line 9, in <module>
manager()
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 586, in main
return super(FlaskGroup, self).main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 426, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/app/redash/cli/rq.py", line 49, in worker
w.work()
File "/usr/local/lib/python3.7/site-packages/rq/worker.py", line 511, in work
self.register_birth()
File "/usr/local/lib/python3.7/site-packages/rq/worker.py", line 273, in register_birth
if self.connection.exists(self.key) and \
File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 1581, in exists
return self.execute_command('EXISTS', *names)
File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 1182, in get_connection
connection.connect()
File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 554, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 110 connecting to redash-redis-master:6379. Connection timed out.
2023-06-12 14:19:08,615 INFO exited: worker-0 (exit status 1; not expected)
2023-06-12 14:19:09,617 INFO spawned: 'worker-0' with pid 23
So I would say Redis doesn't work... I also can't see any Redis pod, so I guess the Redis pod isn't being created?
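A minimal sketch of commands to check whether the bundled Redis subchart was actually deployed (the label selector is an assumption based on the Bitnami subchart's conventions; the release name redash matches the redash-redis-master hostname in the log above):

kubectl get pods -l app.kubernetes.io/name=redis   # e.g. redash-redis-master-0 should be Running
kubectl get svc redash-redis-master                # the service the worker tries to reach on 6379
helm get values redash                             # confirm Redis was not disabled in the supplied values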
Also, this repo is 3 years old, so I guess they don't offer support anymore, and therefore I won't use it anyway. Thanks for your help though!
The chart is working with the default values, so I am guessing this must be something with your local setup?
Hm, at least it's not only me but also @Samuel29. I'm using K3s, maybe there is a problem with that?
Oh you make a good point. I'm using a managed Kubernetes cluster (v1.25) + ArgoCD + Helm
For the record, here is my setup: the cluster is made of 10+ nodes with 4 cores / 15 GB RAM each. My values (I'm using Redash as a Helm dependency):
redash:
  postgresql:
    image:
      # use postgres v15 instead of 9.6 (!)
      tag: "15"
    persistence:
      # OVH managed K8s
      storageClass: csi-cinder-high-speed
  image:
    # SL 6/6/23: latest version of Redash docker image
    tag: 10.1.0.b50633
  # (... some ingress variables kept for me)
  server:
    # server.resources -- Server resource requests and limits [ref](http://kubernetes.io/docs/user-guide/compute-resources/)
    resources:
      limits:
        cpu: 1000m
        memory: 4Gi
      requests:
        cpu: 100m
        memory: 500Mi
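For completeness, a minimal sketch of what the corresponding Chart.yaml dependency declaration might look like (the umbrella chart name and version pin are placeholders, not taken from the thread):

# Chart.yaml of the umbrella chart
apiVersion: v2
name: my-umbrella-chart          # placeholder name
version: 0.1.0
dependencies:
  - name: redash
    version: "x.y.z"             # pin to the version reported by: helm search repo redash
    repository: https://getredash.github.io/contrib-helm-chart/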
Error (see the attached logs for the redash main pod and the genericworker pod). I guess Redis does not deploy?
Recreating: just helm install it like in the basic example, with all default values -> nothing shows up when port-forwarding.
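A minimal sketch of how one might verify a default install and port-forward to the server (release name and label selectors are assumptions; gunicorn listens on port 5000 inside the server container, per the log above):

helm install redash redash/redash                        # all default values
kubectl get pods -l app.kubernetes.io/instance=redash    # server, workers, redis and postgresql should all reach Running
POD=$(kubectl get pods -l app.kubernetes.io/name=redash -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward "$POD" 8080:5000                    # pick the server pod if the first match is a worker; then open http://localhost:8080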