Open tabbitsp opened 1 week ago
FYI - after rebooting the system or re-starting the docker containers, all seems to work fine again for ~24 hours after which the web stops responding again
Could this be related to this issue? https://github.com/unbit/uwsgi/issues/2480 Connecting to the docker image for the web I found
- it runs on Python 3.12.3 and its pip shows
- pyuwsgi 2.0.27a1 being installed
Interesting, I thought this issue came from Snuba. Do any other logs appear?
Unfortunately none that I would see as giving a hint about what's wrong. As a workaround I set max-worker-lifetime for the uWSGI workers to 0, and all seems good so far. Any reason not to do this, given that 0 is the default value for this option?
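For reference, the workaround can be sketched as a fragment of the self-hosted `sentry/sentry.conf.py` (a hedged sketch, assuming the `SENTRY_WEB_OPTIONS` dict that self-hosted Sentry uses to pass options through to uWSGI; the values shown are the ones discussed in this thread, not recommended defaults):

```python
# Sketch of a sentry.conf.py fragment (assumption: SENTRY_WEB_OPTIONS
# keys mirror uWSGI's own option names).
SENTRY_WEB_OPTIONS = {
    # Raise the socket listen backlog, as tried in
    # getsentry/self-hosted#3346.
    "listen": 300,
    # 0 disables periodic worker recycling (uWSGI's documented
    # default); used here as a workaround for the ~24h hang.
    "max-worker-lifetime": 0,
}
```

After editing the config, the containers need to be restarted for the new uWSGI options to take effect.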
That's a little strange. My guess is that the recycling exists so workers are reloaded periodically, making sure none runs too long and they stay responsive to requests. Maybe wait a while longer to see how Sentry reacts after applying this change?
Self-Hosted Version
24.9.0
CPU Architecture
x86_64
Docker Version
27.3.1
Docker Compose Version
2.29.7
Steps to Reproduce
Expected Result
Web continues to work
Actual Result
Cannot access the web page anymore, but the services still seem to run normally according to `docker compose ps`. I already raised "listen" to 300 (reference: https://github.com/getsentry/self-hosted/issues/3346), as we saw that come up as well, but it did not keep the system running. It seems to happen when there is a peak in load; RAM/CPU look normal the whole time. The only hint is this, extracted from the logs:
```
web-1 | worker 1 lifetime reached, it was running for 86401 second(s)
web-1 | worker 2 lifetime reached, it was running for 86401 second(s)
web-1 | worker 3 lifetime reached, it was running for 86401 second(s)
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.totals.initial_query is not part of Referrer Enum
web-1 | 10:34:30 [WARNING] sentry.snuba.referrer: referrer api.metrics.totals.initial_query is not part of Referrer Enum
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.totals.second_query is not part of Referrer Enum
web-1 | 10:34:31 [WARNING] sentry.snuba.referrer: referrer api.metrics.totals.second_query is not part of Referrer Enum
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.series.second_query is not part of Referrer Enum
web-1 | 10:34:31 [WARNING] sentry.snuba.referrer: referrer api.metrics.series.second_query is not part of Referrer Enum
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.totals.second_query is not part of Referrer Enum
web-1 | 10:34:31 [WARNING] sentry.snuba.referrer: referrer api.metrics.totals.second_query is not part of Referrer Enum
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.series.second_query is not part of Referrer Enum
web-1 | 10:34:31 [WARNING] sentry.snuba.referrer: referrer api.metrics.series.second_query is not part of Referrer Enum
```
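Note that the 86401-second worker lifetime in the log lines up with the ~24-hour interval after which the web stops responding, which suggests the workers are being recycled by a max-worker-lifetime of roughly one day. A quick sanity check of the arithmetic:

```python
# The log reports workers running for 86401 seconds before hitting
# their lifetime limit; convert that to hours.
lifetime_seconds = 86401
hours = lifetime_seconds / 3600
print(round(hours, 2))  # prints 24.0, i.e. just over one day
```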
There is no information about any uWSGI respawn, so I think this could be the issue. Is this problem known to you?
Event ID
No response