getsentry / self-hosted

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
https://develop.sentry.dev/self-hosted/

Webpage stops working after ~24 hours #3370

Open tabbitsp opened 1 week ago

tabbitsp commented 1 week ago

Self-Hosted Version

24.9.0

CPU Architecture

x86_64

Docker Version

27.3.1

Docker Compose Version

2.29.7

Steps to Reproduce

Expected Result

Web continues to work

Actual Result

I can no longer access the web page, although the services still appear to run normally according to "docker compose ps". I already raised "listen" to 300 (reference: https://github.com/getsentry/self-hosted/issues/3346), since we had seen that problem as well, but it did not keep the system running. The outage seems to occur when there is a peak in load. RAM and CPU usage look normal the whole time. The only hint is this extract from the logs:

web-1 | worker 1 lifetime reached, it was running for 86401 second(s)
web-1 | worker 2 lifetime reached, it was running for 86401 second(s)
web-1 | worker 3 lifetime reached, it was running for 86401 second(s)
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.totals.initial_query is not part of Referrer Enum
web-1 | 10:34:30 [WARNING] sentry.snuba.referrer: referrer api.metrics.totals.initial_query is not part of Referrer Enum
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.totals.second_query is not part of Referrer Enum
web-1 | 10:34:31 [WARNING] sentry.snuba.referrer: referrer api.metrics.totals.second_query is not part of Referrer Enum
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.series.second_query is not part of Referrer Enum
web-1 | 10:34:31 [WARNING] sentry.snuba.referrer: referrer api.metrics.series.second_query is not part of Referrer Enum
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.totals.second_query is not part of Referrer Enum
web-1 | 10:34:31 [WARNING] sentry.snuba.referrer: referrer api.metrics.totals.second_query is not part of Referrer Enum
web-1 | Traceback (most recent call last):
web-1 |   File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 935, in validate_referrer
web-1 |     raise Exception(error_message)
web-1 | Exception: referrer api.metrics.series.second_query is not part of Referrer Enum
web-1 | 10:34:31 [WARNING] sentry.snuba.referrer: referrer api.metrics.series.second_query is not part of Referrer Enum

The logs contain no information about any uWSGI worker respawn after the lifetime is reached, so I think this could be the issue. Is this problem known to you?
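For anyone who wants to check the same thing on their own instance, here is a minimal sketch that scans the web container logs for workers that reached their lifetime and were never respawned afterwards. The "lifetime reached" pattern is taken from the log excerpt above. The respawn pattern ("Respawned uWSGI worker N") is an assumption about what uWSGI prints when it restarts a worker, and the function name is made up for this example.

```python
import re

# Pattern copied from the log excerpt in this issue.
LIFETIME_RE = re.compile(r"worker (\d+) lifetime reached")
# Assumption: uWSGI announces a restarted worker with a line like
# "Respawned uWSGI worker 1 (new pid: 123)". Adjust if your logs differ.
RESPAWN_RE = re.compile(r"Respawned uWSGI worker (\d+)")


def workers_without_respawn(log_lines):
    """Return the sorted ids of workers that hit their lifetime limit
    and have no later respawn line in the given log lines."""
    pending = set()
    for line in log_lines:
        match = LIFETIME_RE.search(line)
        if match:
            pending.add(int(match.group(1)))
            continue
        match = RESPAWN_RE.search(line)
        if match:
            pending.discard(int(match.group(1)))
    return sorted(pending)


# The three "lifetime reached" lines from the excerpt, with no respawn after them.
sample = [
    "web-1 | worker 1 lifetime reached, it was running for 86401 second(s)",
    "web-1 | worker 2 lifetime reached, it was running for 86401 second(s)",
    "web-1 | worker 3 lifetime reached, it was running for 86401 second(s)",
]
print(workers_without_respawn(sample))  # prints [1, 2, 3]
```

To run it against real logs, the output of "docker compose logs web" can be split into lines and passed to the function. A non-empty result would support the suspicion that the workers are stopped but not restarted.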

Event ID

No response

tabbitsp commented 1 week ago

FYI: after rebooting the system or restarting the Docker containers, everything works fine again for ~24 hours, after which the web page stops responding again.

tabbitsp commented 1 week ago

Could this be related to this issue? https://github.com/unbit/uwsgi/issues/2480 Connecting to the Docker image for the web service, I found

  • it runs on Python 3.12.3 and its pip shows
  • pyuwsgi 2.0.27a1 being installed

bijancot commented 1 week ago

Could this be related to this issue? unbit/uwsgi#2480 Connecting to the docker image for the web I found

  • it runs on Python 3.12.3 and its pip shows
  • pyuwsgi 2.0.27a1 being installed

Interesting, I thought this issue came from Snuba. Do any other logs appear?

tabbitsp commented 1 week ago

Unfortunately, none that I would see as giving a hint about what's wrong. I set max-worker-lifetime for the uWSGI workers to 0 as a workaround, and all seems good so far. Are there any reasons not to do this, given that 0 is the default value for this option?
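For reference, a minimal sketch of what the workaround might look like. It assumes the uWSGI options are set through a SENTRY_WEB_OPTIONS dict in sentry/sentry.conf.py. The existing values shown here are placeholders inferred from the log excerpt (three workers, a lifetime of roughly 86400 seconds), not confirmed defaults, so check them against your own file.

```python
# Hypothetical excerpt of sentry/sentry.conf.py. The two existing entries are
# placeholders inferred from the logs in this issue, not confirmed defaults.
SENTRY_WEB_OPTIONS = {
    "workers": 3,
    "max-worker-lifetime": 86400,
}

# Workaround described above: 0 turns off the lifetime-based worker reload,
# so workers are no longer stopped after ~24 hours.
SENTRY_WEB_OPTIONS["max-worker-lifetime"] = 0
```

The web container presumably needs a restart for the change to take effect. Long-running workers are then never recycled by age, so it may be worth watching memory usage for a while.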

bijancot commented 1 week ago

Unfortunately, none that I would see as giving a hint about what's wrong. I set max-worker-lifetime for the uWSGI workers to 0 as a workaround, and all seems good so far. Are there any reasons not to do this, given that 0 is the default value for this option?

That's a little strange. My guess is that the setting exists so that each worker gets reloaded, which makes sure it does not run for too long and stays good at accepting requests. Maybe wait a while longer to see how Sentry behaves after applying this change?