getsentry / self-hosted

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
https://develop.sentry.dev/self-hosted/

Nearly all notification setting subpages produce errors with 404s from the API #2939

Open jklaiho opened 9 months ago

jklaiho commented 9 months ago

Environment

self-hosted (https://develop.sentry.dev/self-hosted/)

Steps to Reproduce

  1. Log in to self-hosted Sentry.
  2. Go to Settings -> My Account -> Notifications.
  3. Click to open any of the following subpages (the API endpoint that returns 404 parenthesized for each):
    • Issue Alerts (/api/0/users/me/notifications/alerts/)
    • Issue Workflow (/api/0/users/me/notifications/workflow/)
    • Deploys (/api/0/users/me/notifications/deploy/)
    • Nudges (/api/0/users/me/notifications/approval/)
    • Weekly Reports (/api/0/users/me/notifications/reports/)
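The same requests can be reproduced outside the browser, which rules out browser-side caching. The following is a minimal sketch, not part of the original report: the base URL and token are placeholders, and it assumes that a user auth token sent as a Bearer header is accepted for these endpoints.

```python
import urllib.error
import urllib.request
from typing import Dict

# Sub-paths copied from the list of failing subpages above.
SUBPAGES = ["alerts", "workflow", "deploy", "approval", "reports"]


def build_probe_urls(base_url: str) -> Dict[str, str]:
    """Map each notification subpage name to its full API URL."""
    root = base_url.rstrip("/")
    return {name: f"{root}/api/0/users/me/notifications/{name}/" for name in SUBPAGES}


def probe(base_url: str, token: str) -> Dict[str, int]:
    """GET every URL and return the HTTP status code per subpage.

    Assumption: `token` is a user auth token that the instance accepts
    in an `Authorization: Bearer` header.
    """
    results = {}
    for name, url in build_probe_urls(base_url).items():
        request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                results[name] = response.status
        except urllib.error.HTTPError as error:
            # urllib raises on 4xx/5xx responses; keep the status code.
            results[name] = error.code
    return results
```

Calling `probe("https://sentry.example.com", "<token>")` (both values hypothetical) would return one status code per subpage, so a 404 for every entry would match the behaviour described here.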

Expected Result

All the subpages open up as they should.

Actual Result

They produce "Oops! Something went wrong" errors, with the browser's inspector showing 404 responses from the listed API endpoints.

What works: the "Email Routing" subpage.

PUT requests resulting from changing the values of "My Own Activity" and "Resolve and Auto-Assign" to /api/0/users/me/notifications/ work normally.

The logs for the web container produce this in response to a single failing page load attempt:

sentry-self-hosted-web-1  | 08:04:14 [INFO] sentry.access.api: api.access (method='GET' view='sentry.api.endpoints.catchall.CatchallEndpoint' response=404 user_id='5' is_app='False' token_type='None' is_frontend_request='True' organization_id='None' auth_id='None' path='/api/0/users/me/notifications/alerts/' caller_ip='x.x.x.x' user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2.1 Safari/605.1.15' rate_limited='False' rate_limit_category='None' request_duration_seconds=0.09223556518554688 rate_limit_type='DNE' concurrent_limit='None' concurrent_requests='None' reset_time='None' group='None' limit='None' remaining='None')
sentry-self-hosted-web-1  | 08:04:14 [WARNING] django.request: Not Found: /api/0/users/me/notifications/alerts/ (status_code=404 request=<WSGIRequest: GET '/api/0/users/me/notifications/alerts/'>)
sentry-self-hosted-web-1  | 08:04:14 [INFO] sentry.access.api: api.access (method='GET' view='sentry.api.endpoints.user_notification_details.UserNotificationDetailsEndpoint' response=200 user_id='5' is_app='False' token_type='None' is_frontend_request='True' organization_id='None' auth_id='None' path='/api/0/users/me/notifications/' caller_ip='x.x.x.x' user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2.1 Safari/605.1.15' rate_limited='False' rate_limit_category='None' request_duration_seconds=0.11344289779663086 rate_limit_type='DNE' concurrent_limit='None' concurrent_requests='None' reset_time='None' group='None' limit='None' remaining='None')
sentry-self-hosted-web-1  | 08:04:14 [INFO] sentry.access.api: api.access (method='GET' view='sentry.api.endpoints.user_emails.UserEmailsEndpoint' response=200 user_id='5' is_app='False' token_type='None' is_frontend_request='True' organization_id='None' auth_id='None' path='/api/0/users/me/emails/' caller_ip='x.x.x.x' user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2.1 Safari/605.1.15' rate_limited='False' rate_limit_category='None' request_duration_seconds=0.10629034042358398 rate_limit_type='DNE' concurrent_limit='None' concurrent_requests='None' reset_time='None' group='None' limit='None' remaining='None')
sentry-self-hosted-web-1  | 08:04:15 [INFO] sentry.access.api: api.access (method='GET' view='sentry.api.endpoints.project_index.ProjectIndexEndpoint' response=200 user_id='5' is_app='False' token_type='None' is_frontend_request='True' organization_id='None' auth_id='None' path='/api/0/projects/' caller_ip='x.x.x.x' user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2.1 Safari/605.1.15' rate_limited='False' rate_limit_category='None' request_duration_seconds=0.2665131092071533 rate_limit_type='DNE' concurrent_limit='None' concurrent_requests='None' reset_time='None' group='None' limit='None' remaining='None')
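Log lines like these are easier to compare once the key=value pairs are extracted. The sketch below is an illustration only: the function names are made up here, and the parsing assumes the `api.access (...)` layout shown in the lines above.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Matches key='value' and key=value pairs inside the api.access (...) payload.
FIELD_RE = re.compile(r"(\w+)=(?:'([^']*)'|(\S+))")


def parse_access_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of an api.access log line, or None for other lines."""
    marker = "api.access ("
    start = line.find(marker)
    if start == -1:
        return None
    # Drop the closing parenthesis that ends the payload.
    payload = line[start + len(marker):].rstrip().rstrip(")")
    return {
        key: unquoted if unquoted else quoted
        for key, quoted, unquoted in FIELD_RE.findall(payload)
    }


def failing_requests(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (path, view) for every logged request answered with 404."""
    for line in lines:
        fields = parse_access_line(line)
        if fields and fields.get("response") == "404":
            yield fields["path"], fields["view"]
```

Run over the excerpt above, this would single out the request to /api/0/users/me/notifications/alerts/, which was handled by CatchallEndpoint rather than by a dedicated notifications view.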

If relevant: this is an old Sentry installation, dating back to at least the 8.x or 9.x era, if not earlier. Initially it wasn't even container-based yet, but we've successfully upgraded it all the way to modern versions, and in the Compose era we've always done it with the official documented process, following all mandatory version steps.

The version is not the latest available; we haven't had the chance to upgrade yet. A quick glance over the release notes since 23.11.2 didn't immediately reveal any fixes to this issue in particular, but I may be wrong. (Looks like getsentry/sentry#2695, a previous report of mine, had its fix also land sometime between 23.11.2 and 24.2.0, but did not appear in the Releases page.)

Product Area

Settings

Link

No response

DSN

No response

Version

23.11.2

getsantry[bot] commented 7 months ago

Assigning to @getsentry/support for routing ⏲️

azaslavsky commented 9 months ago

When you look at the logs, what do successful API calls look like? Do they have the same structure (/api/0/...)?

I must admit this is a very weird error. I suppose it could be related to the long chain of upgrades, but that feels unlikely for something as simple as hitting an API endpoint.

jklaiho commented 8 months ago

In the logs, here's what a successful PUT request for the Resolve and Auto-Assign setting looked like:

sentry-self-hosted-web-1  | 13:54:12 [INFO] sentry.superuser: superuser.request (url='http://sentry.somedomain.com/api/0/users/me/notifications/' method='PUT' ip_address='x.x.x.x' user_id=5)
sentry-self-hosted-web-1  | 13:54:12 [INFO] sentry.access.api: api.access (method='PUT' view='sentry.api.endpoints.user_notification_details.UserNotificationDetailsEndpoint' response=200 user_id='5' is_app='False' token_type='None' is_frontend_request='True' organization_id='None' auth_id='None' path='/api/0/users/me/notifications/' caller_ip='x.x.x.x' user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36' rate_limited='False' rate_limit_category='None' request_duration_seconds=0.11553406715393066 rate_limit_type='DNE' concurrent_limit='None' concurrent_requests='None' reset_time='None' group='None' limit='None' remaining='None')

So the successful PUT requests go to /api/0/users/me/notifications/ whereas the 404 GET requests try to load subpaths underneath it, e.g. /api/0/users/me/notifications/alerts/ in my original post.

hubertdeng123 commented 8 months ago

I am unable to reproduce this issue. Is this happening with all the subpages underneath Settings -> My Account -> Notifications? For what it's worth, we are getting PUT requests to /api/0/users/me/notification-options/

jklaiho commented 8 months ago

The Email Routing subpage is the only one that works. The API call done as a result of loading it is to /api/0/users/me/notifications/email/, returning HTTP 200. Every other subpage's API call results in a 404, with no other change to the API URL than the last path component on any given subpage.

The PUT requests from the "My Own Activity" and "Resolve and Auto-Assign" toggles go to /api/0/users/me/notifications/ and succeed. How could this URL possibly be different in two different self-hosted installations, unless it has literally changed in the short time between 23.11.2 and now? I have not modified the Sentry code at all. I've also diligently modernized our config.yml and sentry.conf.py files every time we've upgraded by comparing the differences to the new example files so that there wouldn't be any legacy cruft left over. (And we've changed precious few things in them to begin with; mainly just to configure e-mails and reverse SSL proxying. git status in our /opt/sentry directory shows no changes, with HEAD detached at 23.11.2.)

I'm a Django dev myself, so I could maybe spend some time debugging this, perhaps by duplicating the production environment to another VPS, but some pointers would be appreciated; Sentry is a large enough project to be inconvenient to start blindly poking at.

jklaiho commented 8 months ago

To add to the previous comment, the only thing that comes to mind that could be different between two self-hosted installations is if some configurations related to these problems are stored in the database, because the database is the single part of this system still surviving from the very old 8.x days. But I have no idea if Sentry does that.

hubertdeng123 commented 8 months ago

I personally am not too familiar with the older versions of Sentry <9.0, so I'm afraid I won't be able to give many pointers there. The notification-options endpoint appears to have been added 7 months ago. Do you remember a version of Sentry you were on that had this working before 23.11.2? That may give some clue into what is happening.

getsantry[bot] commented 8 months ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

jklaiho commented 7 months ago

> Do you remember a version of Sentry you were on that had this working before 23.11.2? That may give some clue into what is happening.

No, I don't think we even ever used these functionalities before, so we didn't run into this. Only very recently as the company has grown in size and our projects in complexity have we started exploring the options and settings deeper.

As I said before, git status inside /opt/sentry shows no changes with HEAD detached at 23.11.2, which makes this even more bizarre. There are no local changes to Sentry code, no untracked files either. The error happens on computers and browsers that have never visited the Sentry instance before, so it's not a browser cache issue at least.

How could two self-hosted, Docker Composed Sentry instances running a version after the notification-options introduction (ours, and the one you tried replicating this on) possibly behave differently here? The only thing I can think of is some URL configuration stored and left over in Postgres or some other data storage container from long ago, but this is pure speculation and I can't prove it without intimate knowledge of Sentry internals.

While writing this, it occurred to me that some container somewhere might be running an out-of-date image for some reason. So I went to the server, ran docker image prune -a to delete all images not used by the current containers, and restarted the whole stack with docker compose down followed by docker compose up -d. Afterwards, docker image ls shows that the only extant getsentry/* images are at 23.11.2. But the same error still persists on a fresh browser that has never visited the instance.
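The image check described above can also be scripted so that nothing is missed by eye. This is a rough sketch under stated assumptions: the helper names are invented for illustration, only images whose repository starts with getsentry/ are considered, and images under any other name are ignored.

```python
import subprocess
from typing import List


def list_images() -> List[str]:
    """Return local images as 'repository:tag' strings via the docker CLI."""
    output = subprocess.run(
        ["docker", "image", "ls", "--format", "{{.Repository}}:{{.Tag}}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line.strip() for line in output.splitlines() if line.strip()]


def unexpected_tags(images: List[str], expected_tag: str, prefix: str = "getsentry/") -> List[str]:
    """Return the prefixed images whose tag differs from expected_tag."""
    mismatches = []
    for image in images:
        # Split on the last colon so registry ports in the name stay intact.
        repository, _, tag = image.rpartition(":")
        if repository.startswith(prefix) and tag != expected_tag:
            mismatches.append(image)
    return mismatches
```

On the server, `unexpected_tags(list_images(), "23.11.2")` returning an empty list would agree with what docker image ls showed here.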

azaslavsky commented 7 months ago

I'm going to transfer this into sentry proper, just in case the notifications team has any ideas, though at this point I'm running out of options to consider. You've been very diligent at exploring possible avenues, so I'm not really sure which other options to explore.

getsantry[bot] commented 7 months ago

Routing to @getsentry/product-owners-settings-general for triage ⏲️

getsantry[bot] commented 7 months ago

Routing to @getsentry/product-owners-alerts for triage ⏲️

scefali commented 7 months ago

@jklaiho I think the problem is that you somehow have a stale version of the front-end code with the latest backend code. The front-end should not be calling /api/0/users/me/notifications/alerts/ anymore; that was split up into /api/0/users/me/notification-options/alerts/ and /api/0/users/me/notification-providers/alerts/.
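To make that distinction concrete, request paths taken from the web container logs can be sorted into the old and new endpoint families. The sketch below relies only on the paths quoted in this thread; the labels and function names are invented for illustration, and the heuristic in `frontend_looks_stale` is an assumption, not a Sentry feature.

```python
import re
from typing import Iterable

# Sub-paths that returned 404 in the original report.
LEGACY_SUBPAGE_RE = re.compile(
    r"^/api/0/users/me/notifications/(alerts|workflow|deploy|approval|reports)/$"
)
# Endpoint families the newer front-end is said to call.
CURRENT_RE = re.compile(r"^/api/0/users/me/notification-(options|providers)/")


def classify(path: str) -> str:
    """Label a request path by the front-end generation that would call it."""
    if LEGACY_SUBPAGE_RE.match(path):
        return "legacy-frontend"
    if CURRENT_RE.match(path):
        return "current-frontend"
    return "unclassified"


def frontend_looks_stale(paths: Iterable[str]) -> bool:
    """Heuristic: legacy subpage calls are present and no newer calls are seen."""
    labels = {classify(path) for path in paths}
    return "legacy-frontend" in labels and "current-frontend" not in labels
```

Paths such as /api/0/users/me/notifications/ and /api/0/projects/ stay unclassified on purpose, since the thread reports them as working on 23.11.2.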

jklaiho commented 7 months ago

@scefali indeed, and given the information I've provided in earlier comments, it's not at all clear how this could be the case. Two theoretical options come to mind:

  1. While I did remove the containers and rebuild them from the correct images (as detailed above), I did nothing to any volumes. I have no idea how Sentry operates here with e.g. manage.py collectstatic; maybe some volumes initially created long ago could still have incorrect frontend code in them? Where should I look to check?
  2. Does Sentry store some notification-related URLs in a data store separate from the actual frontend code? This could also be stale in some old volume.

ceorourke commented 7 months ago

hey @azaslavsky kicking this one back to you since it seems like the issue is stale front end code
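One way to follow up on the stale-volume theory raised above is to list Docker volumes by creation date and look at the oldest ones first. This is only a sketch: it assumes the CreatedAt field in the `docker volume inspect` output is an ISO 8601 timestamp with a UTC offset, and it says nothing about whether Sentry actually keeps front-end assets in a volume.

```python
import json
import subprocess
from datetime import datetime
from typing import List, Tuple


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def volumes_by_age(inspect_json: str) -> List[Tuple[str, datetime]]:
    """Return (name, created_at) pairs from volume inspect JSON, oldest first."""
    volumes = json.loads(inspect_json)
    pairs = [(volume["Name"], parse_created_at(volume["CreatedAt"])) for volume in volumes]
    return sorted(pairs, key=lambda pair: pair[1])


def inspect_all_volumes() -> str:
    """Return the JSON output of `docker volume inspect` for every local volume."""
    names = subprocess.run(
        ["docker", "volume", "ls", "--quiet"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    if not names:
        return "[]"
    return subprocess.run(
        ["docker", "volume", "inspect", *names],
        check=True, capture_output=True, text=True,
    ).stdout
```

Running `volumes_by_age(inspect_all_volumes())` on the server would put volumes created years ago at the top of the list, which are the first candidates to inspect for leftover content.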