getsentry / self-hosted

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
https://develop.sentry.dev/self-hosted/

Upgraded Sentry self-hosted from 23.5.0 directly to 23.11.0 and skipped the hard stop 23.6.2: how can I remedy it? #2582

Closed littlestar2125 closed 10 months ago

littlestar2125 commented 11 months ago

Self-Hosted Version

23.11.0

CPU Architecture

x86_64

Docker Version

23.0.5

Docker Compose Version

2.6.1

Steps to Reproduce

  1. I'm a new user of Sentry self-hosted. On 23.5.0, the 'User Feedback' feature did not seem to collect any information, so I tried to upgrade.
  2. During the upgrade I missed the hard stop at 23.6.2 and went directly to 23.11.0. How can I remedy this? I don't know how to fix the data in my database, and I don't want to lose it.

Event ID

No response

hubertdeng123 commented 11 months ago

I would advise you to back up your Postgres database immediately, before performing any other sort of upgrade, to avoid losing your data completely. Even so, it is unclear what needs to be done to fix your data. Are you getting any errors after upgrading?

littlestar2125 commented 11 months ago

@hubertdeng123 Hello Hubert, when I log in to the Sentry web UI, I find that I have lost my organization permission and cannot log in directly. Here is my screenshot: [screenshot: WeCom screenshot 17004466852021]. If I click "please click here", I can log in and see my organization's data. I will back up my data, then try upgrading to 23.6.2 and from there to the latest version. Thank you for your help; if you have any new ideas, please let me know.

martonivan commented 11 months ago

The issue is not unique.

I upgraded from 23.4.0 and read the documentation a bit too late. Downgrading to 23.6.2 seems to fail because the clickhouse-1 container does not start:

dependency failed to start: container sentry-self-hosted-clickhouse-1 is unhealthy
Error in install/bootstrap-snuba.sh:4.
'$dcr snuba-api bootstrap --no-migrate --force' exited with status 1
-> ./install.sh:main:31
--> install/bootstrap-snuba.sh:source:4

The ClickHouse logs show things like this:

2023.11.20 21:25:23.359362 [ 1 ] {} <Error> Application: DB::Exception: Unknown setting allow_nullable_key for storage AggregatingMergeTree: Cannot attach table `default`.`functions_mv_local` from metadata file /var/lib/clickhouse/metadata/default/functions_mv_local.sql from query ATTACH TABLE functions_mv_local (`project_id` UInt64, `transaction_name` String, `timestamp` DateTime, `depth` UInt32, `parent_fingerprint` UInt64, `fingerprint` UInt64, `name` String, `package` String, `path` String, `is_application` UInt8, `platform` LowCardinality(String), `environment` LowCardinality(Nullable(String)), `release` LowCardinality(Nullable(String)), `os_name` LowCardinality(String), `os_version` LowCardinality(String), `retention_days` UInt16, `count` AggregateFunction(count, Float64), `percentiles` AggregateFunction(quantiles(0.5, 0.75, 0.9, 0.95, 0.99), Float64), `min` AggregateFunction(min, Float64), `max` AggregateFunction(max, Float64), `avg` AggregateFunction(avg, Float64), `sum` AggregateFunction(sum, Float64), `worst` AggregateFunction(argMax, UUID, Float64), `examples` AggregateFunction(groupUniqArray(5), UUID)) ENGINE = AggregatingMergeTree PARTITION BY (retention_days, toMonday(timestamp)) PRIMARY KEY (project_id, transaction_name, timestamp, depth, parent_fingerprint, fingerprint) ORDER BY (project_id, transaction_name, timestamp, depth, parent_fingerprint, fingerprint, name, package, path, is_application, platform, environment, release, os_name, os_version, retention_days) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity = 2048, allow_nullable_key = 1

hubertdeng123 commented 11 months ago

@martonivan Did your install script complete, or is that an error displayed while running the script?

littlestar2125 commented 11 months ago

@hubertdeng123 Hello Hubert, I backed up my Postgres data and installed 23.6.2, which then threw an exception:

curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
install/error-handling.sh: line 82: /bin/docker: Argument list too long

I have now deployed 23.11.0 again and tried to log in. It still tells me I do not have organization permission; if I click "please click here", I can log in and see my organization's data. When I try to invite someone to the organization, the invitation email is missing the organization. I also cannot find the User Feedback menu.

martonivan commented 11 months ago

@martonivan Did your install script complete, or is that an error displayed while running the script?

The install script failed with the error message above, and the ClickHouse logs were collected right after running install.sh.

It is worth mentioning that I finally solved the issue by dropping the databases right after creating a backup. The fresh install with backup.json restored worked fine, of course without the production data (which in the end we were prepared to lose).

buffcode commented 11 months ago

Also stumbled upon this. The upgrade from v22.x to v23.11 seemed to go fine. Maybe it would be viable to abort the update when a hard stop is about to be skipped, to avoid future data corruption.
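
A pre-flight check along those lines could look roughly like the sketch below. It is only an illustration, not part of Sentry's install.sh: `HARD_STOPS`, `parse_version`, and `skipped_hard_stops` are hypothetical names, and the list contains only 23.6.2 because that is the only hard stop named in this thread.

```python
# Hypothetical pre-flight check; not part of Sentry's install.sh.
# Only 23.6.2 is named as a hard stop in this thread; extend the list as needed.
HARD_STOPS = ["23.6.2"]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '23.6.2' or 'v23.11.0' into a comparable tuple such as (23, 6, 2)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def skipped_hard_stops(current: str, target: str, hard_stops=HARD_STOPS) -> list[str]:
    """Return the hard stops strictly between current and target, in upgrade order."""
    cur, tgt = parse_version(current), parse_version(target)
    return sorted(
        (stop for stop in hard_stops if cur < parse_version(stop) < tgt),
        key=parse_version,
    )


if __name__ == "__main__":
    missed = skipped_hard_stops("23.5.0", "23.11.0")
    if missed:
        print("Upgrade would skip hard stop(s):", ", ".join(missed))
```

An installer could refuse to continue whenever the returned list is non-empty and ask the user to install each listed version first.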

So far I have only seen that docker compose run --rm web permissions list -u <super-user email> does not return anything. Adding users.admin did not help.

Additionally, SAML SSO is now broken, and even after deleting the configuration the error returns after following the UI flow again. For those looking to remove their SSO configuration to restore username/password login for the moment:

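# Open a Sentry shell (Python) inside the web container
docker-compose run --rm web shell
# Then, inside that shell: delete every configured auth provider (removes SSO for all organizations)
from sentry.models.authprovider import AuthProvider
AuthProvider.objects.all().delete()

buffcode commented 11 months ago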

Regarding the failing login I traced it down to sentry_organizationmapping being empty. It should have been filled by 0478_backfill_organization_mappings_via_outbox, but even inserting the outbox manually via

INSERT INTO sentry_regionoutbox
(shard_scope, shard_identifier, category, object_identifier, scheduled_from, scheduled_for)
VALUES
(0, 1, 2, 1, NOW(), '2016-08-01 00:00:00');

just removes the record after a short time without further effect.

buffcode commented 11 months ago

Could it be because src/sentry/receivers/outbox/region.py does not define a receiver for OutboxCategory.ORGANIZATION_UPDATE? So this message just gets dropped.

The corresponding test for the aforementioned backfill is also marked with

@pytest.mark.skip("Test setup no longer valid after adding is_test to organization model")

buffcode commented 11 months ago

After stepping through the code I got everything working again by changing the organization's slug back and forth.

  1. Go to organization settings: https://sentry.example.org/settings/<org slug>/
  2. Change slug, e.g. by appending some string
  3. Change slug back to original value

sentry_organizationmapping is then populated correctly and logins, SAML setup etc. work as before.
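
To check whether an install is affected, before or after the slug change, one could look for organizations that have no row in sentry_organizationmapping. The sketch below is an illustration under assumptions: it accepts any DB-API cursor, assumes sentry_organization has id and slug columns and that sentry_organizationmapping.organization_id refers to them, and `organizations_without_mapping` is a hypothetical helper name. The demo uses an in-memory SQLite database as a stand-in for Postgres.

```python
import sqlite3

# Assumed schema: sentry_organization(id, slug) and
# sentry_organizationmapping(organization_id, slug). Verify against your install.
MISSING_MAPPING_SQL = """
SELECT o.id, o.slug
FROM sentry_organization AS o
LEFT JOIN sentry_organizationmapping AS m ON m.organization_id = o.id
WHERE m.organization_id IS NULL
ORDER BY o.id
"""


def organizations_without_mapping(cursor):
    """Return (id, slug) pairs for organizations that have no mapping row."""
    cursor.execute(MISSING_MAPPING_SQL)
    return [tuple(row) for row in cursor.fetchall()]


def demo():
    """Run the check against a throwaway SQLite database (stand-in for Postgres)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE sentry_organization (id INTEGER PRIMARY KEY, slug TEXT)")
    cur.execute("CREATE TABLE sentry_organizationmapping (organization_id INTEGER, slug TEXT)")
    cur.execute("INSERT INTO sentry_organization VALUES (1, 'sentry')")
    before = organizations_without_mapping(cur)  # mapping table still empty
    cur.execute("INSERT INTO sentry_organizationmapping VALUES (1, 'sentry')")
    after = organizations_without_mapping(cur)  # mapping row now present
    conn.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

Against a real install the same query could be run through a Postgres driver or pasted into psql; an empty result would suggest the mapping table has been populated.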

littlestar2125 commented 11 months ago

@buffcode Hello buffcode, thanks for your reply. Yesterday I read the source code and found that sentry_organizationmapping was empty, so I tried inserting a row like this:

INSERT INTO sentry_organizationmapping
(organization_id, slug, name, date_created, verified, idempotency_key, region_name, status, require_2fa, early_adopter, allow_joinleave, enhanced_privacy, disable_shared_issues, disable_new_visibility_features, require_email_verification, codecov_access)
VALUES
(1, 'sentry', 'Sentry', now(), false, '', '--monolith--', 0, false, false, true, false, false, false, false, false);

After that it worked: when I log in to Sentry I get into the web UI directly, and the "no organization permission" message no longer appears.

I've noticed that these tables about region are empty, and I'm wondering if it has any other implications?

I didn't understand the steps you described for resolving this issue. Do I need to perform the following steps?

  1. Insert into sentry_regionoutbox with INSERT INTO sentry_regionoutbox (shard_scope, shard_identifier, category, object_identifier, scheduled_from, scheduled_for) VALUES (0, 1, 2, 1, NOW(), '2016-08-01 00:00:00');
  2. Change my organization slug, then change it back to the original value.
  3. Delete the row I just inserted into sentry_regionoutbox.

azaslavsky commented 11 months ago

I've noticed that these tables about region are empty, and I'm wondering if it has any other implications?

Are you talking specifically about sentry_regionoutbox? That table should always be empty on self-hosted installs (it is only used on SaaS).

littlestar2125 commented 10 months ago

@azaslavsky Hello, I want to know whether solving the 'no organization permissions' login error by inserting into the sentry_organizationmapping table could cause any problems. I didn't quite understand the solution @buffcode provided. At present, the issues I encountered have mostly been resolved, but I am not sure whether inserting into the table is a reasonable solution or whether it will cause further problems.

azaslavsky commented 10 months ago

I believe the solution they outlined above (change slug, then change back) should be able to fix at least some of what you are seeing. No guarantees though: going past the hard stop may have put your install into uncharted territory. Still, changing slugs is a common and safe operation, so it feels decently low risk to me.

littlestar2125 commented 10 months ago

@azaslavsky My problem is solved, I'll keep watching to see if I get any other problems, thanks for the help!

cotillion commented 10 months ago

Got hit by this despite doing the hard stop at 23.6.2. Slug rename appears to have allowed SSO to work again.

tibuprophen commented 10 months ago

After stepping through the code I got everything working again by changing the organization's slug back and forth.

1. Go to organization settings: `https://sentry.example.org/settings/<org slug>/`

2. Change slug, eg. by appending some string

3. Change slug back to original value

sentry_organizationmapping is then populated correctly and logins, SAML setup etc. work as before.

Thank you! This also fixed my upgrade problem. But there still seems to be an issue: all invite links are expired, and I am not able to activate/invite members. I also ran docker compose restart after changing and then reverting the slug name.