getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Add retry logic to backpressure redis clusters #81102

Open kneeyo1 opened 22 hours ago

kneeyo1 commented 22 hours ago

Backpressure does not reinitialize Redis cluster or single-node Redis connections on timeouts.

Some sort of maintenance event or replication failover in our Redis cluster meant that an old host/port combo configured in our backpressure was no longer pointing to our active cluster. Backpressure then started receiving timeouts when trying to get metrics and marked the cluster as unhealthy. This change forces it to reinitialize the connection on timeout.

relates to https://github.com/getsentry/sentry-redis-tools/pull/18
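For readers following along, here is a minimal sketch of the "reinit on timeout" idea described above. It is not the code in this PR: `ReinitOnTimeout`, `factory`, and `max_retries` are hypothetical names chosen for illustration, and the built-in `TimeoutError` stands in for whatever timeout exception the real client raises (with redis-py that would presumably be `redis.exceptions.TimeoutError`).

```python
from typing import Any, Callable, Tuple, Type


class ReinitOnTimeout:
    """Hypothetical wrapper: rebuilds the underlying client after a timeout.

    `factory` is assumed to create a brand-new client (re-resolving the
    host/port configuration) each time it is called.
    """

    def __init__(
        self,
        factory: Callable[[], Any],
        timeout_errors: Tuple[Type[BaseException], ...] = (TimeoutError,),
        max_retries: int = 1,
    ) -> None:
        self._factory = factory
        self._timeout_errors = timeout_errors
        self._max_retries = max_retries
        self._client = factory()

    def call(self, fn: Callable[[Any], Any]) -> Any:
        """Run `fn(client)`; on timeout, rebuild the client and retry."""
        attempts = 0
        while True:
            try:
                return fn(self._client)
            except self._timeout_errors:
                if attempts >= self._max_retries:
                    # Out of retries: surface the timeout to the caller.
                    raise
                attempts += 1
                # Drop the possibly stale connection and build a new one.
                self._client = self._factory()
```

A caller would wrap each metrics query, for example `wrapper.call(lambda client: client.info())`, so that a stale connection is replaced before the cluster is marked unhealthy. Whether this belongs in backpressure or in the shared Redis cluster code is a separate design question.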

lynnagara commented 3 hours ago

should this be a backpressure-specific thing?

shouldn't redis cluster code behave this way by default everywhere?

codecov[bot] commented 2 hours ago

:x: 1487 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 23116 | 1487 | 21629 | 215 |

Top 3 failed tests by shortest run time:

- `tests.sentry.hybridcloud.services.test_control_organization_provisioning.TestControlOrganizationProvisioningSlugUpdates__InControlMode::test_conflicting_unregistered_organization_with_slug_exists` (0.005s run time; no failure message available)
- `tests.sentry.users.api.endpoints.test_user_authenticator_details.UserAuthenticatorDetailsTest::test_sms_get_phone` (0.005s run time; no failure message available)
- `tests.sentry.users.web.test_accounts.TestAccounts::test_post_success` (0.005s run time; no failure message available)
