mrnicegyu11 opened 2 years ago
@sanderegg, do you mind following up on this one in my absence? I will start having a look and leave some notes here on my findings.
@matusdrobuliak66 @pcrespov, is this still an issue, or did we add something to check connections to PG?
Garbage collector does not handle PG unavailability gracefully.
Incident report:
Today at approx. 9:45 (Zurich time) Taylor was forcefully logged out of his study (UUID 6e4ea1a4-b02c-11ec-b9d7-02420a0b0063) on dalco-staging. The errors below were discovered during the subsequent investigation.

During this time there were multiple network errors and packet drops in Z43, which are the main cause of the errors.
There were many aiopg timeouts.
Below is one exception raised by the garbage collector while running remove_users_manually_marked_as_guests, which ERRORED with the following stack trace:

Actionable follow-up: PG being unreachable needs to be handled gracefully throughout the app:
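One possible shape for "handled gracefully" in the garbage collector is a minimal sketch like the one below: each GC cycle is wrapped so that transient database errors are logged, retried a few times with exponential backoff, and finally skipped, instead of propagating out of the GC task. Everything here is hypothetical and not part of the existing codebase: run_gc_cycle_gracefully, TRANSIENT_DB_ERRORS and the retry parameters are illustrative names. It also assumes that aiopg timeouts surface as asyncio.TimeoutError and connection failures as OSError. A real setup would likely also need to list the driver's own error types, such as psycopg2.OperationalError.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

logger = logging.getLogger("garbage_collector")

# ASSUMPTION (hypothetical): the exceptions treated as "PG unreachable".
# aiopg timeouts are assumed to surface as asyncio.TimeoutError and
# connection failures as OSError; a real setup would likely add
# psycopg2.OperationalError here as well.
TRANSIENT_DB_ERRORS: tuple[type[BaseException], ...] = (asyncio.TimeoutError, OSError)


async def run_gc_cycle_gracefully(
    gc_task: Callable[[], Awaitable[None]],
    *,
    max_attempts: int = 3,
    initial_delay_s: float = 1.0,
) -> bool:
    """Run one GC cycle, retrying with exponential backoff on transient DB errors.

    Returns True if the cycle completed and False if it was skipped because the
    database stayed unreachable. TRANSIENT_DB_ERRORS are never propagated, so
    the periodic GC loop keeps running. Any other exception (a real bug) is
    re-raised unchanged.
    """
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            await gc_task()
            return True
        except TRANSIENT_DB_ERRORS as err:
            logger.warning(
                "GC cycle attempt %d/%d failed, DB unreachable: %r",
                attempt,
                max_attempts,
                err,
            )
            if attempt < max_attempts:
                # Back off before retrying so short network blips can pass.
                await asyncio.sleep(delay)
                delay *= 2
    logger.error("Skipping GC cycle: DB unreachable after %d attempts", max_attempts)
    return False
```

Inside the GC's periodic loop this would be called as, for example, await run_gc_cycle_gracefully(remove_guests_task), where remove_guests_task is a hypothetical zero-argument coroutine function wrapping remove_users_manually_marked_as_guests. The design choice is to ride out short outages like the packet drops in Z43 with a few backoff retries. If PG stays down, the cycle is skipped and retried on the next scheduled run rather than raising. Errors that are not connectivity-related still surface, so genuine bugs are not silently swallowed.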