Open funkypenguin opened 3 months ago
One issue identified is that Gatus can get very noisy when apps are restarted outside of the scheduled maintenance window, and there's no global "off" switch, since emails are sent from each tenant's Gatus instance. A possible option would be a global "off switch" for the account that Gatus uses to send the emails (through Mailgun).
Could an email notification (e.g. the message posted in elf announce) be sent to all users when this begins, so that non-Discord users are notified too?
Where possible, a more staged approach would be ideal, so that changes like this don't impact all users at once. Migrating symlinks was done with a pilot group, so in this instance the migration could have been done over a few weeks / nights, moving say 30% of the workload at a time to the new storage.
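To make the "30% at a time" idea concrete, here is a rough sketch of how a workload could be cut into migration waves. The function name `plan_waves` and the `percent` parameter are made up for illustration; nothing here reflects how the migration was actually scripted.

```python
def plan_waves(items, percent=30):
    """Split items into consecutive waves, each holding at most
    `percent` percent of the total (hypothetical helper)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    items = list(items)
    if not items:
        return []
    # Integer ceiling division avoids floating-point rounding surprises.
    size = max(1, -(-len(items) * percent // 100))
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical workload: ten volumes to move to the new storage.
volumes = [f"volume-{n}" for n in range(10)]
for night, wave in enumerate(plan_waves(volumes), start=1):
    print(f"night {night}: {wave}")
```

With ten items and 30% per wave this gives three waves of three plus a final wave of one, so the last night is a small clean-up batch rather than a big-bang move.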
yeah, that's a nice idea.. we already split users into 26 groups alphabetically for sharding of the flux reconciliations, that might help us to apply changes to a smaller sample set in future...
It might make sense to split users by the last 2 digits of account/subscription numbers so you don't have to worry about splitting the alphabet manually.
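A minimal sketch of that suggestion, assuming account numbers are strings that contain at least one digit (the `ACC-100457` format below is invented for the example, as are the names `shard_for` and `wave_for`):

```python
import string


def shard_for(account_number: str) -> int:
    """Return a shard in 0..99 from the last two digits of the
    account number (hypothetical helper, assumed number format)."""
    digits = [ch for ch in account_number if ch in string.digits]
    if not digits:
        raise ValueError(f"no digits in account number: {account_number!r}")
    return int("".join(digits[-2:]))


def wave_for(account_number: str, waves: int = 4) -> int:
    """Map the 100 shards onto `waves` rollout waves of equal width."""
    return shard_for(account_number) * waves // 100


print(shard_for("ACC-100457"))  # last two digits are 57
print(wave_for("ACC-100457"))   # 57 * 4 // 100
```

Compared with the alphabetical split, this should give buckets of roughly equal size as long as account numbers are assigned sequentially, and the number of waves can be changed without redrawing group boundaries by hand.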
This issue captures feedback and learnings from the recent outage caused by the NFS / Cilium bug, with a view to improving our processes for inevitable future issues. I'm looking for a list of issues only (I'm not going to debate why / what happened, just looking for a "bucket" for all feedback to go into while it's fresh, so that we don't lose the value).
Please post feedback / observations / suggestions below :)