NickCraver / StackExchange.Exceptional

Error handler used for the Stack Exchange network
https://nickcraver.com/StackExchange.Exceptional/
Apache License 2.0
862 stars 170 forks source link

Rate-limit emails #61

Closed cocowalla closed 8 years ago

cocowalla commented 9 years ago

The roll-up feature for duplicate exceptions is great in the web UI, but it would be good to have a similar sort of result when sending email notifications of exceptions - it's not nice having your inbox fill up with thousands of emails!

It would be good to rate-limit emails, so that, for example, no more than 1 email is sent every x minutes for each unique exception.

CptRobby commented 9 years ago

@cocowalla - I actually did this very thing on my own fork (in commit CptRobby/StackExchange.Exceptional@8b47aba2f97292bdd4a734929cf3663109342b5d) It was the first issue that I had with this otherwise excellent product and it caused me to just buckle down and take the source code and change it so that it didn't keep sending me multiple emails telling me the same thing over and over. After I got it working the way I wanted, I created my own fork and committed what I had done so that others could enjoy it too. ;)

@NickCraver - Obviously I would add my vote to request this change as well. It really is something that I'm sure is needed by many others. You're welcome to take what I've done and modify it however you need, or I could send you a pull request if you like. I'm a few commits behind you since I'm still using VS 2012 and can't use the new C#6 features (unfortunately), but I'm pretty sure I could take what I have and adjust it to match the changes that you've made in the latest commits. ;)

cocowalla commented 9 years ago

@CptRobby gets a :+1: from me!

@NickCraver I understand that you want to keep Exceptional as bare bones as possible, with a focus on extensibility, but it's hard to imagine a scenario where someone wouldn't want to rate-limit emails. This is also probably non-trivial to implement using the existing extensibility points (e.g. ErrorStore.OnBeforeLog)

NickCraver commented 9 years ago

@cocowalla @CptRobby I'm not against this at all, but there's a race here that's really unsolvable holistically - it's just a matter of where on the spectrum a tradeoff makes sense. Either we queue emails for n time period (this can get complicated pretty fast in itself) and batch, or we send emails instantaneously.

The former has the problem of delayed notification (often every minute counts in fixing a production issue). The latter has the problem of too much email. The current rollup behavior happens after notification of the first exception since the view medium is updatable - a luxury email doesn't have. I'm not really sure the best way to solve this...I guess a batch delay that defaults to 0 would be best, and keeping it simple too.

Off the top of my head, I'd use the in-memory store and have the email timer on a n TimeSpan interval (I'd say 30 seconds is probably a good setting for those using it). On timer hit, we batch the emails (this already happens in the memory store), and flush to email, clearing the queue. Is that the kind of behavior you two are picturing?

cocowalla commented 9 years ago

Delaying and batching can be made really simple using Rx (but of course, that adds another dependency that you probably don't want).

A third option is to maintain a dictionary of exception fingerprints with an expiry time. Then whenever an email is going to be sent we create a fingerprint of the exception and compare it to our dictionary. I implemented something along these lines for Elmah a while back: gist

CptRobby commented 9 years ago

@NickCraver - Actually, the way I implemented it is not using a delayed batch at all, but sending it as soon as it is inserted into the database (or other store). What I did is I added a boolean property called IsOriginalError to the Error class. This property is not persisted to the database, nor is it included in the JSON or the hash. In the constructors that take an Exception, the property is initialized to true. The order of execution that happens when an Exception occurs is as follows:

  1. The Error object is created from the Exception, in which case the IsOriginalError property is set to true.
  2. The configured ErrorStore is given the Error to be persisted. Using SQLErrorStore for example, if the RollupThreshold is configured, then an update statement is generated to increment the DuplicateCount of any matching Errors in the database. If the number of rows affected is > 0, then we know that it found a match and the IsOriginalError property is set to false. But if the count is 0, then an insert statement is generated and executed and the IsOriginalError property is set to true. (Redundant, I know - But clear.)
  3. The Error is handed to the ErrorEmailer for the notification email to be sent. Early in this process, it checks to see if the newly added EmailSettings.PreventDuplicates option is set to true. If so, then it checks to see if the IsOriginalError property is false, in which case it returns without sending an email.

As you can see, this approach allows for the email notification to be sent without any delay while at the same time preventing Errors that have already been identified as duplicates from being emailed. It also provides a simple configuration that doesn't have to change the existing behavior for anyone who wants to continue receiving an email for each and every error. Additionally this doesn't require any changes to any table structures. Each of the ErrorStores use a similar pattern of looking for a duplicate and incrementing the DuplicateCount before creating a new persisted Error representation, including the retry queue. So it's a simple matter in each case to set the IsOriginalError property appropriately.

My current copy of Exceptional that I'm using is still based off of the pre-C#6 commits and I have several other changes as well (like being able to roll up errors based on the date when it last occurred, so that the RollupThreshold is the amount of time that must pass without a duplicate error occurring before a new record will be created). But just say the word and I will create a clean fork and send you a pull request with the code following the newer C#6 styles that you've been using in your latest commits. ;)