TabbycatDebate / tabbycat

Debating tournament tabulation software for British Parliamentary and a variety of two-team parliamentary formats
https://tabbycat.readthedocs.io/
GNU Affero General Public License v3.0
245 stars 827 forks source link

Use SendGrid API to detect email bounces, deliveries etc. #848

Closed czlee closed 5 years ago

czlee commented 6 years ago

This is probably a nice-to-have but it would be a nice extension to our notifications system if we could pull information from SendGrid about which emails bounced, which ones were delivered, etc., since SendGrid has richer information about this.

(At the moment, our sent message records just record messages that were sent to SendGrid, not anything about whether SendGrid managed to send them.)

tienne-B commented 6 years ago

For reference:

https://sendgrid.com/docs/API_Reference/Event_Webhook/getting_started_event_webhook.html https://sendgrid.com/docs/API_Reference/Web_API_v3/bounces.html https://sendgrid.com/docs/API_Reference/Web_API_v3/invalid_emails.html

Don't think emails can be fetched, but we can check through these.

tienne-B commented 6 years ago

I'm making some progress here (currently on a branch attached to 2d0281c93cd50c8a90b8cff76904b8ae1a9a474d). Figuring out how to present this info.

tienne-B commented 5 years ago

Little status update.

Model Migrations

There will now be three tables in the notifications app.

New Schema

All these changes have migrations that will prevent any data-loss.

SentMessageRecord

Field Type Notes
id Primary Key Unchanged (sadly)
message_id Email Explained below
recipient Foreign Key -> Person; Unchanged
method Enumeration Unchanged
email Email Unchanged
context JSON Unchanged
message Text Contains full email (with headers)
notification Foreign Key -> BulkNotification

Each email message has its own unique identifier found in the email header as Message-ID. This ID is in the form of an email address and so should be held in an email field. While this would be a good primary key, to maintain compatibility with old records (before the headers were also included in the message field), it is not.

BulkNotification

Field Type Notes
id Primary Key Automatically generated
event Enumeration From SentMessageRecord; Unchanged
timestamp Date-Time Current Time; from SentMessageRecord; Unchanged
tournament Foreign Key -> Tournament; from SentMessageRecord
round Foreign Key -> Round; from SentMessageRecord

This table is formed with fields originally from SentMessageRecord. All these fields are identical between emails sent for the same event, as all messages from TC can be sent to many participants at once. It will reduce duplication of records in SentMessageRecord.

EmailStatus

Field Type Notes
email Foreign Key -> SentMessageRecord
timestamp Date-Time Current time
event Enumeration The change in status
data JSON Other sent data

This new table is to record and store the events sent by the SendGrid web-hook. The timestamp field can also be taken (and then converted from UNIX to a proper timestamp) from the web-hook. As there is a set number of events sent to the web-hook, an enumeration is used with the keys being in correspondance with the POST data. Using an enumeration allows for i18n/l10n of the statuses.

Views

There will be two new views, one that acts as a list of SentMessageRecords, and a POST-only view for the SendGrid web-hook to use.

SentMessageRecord list

Using a Vue table would be good, but it doesn't seem to support grouping of emails. The tables should be grouped in the bulk-notification that it is a part of. Other than the grouping problem, the table would have columns for the person and the status.

Web-hook POST view

This URL is tournament-specific, but should really be site-scope. To prevent unauthorized access, there is a preference to create a "secret key" that is in the URL; which if accessed with an inaccurate code will show a 404. To make the URL site-wide, I'd wait for #685, which uses site-wide preferences, which could be expanded for this use.

Participant selection views

It is out-of-scope to show most email statuses in those views, but statuses like "bounced" should be reflected by highlighting the affected email address. It shouldn't call the websocket though.

Websocket

Email statuses would be a good candidate for a web-socket. The event is sent by the POST view and received by the list. This would allow tab directors who are looking at the email statuses to have live-updating when an email has been read or a problem arises. This would mean that the list view should be in Vue, and that the table is socket-enabled. (Two of my favorite things; async and JS-Vue /s )

czlee commented 5 years ago

This is super interesting, now that Yale IV's behind me I'll look in the next few days!

czlee commented 5 years ago

Thoughts/feedback:

  1. This is great! Also nice work figuring out how to use the SendGrid webhook.

  2. It's actually fine for the primary key to be the numeric ID field, even if there's another unique field. Even unique fields very occasionally need to be changed manually because of some sort of mistake or something (and primary key fields are read-only).

    That said, for what it's worth, since fully-fledged notifications are still in its relatively early stages, if breaking backwards compatibility would help make the feature cleaner, now's the time to do it! (Of course, try to avoid it, but if there's no other way…)

  3. I might suggest (but open to discussion) that SentMessage.recipient have on_delete=models.SET_NULL rather than models.CASCADE, since the email is still there and as far as "sent message records" go you probably want to remember the fact that you sent the email even after you delete the person. Similarly for BulkNotification.tournament and BulkNotification.round. In my head, in this context, dirty data is better than lost data. (This is in contrast to the draw models, where having clean data is paramount.) But this might have other side effects that I haven't thought about.

  4. Enumeration of EmailStatus.event is good, particularly with the i18n bit; all I was gonna say was that it'll be convenient to match SendGrid's API for this, which you've already done 😎

  5. Probably dumb question: What happens with notifications that aren't bulk notifications, like ballot receipts? Do we just make a BulkNotification for each individual email?

tienne-B commented 5 years ago

IDs don't have to be numeric; hashes are also commonly used. The Message-ID does function as a primary key, but it doesn't matter much. Is it possible to create a foreign key relation through a unique field (that isn't the primary key)? Then we wouldn't have to get the SentMessageRecord object.

I'd categorize ballot receipts as BulkNotifications (even if its for a panel of 1) because the principle is still that an action prompted notifications to be sent.

czlee commented 5 years ago

Sorry, when I said "numeric ID", the salient feature I was thinking of was "an ID that is generated by Tabbycat". I try not to have primary keys generated by someone else (in this case the mail server), because then it relies on there not being some sort of mistake, either on their side or in our handling of the interface. You're right that whether it's a number or a string isn't important.

I thought that about making the foreign key relation via the message ID to avoid a database lookup, but it actually doesn't help, at least if you want your code to be robust: You still want to check that the claimed message ID actually exists before inserting it into the database, otherwise the database UPDATE will violate a foreign key constraint and crash with a ProgrammingError.

Another option would be just to have EmailStatus store the message ID, not as a foreign key and without checking for existence. This would gain insertion efficiency at the cost of having to deal with any mismatches later, when we try to look things up. But when you think about it, if we bulk retrieve them then it's only one database hit to grab all of the existing matching messages, so it doesn't seem worth it just to save one database query.

czlee commented 5 years ago

Just curious, does Django complain at all about having both message and message_id as field names in the same model? Normally if you have a foreign key called message then Django will automatically create a field message_id which it actually uses. But I guess here it's fine because message is not a foreign key?

Also, I just read up on how Message-ID works—this makes sense now, sorry. Are you worried about performance issues tied to having long primary keys at all, in this case what I think is a string of several dozen characters?

tienne-B commented 5 years ago

By using to_field=message_id and db_constraint=False, we don't care about whether the email is in the database, if it doesn't correspond to one, instead of raising an exception it would insert the row but the row just wouldn't be accessible through the foreign key relation.

Also, if it would only receive messages around each 30sec, would it be useful to have it live-updating with a websocket?

czlee commented 5 years ago

Right, so using ForeignKey(to_field='message_id', db_constraint=False) would be exactly equivalent my third option, having EmailStatus just store the message ID not as a foreign key, right? You'd gain code conciseness, but at the expense of clarity. (A reader has to figure out that that ForeignKey is set up like that to see what's going on; before then, it's a bit obscure. It takes more code to make this explicit, but it's also easier to read.)

It doesn't remove the underlying mismatches issue I alluded to, but perhaps you're not so concerned about that if you only ever which to traverse the reverse relation (i.e., you will never try to access EmailStatus.email, only SentMessageRecord.emailstatus_set). I'd still like to alert the user about mismatches immediately though, as well as record them in the logs, since it could indicate a problem with their configuration. This doesn't mean it has to be on insertion (since that's not tied to a user process anyway), I just want any problems with these keys causing status reports not to refer to an email, to be discovered sooner rather than later, when you're trying to do forensics on some problem with emails and it's too late. (It might not be a code bug; it could be a problem with their SendGrid setup.)

I don't think the 30 second thing makes it not useful 🙂 but I also don't feel strongly about websockets here.

tienne-B commented 5 years ago

Hey, you already wrote code to do it well; might as well use that :) I do see your point.

And I would want to avoid writing web-sockets just due to the pain of debugging and integrating. Any ideas on how to do "group by"s in the Vue table(s)? Or just try plain HTML (without the existing table generators)?

czlee commented 5 years ago

Oh, if that's your concern with websockets, definitely skip it for a first implementation. We can always bring it in later if we need it.

How do you want to group them? Grouping and sorting are in effect the same thing, right? (Not always, but in this context?)

tienne-B commented 5 years ago

Grouping would be separating the items (like with a new section & table), by BulkNotification. Just to avoid repeating the BulkNotification as a column. Think it would be neater.

czlee commented 5 years ago

Ah, I see. @philipbelesky might have thoughts on this?

philipbelesky commented 5 years ago

Um don't have a strong preference without seeing it. TablesContainer should be able to produce an arbitrary number of tables. Given these lists are likely to get very long grouping them into separate tables makes sense.

czlee commented 5 years ago

Okay, so debugging efforts, other than those already in #992:

  1. As noted by @tienne-B elsewhere, "open" and "click" events aren't posted with SMTP IDs, just SendGrid IDs. So Tabbycat will need to store this information itself (sg_event_id or sg_message_id) in its own database table. (I'd rather than than relying on a relationship that SendGrid might later change.)

  2. The message IDs don't seem to match. From what I can tell, the one currently stored in the Tabbycat database as a "sent message" is different from the one reported by SendGrid. Furthermore, I checked the one I actually received in my email, and it matches SendGrid's, not Tabbycat's.

tienne-B commented 5 years ago

For reference: BACKEND-184.

SendGrid does not seem to send events on processed, even though it is selected. This is important as that type of message contains both the Message-ID and sg_message_id, so they could be attached. This means that even if the Message-IDs matched, it's useless.

czlee commented 5 years ago

Hmm, it's sent at least one on my instance: Jan 14 09:14:10 calm-shore-***** app/web.1: 17:14:09 wsgi.1 | [{'email': 'c*****@gmail.com', 'event': 'processed', 'sg_event_id': 'jXzN*****', 'sg_message_id': '3wtB*****', 'smtp-id': '<15474*****@af7d*****.prvt.dyno.rt.heroku.com>', 'timestamp': 1547486046}]

More generally, since we're accepting an API from a third party, our code should be written on the assumption that their output might not always comply with (their own) API, and fail gracefully when it doesn't (e.g., by pushing useful information to the logs).

czlee commented 5 years ago

SendGrid apparently allows for user-defined arguments, solely to allow user tracking, maybe this would help? Then we don't have to worry about the workings of the Message-ID header? https://sendgrid.com/docs/API_Reference/Event_Webhook/event.html#uniqueargs https://sendgrid.com/docs/API_Reference/SMTP_API/unique_arguments.html (There's a bit more reading to do than this.)

tienne-B commented 5 years ago

Sure, but then we'd have to replace all the Python/Django email functions with Sendgrid-specific functions. Personally, I think Tabbycat should not make vendor-specific functions (in this case, SendGrid), and that the status web-hook frôle the line already. I wonder if the unique arguments are just put as custom headers on each message. If so, it'd be trivial.

czlee commented 5 years ago

They basically are just custom headers on each message 😉 or more precisely a single custom header called "X-SMTPAPI" that contains a JSON object, which in turn contains a "unique_args" key: https://sendgrid.com/docs/API_Reference/SMTP_API/getting_started_smtp.html

Or at least, they can be, if we continue to use SMTP. There are also more direct APIs, but for the same reasons as you I'd rather not use them—I'd like Tabbycat to support any email backend, not just SendGrid.

That said, I'd refine our opposition to vendor-specific functions a little bit out of necessity: Where a feature (email status tracking) can't be made to be vendor-agnostic, and said feature is non-essential (which it is), I'm okay with having vendor-specific (SendGrid) code, as long as it'll just automatically disable the feature with a helpful error message when the vendor (SendGrid) isn't being used.