Closed mjones-oddball closed 9 months ago
Hey team! Please add your planning poker estimate with Zenhub @cris-oddball @EvanParish @k-macmillan @kalbfled @ldraney @mchlwellman
@bevnobev @cris-oddball Update on this bug: @k-macmillan and I discussed. SES does async processing. Soft bounces are retried by SES automatically. We trust the delivered status and should ensure it's honored in this case. In the hard bounce case only DNS lookup failures are retried by SES so we are a little unclear if we can/should trust delivered and will escalate a production impact ticket to Amazon for them to confirm before we decide. It's likely the same case as soft bounce due to async processing, but just want to be 100% as hard bounces are usually permanent failures. See more info here: https://docs.aws.amazon.com/ses/latest/dg/send-email-concepts-deliverability.html and
Bounces can also be synchronous or asynchronous. A synchronous bounce occurs while the email servers of the sender and receiver are actively communicating. An asynchronous bounce occurs when a receiver initially accepts an email message for delivery and then subsequently fails to deliver it to the recipient.
Identified where in the code this happening. We get a status of delivered
, then we get a bounce
status update from AWS. We usually consider delivered
to be the final resting point of a message, but that is not always the case. Will discuss with @mjones-oddball tomorrow.
Hard bounces are confirmed failures and should have their status changed to permanent-failure
. I have asked for clarification on soft bounces so we can determine a path forward for those.
Soft bounces may also be considered an end-state for that notification attempt. Handled both hard/soft bounces and prevented updating the status back to delivered if there is a bounced status, per AWS discussion.
PR approved and merged. Due to the complexity of integration testing, replying on unit tests for this one. Sent to Perf, where a regression will be run.
Regression passed. @k-macmillan @mjones-oddball are we doing any data clean-up on a second ticket or on this ticket? If a second ticket I will hold this here in QA until I get the new ticket linked here.
No need to hold this ticket for the clean-up. Closing as done.
Description
Beverly was pulling stats in BigQuery and Domo and noticed that some delivered notifications are showing failure status reasons.
Steps to Reproduce
Example results below
You can tell that this is only happening with hard and soft bounces by returning a distinct status_reason list.
Results:
Impact/Urgency
High - We cannot know If notifications are inaccurately being set to delivered until we investigate the root cause of the status/status reason combination.
Expected Behavior
3 pt limit, please keep the team informed if this becomes more complex than expected so we can re-evaluate next steps/additional tickets
Checklist
Clean up notifications/notification history in the database to reflect accurate status and status reason combinationClean up BigQuery Prod stats table to reflect accurate status and status reason combinationQA Considerations
For QA to populate. Leave blank if QA is not applicable on this ticket.
We don't yet know how this was caused to know how to replicate it.
Additional Info & Resources