department-of-veterans-affairs / notification-api

Notification API
MIT License
16 stars 9 forks source link

502 issue: Schedule and Execute Twilio Status Updates #2058

Closed k-macmillan closed 3 weeks ago

k-macmillan commented 1 month ago

User Story - Business Need

For any messages updates that failed to reach our servers we want to make sure we still update the status of those notifications. We wrote a function to call Twilio and update the notification with the new status, this is the ticket to find notifications that need that little update and execute it.

User Story(ies)

As a VA Notify client I want my notification status current So that I know if my notification was delivered

Additional Info and Resources

Acceptance Criteria

QA Considerations

The only way for us to e2e test this is to in some way kill our ability to receive or process the notification from Twilio. We can adjust the lambda handler code to just return and see if the cron picks it up an hour later. Using a twilio test number, it may be beneficial to test this once just to ensure it works, then fire off 500 to make sure our system can handle it and Twilio doesn't rate limit us or something. Twilio's rate limiting is not clear and most searches point to the "verify" service.

Potential Dependencies

Updating old notifications in the notification_history table.

npmartin-oddball commented 1 month ago

Hey team! Please add your planning poker estimate with Zenhub @cris-oddball @EvanParish @k-macmillan @MackHalliday @mchlwellman

mchlwellman commented 4 weeks ago

I have the celery task coded and it seems to be working. Today I have to do testing and get ready for review.

mchlwellman commented 3 weeks ago

I have tested this in dev, by sending a SMS, updating the notification in the db and letting the job update. I've added some exception handling for 429s to break out of the job, because they are all going to fail. I need to add a test and then open the PR.

cris-oddball commented 3 weeks ago

PR approved and merged, deployed to Perf for testing.

QA PASSED

Logs showing the cron job kicking off, finding the one status to check, making the call and then updating it Image

Logs supporting successful updates - 500 updates expected Image

500 updates processed with 200 statuses (no other http status seen) Image

Noting most of those messages were updated to

"status": "permanent-failure", "status_reason": "Unable to deliver",

Perhaps because this was an AWS simulated number? Only 63 were delivered.

On the next check, 0 notifications were found to update.

Redeploying lambda.

Notes:

cris-oddball commented 3 weeks ago

Leaving this open to remind myself to send it to prod Tuesday morning.

cris-oddball commented 3 weeks ago

On it's way to Staging.

cris-oddball commented 3 weeks ago

Handed off to Evan to send to Prod (first pipeline to auto-post release notes).

cris-oddball commented 3 weeks ago

On prod, closing ticket.