department-of-veterans-affairs / notification-api

Notification API
MIT License

Retry After Pinpoint Provider Failure #1578

Open k-macmillan opened 6 months ago

k-macmillan commented 6 months ago

User Story - Business Need

This is an enhancement rather than a new issue. AWS has been returning SMS status updates with an event type of _SMS.FAILURE and a record_status of UNKNOWN (and a few other values). Our system marks these as temporary-failure, which is exactly what AWS says these failures mean, but no retry is in place. The intent appears to have been that clients retry themselves. That is a poor experience for clients, so we should build retry functionality for these cases. They occur at a rate of about 500 per day.

User Story(ies)

As a Service, I want to not have to retry when VA Notify experiences a temporary failure after sending me a 201 response, so that my messages make it to the recipient.

Additional Info and Resources

This happens when we receive status updates from AWS. By then the message has already been sent, so none of our existing retry functionality can resolve it. deliver_sms succeeded and we have a provider reference. AWS attempted to send the message but failed, and it expects us to retry. We receive this notice during delivery status processing.
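One way to keep such a retry from looping forever is to cap the number of re-send attempts and fall back to the existing temporary-failure status once the cap is reached. The sketch below is only an illustration of that idea: the cap of 3, the `RetryDecision` type, and the `decide_after_provider_failure` function are hypothetical and not part of the current codebase.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cap on re-send attempts; the real value is an open design question.
MAX_RETRY_ATTEMPTS = 3


@dataclass(frozen=True)
class RetryDecision:
    """Outcome of evaluating a retryable failure reported by the provider."""

    should_retry: bool
    final_status: Optional[str]  # set only when we give up


def decide_after_provider_failure(
    retry_attempts: int, max_attempts: int = MAX_RETRY_ATTEMPTS
) -> RetryDecision:
    """Decide whether to re-send a notification or give up.

    retry_attempts is the number of re-sends already made for this
    notification (hypothetical bookkeeping, not an existing column).
    """
    if retry_attempts < max_attempts:
        return RetryDecision(should_retry=True, final_status=None)
    # Retries exhausted: keep today's behaviour and report temporary-failure.
    return RetryDecision(should_retry=False, final_status="temporary-failure")
```

With this shape, delivery status processing would only need to persist the attempt count and re-enqueue the send when `should_retry` is true.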

So whenever the event_type is “_SMS.FAILURE” and record_status is any of “UNREACHABLE”, “UNKNOWN”, “CARRIER_UNREACHABLE”, “EXPIRED”. [...] customer application need to retry the SMS message delivery.
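The condition described above could be expressed as a small predicate. This is a minimal sketch, assuming the event type and record status have already been extracted from the AWS status update as plain strings; the function and constant names are hypothetical.

```python
# Values taken from the guidance quoted above.
RETRYABLE_EVENT_TYPE = "_SMS.FAILURE"
RETRYABLE_RECORD_STATUSES = frozenset(
    {"UNREACHABLE", "UNKNOWN", "CARRIER_UNREACHABLE", "EXPIRED"}
)


def is_retryable_sms_failure(event_type: str, record_status: str) -> bool:
    """Return True when the status update indicates a failure we should retry.

    Hypothetical helper: both arguments are assumed to be the raw strings
    from the status update, with no normalisation applied.
    """
    return (
        event_type == RETRYABLE_EVENT_TYPE
        and record_status in RETRYABLE_RECORD_STATUSES
    )
```

For example, `is_retryable_sms_failure("_SMS.FAILURE", "UNKNOWN")` is true, while any other event type or a record status outside the four listed values is not treated as retryable.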


Engineering Checklist

Acceptance Criteria

QA Considerations

mjones-oddball commented 5 months ago

Hey team! Please add your planning poker estimate with Zenhub @cris-oddball @k-macmillan @EvanParish @edDocMe360 @kalbfled