CDCgov / prime-reportstream

ReportStream is a public intermediary tool for delivery of data between different parts of the healthcare ecosystem.
https://reportstream.cdc.gov
Creative Commons Zero v1.0 Universal

Implement and re-enable duplication detection for the UP #14103

Open mkalish opened 4 months ago

mkalish commented 4 months ago

User Story

As a sender to the UP, I would like to enable duplicate detection and have duplicate items filtered out.

As a sender to the UP, I would like to be able to send a batch of messages that includes some messages I've already sent.

Description/Use Case

Due to a bug in the implementation, the duplicate detection feature never worked for the UP and needs to be re-implemented.

Risks/Impacts/Considerations

Product Rationale

To solve the file limit issue, we've decoupled the batching step within the pipeline so that we can process messages one at a time. However, we still allow senders to send batched messages. This leads to scenarios where a sender sends a batch in which some messages are good and get processed while others error out. When we ask the sender to fix the errors and resend, they could resend the whole batch, including the messages that were already processed successfully. To avoid processing the same messages twice, we need a way to detect duplicate messages in the batch and ensure they are not processed; only the messages that needed fixing should get through.
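
The scenario above can be illustrated with a minimal sketch. It is not ReportStream's actual implementation: the function names (`message_digest`, `filter_duplicates`, `mark_processed`) are hypothetical, an in-memory set stands in for whatever persistent store the pipeline would use, and hashing the trimmed message body with SHA-256 is just one possible way to decide that two messages are the same.

```python
import hashlib


def message_digest(raw_message: str) -> str:
    """Return a SHA-256 hex digest of the message, ignoring surrounding whitespace."""
    return hashlib.sha256(raw_message.strip().encode("utf-8")).hexdigest()


def filter_duplicates(batch, processed_digests):
    """Split a batch into (new_messages, duplicates).

    A message counts as a duplicate if its digest is already in
    processed_digests or if it appeared earlier in the same batch.
    """
    new_messages = []
    duplicates = []
    seen_in_batch = set()
    for raw in batch:
        digest = message_digest(raw)
        if digest in processed_digests or digest in seen_in_batch:
            duplicates.append(raw)
        else:
            seen_in_batch.add(digest)
            new_messages.append(raw)
    return new_messages, duplicates


def mark_processed(messages, processed_digests):
    """Record digests only for messages that were processed successfully."""
    for raw in messages:
        processed_digests.add(message_digest(raw))


# Hypothetical stand-in for a persistent store of processed message digests.
processed = set()

# First submission: A and C succeed, B errors out downstream.
first_batch = ["MSH|A", "MSH|B-bad", "MSH|C"]
to_process, _ = filter_duplicates(first_batch, processed)
mark_processed(["MSH|A", "MSH|C"], processed)

# Resubmission of the whole batch with B fixed: only the fixed message gets through.
resent_batch = ["MSH|A", "MSH|B-fixed", "MSH|C"]
to_process, skipped = filter_duplicates(resent_batch, processed)
```

Because only successfully processed messages are recorded in this sketch, a message that errored out the first time is never treated as a duplicate on resend, whether or not its content changed.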

Dev Notes

Product Context

Deduplication was originally built into the CP in response to a specific sender bug. This occurred at the beginning of July 22, when a sender (PMG) introduced a bug in their system that resulted in test results not being marked as sent. As a result, their system started sending the same messages over and over again. This continued for a few days before we turned off auth for that sender. During those few days, PMG was sending >800,000 messages/day. We worked with PMG and they eventually fixed the bug.

As a follow-up to that incident, we decided to implement deduplication logic to catch resends if a similar situation occurred in the future. The idea was that duplicate reports would be discarded without the need to completely disable the sender.

Question: Should MessageID uniqueness be enforced here as well?

Acceptance Criteria

JFisk42 commented 4 months ago

Hey team! Please add your planning poker estimate with Zenhub @adegolier @arnejduranovic @brick-green @david-navapbc @jack-h-wang @jalbinson @mkalish @thetaurean

Andrey-Glazkv commented 4 months ago

Hey team! Please add your planning poker estimate with Zenhub @adegolier @arnejduranovic @brick-green @jack-h-wang @jalbinson @mkalish @thetaurean

Andrey-Glazkv commented 4 months ago

Please add your planning poker estimate with Zenhub @david-navapbc

Andrey-Glazkv commented 4 months ago

@brandonnava please have a look - need reqs for this one

brandonnava commented 4 months ago

Updated with a specific use case and a Product Rationale section outlining the scenario that warrants duplicate detection.