CDCgov / prime-reportstream

ReportStream is a public intermediary tool for delivery of data between different parts of the healthcare ecosystem.
https://reportstream.cdc.gov
Creative Commons Zero v1.0 Universal

Document Report Stream (UP) Data Retention Policies #10796

Open JFU-NAVA-PBC opened 1 year ago

JFU-NAVA-PBC commented 1 year ago

User Story

As a Report Stream DevOps engineer, backend developer, or Report Stream administrator, I want comprehensive documentation covering, but not limited to, the following data retention areas:

  1. Document the purpose of the intermediate data (reports) that each UP step (receive, convert, route, translate, batch, send) creates and persists in Azure Blob Storage, see screenshot:
  2. [screenshot: image.png]
  3. Document the data retention policies associated with the intermediate data in each step (reference CDC data retention requirements)
    3.1 How long should intermediate data be kept in Azure storage (age of intermediate data)
    3.2 Where the aged data should be archived
    3.3 ....

Other info about the Report Stream UP's persistent storage footprint:

e.g. for one inbound report (HL7 v2) of 1MB, how many MB of storage are used for intermediate data?

Description/Use Case

Here is info about the sizing of the Report Stream UP's persistent storage footprint:

Intake HL7 messages:

'47_04608646_11024_Mega Specimen.HL7' : 8758 bytes
'47_32361_04608646_11034_Mega Case.HL7' : 8623 bytes

Intermediate data and sizing in UP steps: Azure Blob Storage -> Container: Reports:

  1. Batch
    8.2 KB batch/development.DEV_ELIMS/oru_r01-base-6d748afe-1103-4b58-867d-4472b63f7d01-20230805163808.hl7
    7.9 KB batch/development.DEV_ELIMS/oru_r01-base-b4e9d48c-cb36-4430-ba9f-db20b928f75f-20230805163747.hl7
  2. Ready
    8.2 KB ready/development.DEV_ELIMS/oru_r01-base-6d376a04-6135-4561-82b6-7462d2163b39-20230805163901.hl7
    7.9 KB ready/development.DEV_ELIMS/oru_r01-base-3c03a4a8-d35c-464e-b169-71a27f7ad3d4-20230805163801.hl7
  3. Receive
    8.5 KB receive/ignore.ignore-elr-elims/None-2e8425d6-5f17-44af-b5ce-a60371e0d2e4-20230805163803.hl7
    8.4 KB receive/ignore.ignore-elr-elims/None-762dd50d-c647-4924-b0b4-a531e77319c9-20230805163724.hl7
  4. Route
    56 KB route/ignore.ignore-elr-elims/None-a01cccb4-5799-494e-8813-0d9fdb7f8645-20230805163742.fhir
    55 KB route/ignore.ignore-elr-elims/None-af3dbda7-be3a-4b5e-bbf9-095e32c19f6a-20230805163805.fhir
  5. Translate
    56 KB translate/ignore.ignore-elr-elims/None-07820c36-78b3-4504-ba15-6ed18af83946-20230805163806.fhir
    56 KB translate/ignore.ignore-elr-elims/None-fe61f913-9620-4c95-a7b6-1d76d2458b34-20230805163744.fhir
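As a rough cross-check of the listing above, the per-step totals can be tallied with a few lines of Python. This is only a sketch: the sizes are copied by hand from the listing (which covers two inbound messages), and the names `OBSERVED_BLOBS_KB` and `total_kb_by_step` are made up for illustration, not part of ReportStream.

```python
from collections import defaultdict

# Sizes in KB copied from the blob listing above; the step is the first
# path segment of each blob name. Hand-entered, not read from Azure.
OBSERVED_BLOBS_KB = [
    ("batch", 8.2), ("batch", 7.9),
    ("ready", 8.2), ("ready", 7.9),
    ("receive", 8.5), ("receive", 8.4),
    ("route", 56.0), ("route", 55.0),
    ("translate", 56.0), ("translate", 56.0),
]


def total_kb_by_step(blobs):
    """Sum the blob sizes (KB) for each UP step."""
    totals = defaultdict(float)
    for step, size_kb in blobs:
        totals[step] += size_kb
    return dict(totals)


totals = total_kb_by_step(OBSERVED_BLOBS_KB)
grand_total_kb = sum(totals.values())

for step, kb in sorted(totals.items()):
    print(f"{step:<10} {kb:6.1f} KB")
print(f"{'all steps':<10} {grand_total_kb:6.1f} KB for two inbound messages")
```

The FHIR-based steps (route, translate) account for most of the footprint, which is why they dominate the sizing estimate.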

Estimated sizing formula:

For an inbound HL7 message of 9KB, the intermediate data in Azure Storage (three HL7 copies in receive, batch, and ready; two FHIR copies in route and translate): 9KB X 3 + 56KB X 2 = 139KB

Roughly a 1:15 ratio (139KB / 9KB ≈ 15.4)
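The estimate can be reproduced in a small Python sketch. It reads the formula as three HL7-sized copies (receive, batch, ready) plus two FHIR-sized copies (route, translate), which is an interpretation based on the blob listing rather than something confirmed by the team. The function names are hypothetical, and the linear extrapolation to other message sizes is an untested assumption, since only ~9KB messages were measured.

```python
# Steps grouped by the format of the blob they persist (an interpretation
# of the listing above, not a confirmed ReportStream design statement).
HL7_COPY_STEPS = ("receive", "batch", "ready")
FHIR_COPY_STEPS = ("route", "translate")


def intermediate_storage_kb(hl7_kb, fhir_kb):
    """Intermediate storage for one message: one blob per listed step."""
    return hl7_kb * len(HL7_COPY_STEPS) + fhir_kb * len(FHIR_COPY_STEPS)


def extrapolate_kb(inbound_kb, measured_hl7_kb=9, measured_fhir_kb=56):
    """ASSUMPTION: storage scales linearly with inbound size.

    Only ~9 KB messages were measured, so treat this as a first guess.
    """
    per_kb = intermediate_storage_kb(measured_hl7_kb, measured_fhir_kb) / measured_hl7_kb
    return inbound_kb * per_kb


total_kb = intermediate_storage_kb(hl7_kb=9, fhir_kb=56)
ratio = total_kb / 9

print(f"intermediate data for a 9 KB message: {total_kb} KB")  # 139 KB
print(f"intermediate : inbound ratio = {ratio:.1f}")
print(f"linear guess for a 1 MB (1024 KB) message: {extrapolate_kb(1024):.0f} KB")
```

The convert and send steps do not appear in the blob listing, so they are left out of the estimate here as well.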

Azure Queues & Tables: Data in queues and tables is ephemeral; records are cleaned up when processing completes.

Risks/Impacts/Considerations

NA

Dev Notes

Ask DevOps for the current practice

Acceptance Criteria

A document is created containing the RS UP data retention policies

JFU-NAVA-PBC commented 1 year ago

@arnejduranovic @avnieldravid logged a ticket for data retention and storage sizing analysis