CDCgov / data-exchange-hl7

Enterprise Data Exchange (DEX) is a new cloud-native, centralized data ingestion, validation, and observation service scoped for common data types (HL7, FHIR, CDA, XML, CSV) sent to the CDC. It helps public health stakeholders who send data to the CDC while reducing the maintenance effort, complexity, and duplication of ingestion points into the CDC.
Apache License 2.0

Checksum mechanism on cache; ability to check that the Redis cache is up to date with the latest loaded data. #571

Closed: cosmycx closed this issue 3 months ago.

cosmycx commented 1 year ago

@mscaldas2012, I think having some form of check to make sure the metadata is accurate is important. It's worth the effort to explore this in the next sprint.
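A checksum over the cached key-value pairs is one way to answer whether the cache matches its source. The sketch below is illustrative only and assumes that both the source data and the cache contents can be read into a plain Python dict of string keys to string values (with a real Redis client, that would mean scanning the keys and reading their values). The function names `snapshot_checksum` and `cache_matches_source` are hypothetical and not part of the DEX codebase.

```python
import hashlib
import json
from typing import Mapping


def snapshot_checksum(pairs: Mapping[str, str]) -> str:
    """Return a SHA-256 digest over key-value pairs, independent of insertion order."""
    # Sort by key so identical data always serializes to identical bytes.
    canonical = json.dumps(sorted(pairs.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_matches_source(source: Mapping[str, str], cache: Mapping[str, str]) -> bool:
    """True when the cache holds exactly the same key-value pairs as the source."""
    return snapshot_checksum(source) == snapshot_checksum(cache)


# Hypothetical data standing in for the loaded vocabulary/metadata.
source = {"key-1": "value-1", "key-2": "value-2"}
print(cache_matches_source(source, {"key-2": "value-2", "key-1": "value-1"}))  # True
```

Because the pairs are sorted before hashing, two loads of identical data yield the same digest regardless of write order, while a single changed, missing, or extra value yields a different one.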

--- Adding notes from Boris for reference:

I'm concerned about feeding Redis as a caching strategy without some intermediate form that lets us do a quick sanity check without going directly into the database. Redis just holds our key-value pairs, and they're all just text at the end of the day. When a periodic cron job loads the data, I want the operation to be as simple as possible. The outcome should be: "Did the data load, or did it not?"

  1. If the cron job succeeds in loading the data, that's a success, and the loaded data should match an intermediate store that's viewable in GitHub. After all, the key-value pairs are all text, so we can store them as text in GitHub and diff the versions coming back from the MMG API and PHIN Vocab.
  2. If the cron job fails to load the data from the intermediate form (from GitHub, say), then operations should step in to figure out why it didn't load. The transparency should come from the GitHub version history plus confirmation that the Azure Function ran. The outcome I hope for is: Cron job runs --> GitHub as intermediate cache. Cron job runs --> pulls key-value pairs from GitHub --> pushes to Redis. Instead of: Cron job runs and pushes to Redis, with little transparency on what was done at all.

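The flow described in these notes (one run writes a text snapshot to a versioned store, a later run pulls that snapshot and pushes it into the cache) could be sketched roughly as follows. This is a hypothetical illustration, not how DEX implemented it: the snapshot format (one JSON-encoded `[key, value]` pair per line, sorted by key), the function names, and the use of a plain dict in place of Redis are all assumptions. Sorting and one-pair-per-line formatting keep Git diffs between versions small and readable.

```python
import difflib
import json
from typing import Dict, List, MutableMapping


def to_snapshot_text(pairs: Dict[str, str]) -> str:
    """Serialize pairs as sorted, one-per-line JSON so Git diffs stay line-oriented."""
    return "".join(json.dumps([key, pairs[key]]) + "\n" for key in sorted(pairs))


def from_snapshot_text(text: str) -> Dict[str, str]:
    """Parse a snapshot produced by to_snapshot_text back into a dict."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        if line:  # skip blank lines
            key, value = json.loads(line)
            pairs[key] = value
    return pairs


def load_into_cache(snapshot: str, cache: MutableMapping[str, str]) -> bool:
    """Replace the cache contents with the snapshot; report whether it loaded."""
    pairs = from_snapshot_text(snapshot)
    cache.clear()  # assumption: a full refresh on every run
    cache.update(pairs)
    # Read back and compare: "did the data load, or did it not?"
    return dict(cache) == pairs


def diff_snapshots(old: str, new: str) -> List[str]:
    """Unified diff between two snapshot versions, similar to a Git diff of the file."""
    return list(
        difflib.unified_diff(
            old.splitlines(), new.splitlines(), "previous", "latest", lineterm=""
        )
    )
```

With a real Redis instance, `cache` would be replaced by client calls, and the read-back comparison would become a meaningful check rather than a formality; comparing a digest of both sides would avoid holding two full copies in memory.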
mscaldas2012 commented 1 year ago

The whole point of loading data from PHIN VADS and MMG-AT is to make the latest changes dynamically available to the pipeline, especially from PHIN VADS. Loading that data from source-controlled, visible sources defeats the purpose, in my opinion.

mscaldas2012 commented 1 year ago

We have a Premium version of Redis with fault tolerance; it persists its data, which is reloaded when Redis comes back online. Therefore this check is no longer needed.

rmharrison commented 3 months ago

Stale issue.

MMG validation was removed from the DEX HL7v2 validation pipeline on 9 Aug 2023 in v0.0.25.

The Redis cache used to load...

Was only used for MMG validation, and was therefore removed.