CDCgov / IDWA

Intelligent Data Workflow Automation
Apache License 2.0

SPIKE: LAC eCR Exploratory Document Comparison Analysis #88

Open bryanbritten opened 2 months ago

bryanbritten commented 2 months ago

Topic: Exploratory Data Analysis (EDA) of de-identified LA County (LAC) eCR data, focusing primarily on identifying differences and similarities between two or more eCR documents.

Background: We know second-hand that the VIPER sub-team on DIBBs has spoken with Idaho and found that when surveillance systems start to ingest eCRs, the verbosity and volume of the data quickly bog down the system. Additionally, research DIBBs has conducted shows that epidemiologists spend an inordinate amount of time manually analyzing eCRs to determine why each was sent, who sent it, and what information, if any, differs from information already known. Both issues imply the need to reduce the amount of data ingested into a case surveillance system, and thus the need to understand both what information in an eCR is pertinent to case investigation and what duplicated or redundant data looks like.

Hypothesis

Problem Hypothesis:

  1. We believe [Epidemiologists engage in a time-consuming, manual process of comparing eCRs to one another] based on [user stories that have been gathered during DIBBs user interviews].
  2. We believe [STLTs, regardless of which data ingestion and/or case surveillance software they use, experience non-trivial declines in performance of those systems] based on [information the VIPER team received from Idaho (which uses NBS), as well as the interest in a partnership that we've seen from Chicago and Dallas (both of whom use Salesforce)].

Solution Hypothesis: By [identifying the sections of an eCR that contain pertinent information for the purposes of case investigations and the criteria by which a section of data can be determined to be duplicative or redundant], we believe [epidemiologists will spend less time manually comparing eCRs], allowing [them to spend more time doing the things they do best].
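As a starting point for the comparison work, here is a minimal sketch of what a section-level diff between two eCRs could look like. It is not an agreed approach, just one way to frame the question. It assumes the eCRs are HL7 CDA XML documents in the `urn:hl7-org:v3` namespace and that each `section` carries a `code` element we can use as a key. The function names (`section_fingerprints`, `compare_sections`), the sample documents, and the section codes in them are all made up for illustration and are not taken from the LAC data.

```python
import hashlib
import xml.etree.ElementTree as ET

# Assumption: eCRs are CDA XML documents in the standard HL7 v3 namespace.
CDA_NS = "urn:hl7-org:v3"


def section_fingerprints(xml_text: str) -> dict[str, str]:
    """Map each section's code to a hash of its tags, attributes, and text."""
    root = ET.fromstring(xml_text)
    fingerprints = {}
    for section in root.iter(f"{{{CDA_NS}}}section"):
        code_el = section.find(f"{{{CDA_NS}}}code")
        if code_el is None:
            continue  # skip sections we cannot key on
        parts = []
        for el in section.iter():
            parts.append(el.tag)
            parts.extend(f"{k}={v}" for k, v in sorted(el.attrib.items()))
            # Normalize whitespace so formatting differences do not count.
            parts.append(" ".join((el.text or "").split()))
            if el is not section:
                parts.append(" ".join((el.tail or "").split()))
        digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
        fingerprints[code_el.get("code", "unknown")] = digest
    return fingerprints


def compare_sections(doc_a: str, doc_b: str) -> dict[str, list[str]]:
    """Classify section codes as identical, changed, or present in one doc only."""
    a, b = section_fingerprints(doc_a), section_fingerprints(doc_b)
    shared = a.keys() & b.keys()
    return {
        "identical": sorted(k for k in shared if a[k] == b[k]),
        "changed": sorted(k for k in shared if a[k] != b[k]),
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
    }


# Hypothetical, heavily trimmed documents; real eCRs are far larger.
DOC_A = """<ClinicalDocument xmlns="urn:hl7-org:v3"><component><structuredBody>
<component><section><code code="29762-2"/><title>Social History</title><text>Non-smoker</text></section></component>
<component><section><code code="30954-2"/><title>Results</title><text>Test pending</text></section></component>
</structuredBody></component></ClinicalDocument>"""

DOC_B = """<ClinicalDocument xmlns="urn:hl7-org:v3"><component><structuredBody>
<component><section><code code="29762-2"/><title>Social History</title><text>Non-smoker</text></section></component>
<component><section><code code="30954-2"/><title>Results</title><text>Test positive</text></section></component>
<component><section><code code="11450-4"/><title>Problems</title><text>Cough</text></section></component>
</structuredBody></component></ClinicalDocument>"""

if __name__ == "__main__":
    print(compare_sections(DOC_A, DOC_B))
```

Exact-match hashing only catches sections that are byte-for-byte redundant after whitespace normalization. Deciding when two sections are "duplicative" despite small differences (timestamps, IDs, reordered entries) is one of the criteria this spike would need to work out.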

Objective

Questions: Progress towards reducing uncertainty here will look like answering these questions:

bryanbritten commented 2 months ago

cc @eileenruberto - Can you please review this and let me know your thoughts on the content, especially its brevity and clarity? Thank you!

eileenruberto commented 2 months ago

@bryanbritten This looks fantastic! Do you know if the dataset from LAC includes ELRs as well, or just eCR data? In some of the past research I've been digging into, I'm starting to see a lot about the relationship between ELRs and eCRs in case investigations for diseases that involve lab reports, specifically the importance of linking both of these data types to the case they relate to.