chorus-ai / Chorus_SOP

ChoRUS centralized SOP documentation site
https://chorus-ai.github.io/Chorus_SOP/
Apache License 2.0

Review of SOP for Data Quality Approaches within CHoRUS #27

Open jshoughtaling opened 1 month ago

jshoughtaling commented 1 month ago

This PR proposes to add two SOP documents to the validated SOP repository:

  1. Evaluating-Quality-At-Sites.mdx
  2. Evaluating-Quality-Centrally.mdx

These two documents represent the two core approaches to evaluating and updating data extracts based on their quality and characterization within CHoRUS. Sites have been asked to execute the standard OHDSI quality and characterization packages locally, and the Standards Team has held multiple office hour sessions describing these processes and how to execute them effectively. Moreover, the cloud team is busy ingesting data extracts delivered to the MGH central cloud; in addition to producing a processed dataset, these ingestion processes will produce a structured report to return to data sites.

AEW0330 commented 3 weeks ago

Jared, this is great! Some suggestions.
For the Purpose section, maybe add the sentence: "This SOP defines the multistep process used in CHoRUS Bridge2AI to ensure that released versions of the data are of sufficient accuracy, completeness, and consistency for the development of AI/ML algorithms that help improve recovery from acute illness."

A figure we might use to define the main elements of the process: [image]. The steps you describe could refer to their place in the figure.

For an Audience section: This SOP is meant to inform anyone who seeks to understand the process used to assure the quality of the CHoRUS Bridge2AI dataset and to guide anyone involved in that process about the required resources and activities.

For a Scope section: This SOP is exclusively focused on a high-level description of the data quality process and related activities. Details are provided by links to other documents. It is not focused on the relevance of the data (prioritizing data elements to include in the dataset) or on assessing coverage of prioritized data elements. There is some overlap between data quality and data relevance, however, and the data feedback report does include a combination of data quality and coverage of prioritized data elements.

jshoughtaling commented 1 week ago

@AEW0330 - thanks for the feedback! I implemented your suggestions in the documents in this commit.