National-COVID-Cohort-Collaborative / Data-Ingestion-and-Harmonization

Data Ingestion and Harmonization

CMS - Medicare and Medicaid: Need to update the provider characterization files using the newer version of the Compendium file. #121

Closed stephanieshong closed 10 months ago

stephanieshong commented 11 months ago

This is the Compendium file that we are aiming to have incorporated into the N3C data resources. It is a list of high-level characteristics for 635 systems. (Lok's team had few concerns.) Of the variables listed in the table below:

• 11 are indicators,
• 6 summarize original Compendium numeric variables into quintiles, and
• 2 summarize original Compendium numeric variables into 4 categories.
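For anyone unfamiliar with the binning described above, here is a minimal sketch of how a numeric variable could be summarized into quintiles or into 4 categories. It is not the code used in the pipeline: the function name `to_quantile_bins`, the `beds` values, and the choice of empirical quantile cut points are all assumptions for illustration, and the actual Compendium cut points may be defined differently.

```python
from bisect import bisect_left
from statistics import quantiles


def to_quantile_bins(values, n_bins=5):
    """Map each numeric value to a bin label in 1..n_bins.

    Cut points are the empirical quantiles of `values` (an assumption;
    the real characterization files may use fixed thresholds instead).
    A value exactly equal to a cut point falls into the lower bin.
    """
    cuts = quantiles(values, n=n_bins, method="inclusive")  # n_bins - 1 cut points
    return [bisect_left(cuts, v) + 1 for v in values]


# Hypothetical bed counts for ten systems (made-up numbers, not Compendium data).
beds = [120, 450, 80, 900, 300, 1500, 60, 700, 220, 1100]

bed_quintile = to_quantile_bins(beds, n_bins=5)  # labels 1..5
bed_category = to_quantile_bins(beds, n_bins=4)  # labels 1..4
```

Publishing only the bin label instead of the raw number is what keeps any single variable from pointing at one system.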

There is sufficient variation within each variable to avoid identifying particular systems. But with this many variables, there is a good chance that researchers could identify systems if they were to reference the full Compendium. Leanne and Lok will prioritize the list to balance the usefulness of each variable against its contribution to re-identifying individual systems.
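One rough way to weigh a variable's contribution to re-identification is to count how many systems share each combination of values as variables are added. The sketch below is only an illustration of that idea under stated assumptions: the helper names, the column names (`teaching`, `bed_quintile`, `multistate`), and the four rows are hypothetical and are not taken from the Compendium or from the prioritization Leanne and Lok will do.

```python
from collections import Counter


def smallest_group_size(rows, columns):
    """Size of the smallest group of rows sharing the same values in `columns`."""
    combos = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(combos.values())


def count_unique_rows(rows, columns):
    """Number of rows whose combination of values in `columns` occurs exactly once."""
    combos = Counter(tuple(row[c] for c in columns) for row in rows)
    return sum(1 for n in combos.values() if n == 1)


# Hypothetical characterization rows (made-up values, not real systems).
systems = [
    {"teaching": 1, "bed_quintile": 5, "multistate": 1},
    {"teaching": 1, "bed_quintile": 5, "multistate": 0},
    {"teaching": 0, "bed_quintile": 2, "multistate": 0},
    {"teaching": 0, "bed_quintile": 2, "multistate": 0},
]

one_var = smallest_group_size(systems, ["teaching"])
two_vars = smallest_group_size(systems, ["teaching", "bed_quintile"])
three_vars = smallest_group_size(systems, ["teaching", "bed_quintile", "multistate"])
singled_out = count_unique_rows(systems, ["teaching", "bed_quintile", "multistate"])
```

In this toy data every system shares its values with another system until the third variable is added, at which point two systems become unique. Running the same counts with and without a candidate variable gives a simple measure of how much that variable adds to the risk.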

stephanieshong commented 10 months ago

Link sent to Lok for review.

stephanieshong commented 10 months ago

Merged to master. Sent the dataset path and link to Lok and Leeanne.

stephanieshong commented 10 months ago

Used the newest Compendium file, from September 2023. The path is left in step4, but we may want to move it to the characterization folder: https://unite.nih.gov/workspace/data-integration/dataset/preview/ri.foundry.main.dataset.432045c6-eff3-4b64-90cd-98fd5ab7b10b/master