coderxio / medication-diversification

More realistic synthetic medication data.

Add more MEPS versions #90

Open kristentaytok opened 2 years ago

kristentaytok commented 2 years ago

Problem Statement

MEPS (the Medical Expenditure Panel Survey) is an annual survey with data from 1996 through (currently) 2020, and utilization patterns change over time (e.g., as new drugs become available). For our data challenge submission, we loaded MEPS 2018 data. Loading MEPS data from other years and enabling our tool to create a distribution for each MEPS version would improve the data quality of the MDT. The Synthea module could then select the distribution based on the medication order year (e.g., when Synthea creates a med order for 2019, it pulls the MDT distribution built from MEPS 2019).
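To make the year-matching idea concrete, here is a minimal sketch of how a distribution year could be chosen for a given medication order year. The function name `select_meps_year` and the fallback policy (use the most recent loaded year at or before the order year, otherwise the earliest loaded year) are assumptions for illustration; the issue does not specify what should happen when no exact match exists, and none of this is existing MDT code.

```python
from bisect import bisect_right


def select_meps_year(order_year, available_years):
    """Pick which loaded MEPS year to use for a medication order year.

    Hypothetical helper: returns the latest loaded year that is <= order_year,
    or the earliest loaded year if the order predates all loaded data.
    """
    years = sorted(set(available_years))
    if not years:
        raise ValueError("no MEPS years have been loaded")
    # Index of the first loaded year strictly greater than order_year.
    idx = bisect_right(years, order_year)
    return years[idx - 1] if idx > 0 else years[0]
```

With MEPS 2018 and 2019 loaded, an order in 2019 would map to 2019, an order in 2025 would also map to 2019, and an order in 2010 would fall back to 2018. A different policy (e.g., nearest year in either direction) may suit the module better.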

Criteria for Success

Additional Information

MEPS data files:

https://meps.ahrq.gov/data_stats/download_data_files_results.jsp?cboDataYear=All&cboDataTypeY=2%2CHousehold+Event+File&buttonYearandDataType=Search&cboPufNumber=All&SearchTitle=Prescribed+Medicines

https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_results.jsp?cboDataYear=All&cboDataTypeY=1%2CHousehold+Full+Year+File&buttonYearandDataType=Search&cboPufNumber=All&SearchTitle=Population+Characteristics

Current versions/code used in MDT database:

https://github.com/coderxio/medication-diversification/blob/main/src/mdt/database.py

At the time we built and submitted the MDT for the data challenge, 2018 was the latest version available, and the file format (.dat) was difficult to work with in Python. Shortly after submission, MEPS added 2019 data and started making csv and xls file formats available (last time I checked, MEPS had added these formats only for 2018 forward). The final solution may therefore need to account for these differences, depending on which files we load and which formats are available.
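One way to handle the format difference is to decide the format per year before loading. The sketch below is only an illustration: `preferred_meps_format`, `plan_loads`, and the `FIRST_YEAR_WITH_CSV` cutoff are hypothetical names, and the 2018 cutoff is taken from the observation above, so it should be re-checked against the MEPS site before being relied on.

```python
# Assumption from this issue: csv is only published for 2018 forward.
FIRST_YEAR_WITH_CSV = 2018
FIRST_MEPS_YEAR = 1996


def preferred_meps_format(year, first_csv_year=FIRST_YEAR_WITH_CSV):
    """Return the file format to request for a MEPS year ("csv" or "dat")."""
    if year < FIRST_MEPS_YEAR:
        raise ValueError(f"MEPS has no data for {year}")
    return "csv" if year >= first_csv_year else "dat"


def plan_loads(years):
    """Map each requested year to the format the loader should use."""
    return {year: preferred_meps_format(year) for year in sorted(years)}
```

For example, `plan_loads([2019, 2017, 2018])` would route 2017 to the existing .dat handling and 2018 and 2019 to a csv reader, so the .dat parsing code only needs to stay around for older years.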