coderxio / medication-diversification

More realistic synthetic medication data.

Add validation_df_output CSV file to log folder #83

Closed kristentaytok closed 3 years ago

kristentaytok commented 3 years ago

Fixes coderxio/medication-diversification#ISSUE NUMBER

Explanation

Added a validation_df CSV output that merges each MDT ingredient distribution DataFrame with its corresponding product distribution DataFrame and multiplies the ingredient-level percentage by the product-level percentage, both derived from MEPS. The resulting CSV file contains a column, `validation_percent_product_patients`, that can be used for downstream validation of Synthea+MDT outputs.
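The merge-and-multiply step described above can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the code in this PR (which works on DataFrames): the column names `age`, `ingredient`, `product`, `percent_ingredient_patients`, and `percent_product_patients`, the helper `build_validation_rows`, and all sample values are hypothetical. Only `validation_percent_product_patients` comes from the description.

```python
# Hypothetical ingredient-level distribution:
# share of patients on each ingredient within an age group.
ingredient_rows = [
    {"age": "0-5", "ingredient": "ingredient_a", "percent_ingredient_patients": 0.6},
    {"age": "0-5", "ingredient": "ingredient_b", "percent_ingredient_patients": 0.4},
]

# Hypothetical product-level distribution:
# share of patients on each product within an ingredient and age group.
product_rows = [
    {"age": "0-5", "ingredient": "ingredient_a", "product": "product_a1",
     "percent_product_patients": 1.0},
    {"age": "0-5", "ingredient": "ingredient_b", "product": "product_b1",
     "percent_product_patients": 0.75},
    {"age": "0-5", "ingredient": "ingredient_b", "product": "product_b2",
     "percent_product_patients": 0.25},
]


def build_validation_rows(ingredient_rows, product_rows):
    """Join products to their ingredient and multiply the two percentages."""
    ingredient_share = {
        (row["age"], row["ingredient"]): row["percent_ingredient_patients"]
        for row in ingredient_rows
    }
    validation_rows = []
    for row in product_rows:
        key = (row["age"], row["ingredient"])
        if key not in ingredient_share:
            continue  # behaves like an inner join: skip unmatched products
        merged = dict(row)
        merged["percent_ingredient_patients"] = ingredient_share[key]
        merged["validation_percent_product_patients"] = (
            ingredient_share[key] * row["percent_product_patients"]
        )
        validation_rows.append(merged)
    return validation_rows


validation_rows = build_validation_rows(ingredient_rows, product_rows)
```

Because each product percentage is scaled by its ingredient's share, the resulting values should add up to roughly 1 within each demographic group.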

Rationale

To make the validation reproducible, we added these steps to generate a validation_df CSV file, which can be used as input to downstream chi-square calculations that compare MEPS product-level distributions with the product-level distributions of the Synthea+MDT patient population.
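For orientation, a downstream chi-square comparison might look something like the sketch below. It is not part of this PR: the function name `chi_square_statistic`, the product keys, and the counts are all hypothetical, and it computes only the goodness-of-fit statistic, not a p-value.

```python
def chi_square_statistic(expected_proportions, observed_counts):
    """Pearson chi-square statistic for observed counts vs. expected proportions.

    expected_proportions: product -> expected share (e.g. from the validation CSV)
    observed_counts: product -> patient count (e.g. from Synthea+MDT output)
    """
    total = sum(observed_counts.values())
    statistic = 0.0
    for product, proportion in expected_proportions.items():
        expected = proportion * total
        if expected <= 0:
            raise ValueError(f"expected count for {product!r} must be positive")
        observed = observed_counts.get(product, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical example: MEPS-derived shares vs. counts from a synthetic population.
expected = {"product_a1": 0.5, "product_b1": 0.3, "product_b2": 0.2}
observed = {"product_a1": 60, "product_b1": 20, "product_b2": 20}
statistic = chi_square_statistic(expected, observed)
```

Turning the statistic into a p-value requires the chi-square distribution with (number of products - 1) degrees of freedom; a statistics library such as SciPy (`scipy.stats.chisquare`) handles that step.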

Tests

  1. Successful run with our asthma_maintenance settings file, where age_ranges = 0-5 and 6-103: the sum of validation_percent_product_patients within each age group is ~1 (100%), as expected, with slight variation (within 1%) caused by rounding.
  2. Successful run with our asthma_maintenance settings file, where age_ranges = 0-5 and 6-103 and state = true: the sum of validation_percent_product_patients within each age group-state pair is ~1 (100%), as expected, again with slight variation (within 1%) caused by rounding.

testing logs

```
# Settings for the Synthea module
module:
  name:                     # (optional) string, defaults to the camelcase name of the module folder
  assign_to_attribute:      # (optional) string, defaults to the lowercase name of the module folder
  reason: asthma_condition  # (optional) string, references a previous ConditionOnset state
  as_needed: false          # boolean, whether the prescription is as needed
  chronic: true             # boolean, whether the prescription is chronic
  refills: 0                # integer, number of refills

# Settings for the RxClass search to include/exclude
# *** At least one RxClass include or RXCUI include is required ***
# NOTE: you can include/exclude multiple class_id/relationship pairs
# RxClass options - see https://mor.nlm.nih.gov/RxClass/
rxclass:
  include:
    - class_id: R01AD
      relationship: ATC
  exclude:
  # - class_id:
  #   relationship:

# Settings for individual RXCUIs to include/exclude
# *** At least one RxClass include or RXCUI include is required ***
# NOTE: you can include/exclude multiple RXCUIs
# You must enclose RXCUIs in quotes - example: '435'
# RXCUI options - see the Ingredient section in https://mor.nlm.nih.gov/RxNav/
# Dose form options - see https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix3.html
rxcui:
  include:
  # -
  exclude:
  # -

ingredient_tty_filter: IN   # (optional) string, options are IN or MIN
dose_form_filter:           # (optional) list, see dose form options above
  - Dry Powder Inhaler
  - Metered Dose Inhaler
  - Inhalation Solution
  - Inhalation Suspension
  - Inhalation Powder

# Settings for the MEPS population
meps:
  age_ranges:               # (optional) list, defaults to mdt-settings.yaml default age ranges
    - 0-5
    - 6-103
  demographic_distribution_flags:
    age: true               # boolean, whether to break up distributions by age ranges
    gender: false           # boolean, whether to break up distributions by gender
    state: false            # boolean, whether to break up distributions by state of residence
    # in 2nd test run, this was set to true
```

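The sum check used in the two test runs above could be automated along these lines. This is a minimal sketch, not code from this PR: the helper names `group_sums` and `groups_outside_tolerance`, the `age` and `state` column names, and the sample rows are hypothetical. The 1% tolerance mirrors the rounding variation noted in the tests.

```python
from collections import defaultdict


def group_sums(rows, group_keys, value_key="validation_percent_product_patients"):
    """Sum value_key within each combination of group_keys."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[key] for key in group_keys)] += row[value_key]
    return dict(totals)


def groups_outside_tolerance(rows, group_keys, tolerance=0.01):
    """Return the groups whose total differs from 1.0 by more than tolerance."""
    return {
        group: total
        for group, total in group_sums(rows, group_keys).items()
        if abs(total - 1.0) > tolerance
    }


# Hypothetical rows standing in for the validation CSV contents.
rows = [
    {"age": "0-5", "state": "CA", "validation_percent_product_patients": 0.601},
    {"age": "0-5", "state": "CA", "validation_percent_product_patients": 0.402},
    {"age": "6-103", "state": "CA", "validation_percent_product_patients": 0.5},
    {"age": "6-103", "state": "CA", "validation_percent_product_patients": 0.3},
]

# Check age groups only (first test run) or age group-state pairs (second run).
bad_by_age = groups_outside_tolerance(rows, ["age"])
bad_by_age_state = groups_outside_tolerance(rows, ["age", "state"])
```

In this made-up data, the 0-5 group passes because its total is within 1% of 1, while the 6-103 group is flagged because its rows sum to only 0.8.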
kristentaytok commented 3 years ago

Also, the output from the first run matches the % distributions for ages 0-5 that Kent shared in this thread: https://coderx.slack.com/archives/C01KEGCJE3D/p1625769209009000?thread_ts=1625654227.000500&cid=C01KEGCJE3D

(These % distributions equal the product of the ingredient % pop and the product % pop in Joey's remarks output: https://coderx.slack.com/archives/C01KEGCJE3D/p1625654227000500)