Closed darya-akimova closed 6 years ago
I just completed this task for one dataset, spending_2011. How can I share the dataset? I just wanted to get a comment before doing this for all of the datasets.
Status
drug_uses.csv (on data.world) by the drugname_generic column
Update: drug_uses.csv is missing drugs that should be in the ATC system (and are present in the atc_codes_clean.csv dataset). It seems that atc_codes_clean.csv is the way to go, even though it's currently in a messier state.
Update 3/1/2018
Matched approx. 3k out of 4.5k items in the Medicare Part D dataset (see /drug-spending/R/datawrangling/atc_merge_atc_codes_clean_da and /atc_merge_drug_uses_csv_da for the notebooks), but putting this issue on hold and switching directions because:
Task
Join drugs in the Medicare Part D spending data to their ATC Classification System categories by joining the spending-201x.csv files on data.world to either atc_codes_clean.csv or drug_uses.csv.
What we're looking for
Potential outputs that would help greatly to further the goal of this project:
The potential routes of matching drugs to their ATC classification categories:
atc_codes_clean.csv dataset
Join the drugname_generic column from any of the spending-201x.csv files on data.world to the atc_codes_clean.csv file on data.world by the level5 or kegg columns. But these columns are very messy in the atc_codes_clean.csv file. To be honest, I didn't put much effort into cleaning them because I wasn't sure how these columns would be used down the line.
OR
drug_uses.csv dataset (which may be the easier of the two)
Two options:
Join the drugname_brand column in any of the spending-201x.csv datasets to the drugname_brand column in drug_uses.csv.
OR
Join the drugname_generic column from any of the spending-201x.csv files on data.world to the drugname_generic, substance, or name columns in drug_uses.csv. These columns in the drug_uses.csv dataset are much cleaner than those in atc_codes_clean.csv, and I only realized that this file contained ATC classification information after I uploaded atc_codes_clean.csv.
Side note: all of the spending-201x.csv files should have the same drugs in the same format, since they come from a parent wide .xlsx file that had the years' data spread across the columns. If the drugs from one of the spending files are matched successfully, then the same steps should successfully join the other files, so there is no need to compare all of the spending files to find differences in drug names.
Ideal file formats for the analysis:
How this will help
We're working towards matching drugs to their therapeutic uses. The USP Classification system, FDA drug approval data, and the ATC Classification system seem to be the best potential grouping categories of drugs from the datasets that have already been collected. The ATC classification system seems to lean more toward scientific/medical jargon, but it could be useful down the line.
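
For anyone picking this up, here is a rough sketch of the drugname_generic route described under "What we're looking for": normalize the drug names on both sides, build a lookup from the name columns in drug_uses.csv, then keep both the matched and the unmatched spending rows so the gaps are easy to inspect. It uses plain Python rather than the R notebooks in the repo. The `atc_code` column name and the helper functions are hypothetical, and the sample rows are made up for illustration rather than taken from the real files.

```python
import re


def normalize_name(name):
    """Lowercase, replace runs of non-alphanumerics with one space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def build_lookup(atc_rows, name_columns, code_column="atc_code"):
    """Map each normalized drug name to its ATC code.

    `code_column` is a hypothetical name; check the real header in
    drug_uses.csv before using this.
    """
    lookup = {}
    for row in atc_rows:
        for column in name_columns:
            key = normalize_name(row.get(column) or "")
            if key:
                # Keep the first code seen for a given name.
                lookup.setdefault(key, row[code_column])
    return lookup


def join_spending_to_atc(spending_rows, lookup, key_column="drugname_generic"):
    """Left join: return (matched rows with atc_code added, unmatched rows)."""
    matched, unmatched = [], []
    for row in spending_rows:
        code = lookup.get(normalize_name(row[key_column]))
        if code is None:
            unmatched.append(row)
        else:
            matched.append({**row, "atc_code": code})
    return matched, unmatched


# Made-up rows standing in for drug_uses.csv and a spending-201x.csv file.
drug_uses_rows = [
    {"drugname_generic": "metformin", "substance": "metformin", "atc_code": "A10BA02"},
    {"drugname_generic": "atorvastatin", "substance": "atorvastatin", "atc_code": "C10AA05"},
]
spending_rows = [
    {"drugname_generic": "METFORMIN "},
    {"drugname_generic": "Atorvastatin"},
    {"drugname_generic": "Example Unlisted Drug"},
]

lookup = build_lookup(drug_uses_rows, ["drugname_generic", "substance"])
matched, unmatched = join_spending_to_atc(spending_rows, lookup)
print(f"matched {len(matched)} of {len(spending_rows)} spending rows")
```

Since all of the spending-201x.csv files should share the same drug names, the same `lookup` could in principle be reused for each year. Exact matching on normalized names will still miss salt forms and combination products, so the `unmatched` list is the place to look when deciding what extra cleaning is worth doing.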