Data4Democracy / drug-spending

Project to understand pharmaceutical spending, currently focused on US government programs.

Tidy drug_list.json #71

Closed: darya-akimova closed this issue 6 years ago

darya-akimova commented 6 years ago

Status

Assigning this to myself. Currently working on formatting the nested list of therapeutic areas into a workable format.

Task

Tidy, and possibly explore, the drug_list.json dataset found on data.world.

Data dictionary: https://github.com/Data4Democracy/drug-spending/blob/master/datadictionaries/drug_list.md

Tidy format reference: https://ramnathv.github.io/pycon2014-r/explore/tidy.html
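As a rough illustration of the tidying step, the sketch below flattens a nested list of therapeutic areas into one row per drug and area, which is the "long" tidy layout. The field names `drug_name` and `therapeutic_areas` and the sample records are assumptions made for illustration; the actual keys in drug_list.json should be taken from the data dictionary linked above.

```python
import json

# Hypothetical sample records. The real drug_list.json may use
# different keys and nesting; check the data dictionary first.
sample = json.loads("""
[
  {"drug_name": "DrugA", "therapeutic_areas": ["Cardiology", "Nephrology"]},
  {"drug_name": "DrugB", "therapeutic_areas": []},
  {"drug_name": "DrugC", "therapeutic_areas": ["Oncology"]}
]
""")


def tidy_therapeutic_areas(records):
    """Return one row per (drug, therapeutic area) pair.

    Drugs with no listed area are kept as a single row with
    therapeutic_area set to None, so no drug is silently dropped.
    """
    rows = []
    for record in records:
        areas = record.get("therapeutic_areas") or [None]
        for area in areas:
            rows.append(
                {"drug_name": record["drug_name"], "therapeutic_area": area}
            )
    return rows


tidy_rows = tidy_therapeutic_areas(sample)
for row in tidy_rows:
    print(row)
```

Keeping drugs with an empty list as a row with a missing value is a design choice; dropping them instead would make later counts of matched drugs look smaller than they are.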

What we're looking for

Tidying:

Other:

How this will help

The drug_list.json and usp_drug_classification.csv files seem to include the most accessible drug category information: their classification systems lean towards therapeutic classification rather than the scientific/pharmacological classification used by some of the others. However, drug_list.json needs some tidying to convert it into a more user-friendly format. Another issue is that the specific_treatment column will need some language processing before it is usable. To judge whether that work is worth it, we first need to know how many of the drugs in this file also appear in the Medicare spending files.
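One way to answer the "how many drugs overlap" question is a simple set comparison on normalized drug names. This is only a sketch: the drug names below are made up, the function names are hypothetical, and real matching against the Medicare spending files would likely need more careful handling of brand names, dosage forms, and spelling variants.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def overlap_summary(drug_list_names, medicare_names):
    """Count how many drug_list names also appear among the Medicare names."""
    drug_list = {normalize(n) for n in drug_list_names}
    medicare = {normalize(n) for n in medicare_names}
    matched = drug_list & medicare
    share = len(matched) / len(drug_list) if drug_list else 0.0
    return {
        "matched": sorted(matched),
        "n_matched": len(matched),
        "n_drug_list": len(drug_list),
        "share_of_drug_list": share,
    }


# Made-up names standing in for the two sources.
summary = overlap_summary(
    ["DrugA", "drugb ", "DrugC"],
    ["DRUGA", "DrugB", "DrugZ"],
)
print(summary)
```

A low share from a check like this would suggest that the language processing on specific_treatment is not worth the effort for this project.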

darya-akimova commented 6 years ago

After some exploration, I realized that this dataset only goes back to 1995 and does not include generics, so it is too limited for the purposes of this project. Closing and abandoning this issue.