Data4Democracy / drug-spending

Project to understand pharmaceutical spending, currently focused on US government programs.

Data Wrangling for Drug Use Visualizations #24

Closed dhuppenkothen closed 7 years ago

dhuppenkothen commented 7 years ago

While exploring the data for issue #15, I quickly realized that we need to be able to sort the drugs into categories. After a bit of reading, I found the CMS prescription drug profiles useful, in particular because they use a categorization from the Veterans Affairs National Drug File. The latter classifies drugs by effect, both broadly and more finely, so it should do well for a first attempt at showing different drug uses.

This notebook describes the data sources, download, and cleaning steps I went through to build an extended version of the drugnames.feather file described in this notebook. The new file adds columns for drug use classes and can now be used for visualization.
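For readers who don't want to open the notebook, here is a rough sketch of the core idea: attach drug use classes to the drug name table with a left join. This is not the notebook's actual code. The column names (`drugname_brand`, `drugname_generic`, `drug_major_class`, `drug_class`), the example rows, and the `add_drug_classes` helper are all assumptions made for illustration, and small in-memory tables stand in for the real files.

```python
import pandas as pd

# Hypothetical stand-in for the contents of drugnames.feather.
drugnames = pd.DataFrame({
    "drugname_brand": ["LIPITOR", "ZOCOR", "MYSTERYDRUG"],
    "drugname_generic": ["ATORVASTATIN CALCIUM", "SIMVASTATIN", "UNKNOWN COMPOUND"],
})

# Hypothetical stand-in for a class table derived from the drug profiles:
# a broad class and a finer class per generic name (illustrative values).
drug_classes = pd.DataFrame({
    "drugname_generic": ["atorvastatin calcium ", "Simvastatin"],
    "drug_major_class": ["Cardiovascular Medications", "Cardiovascular Medications"],
    "drug_class": ["Antilipemic Agents", "Antilipemic Agents"],
})


def normalize(names: pd.Series) -> pd.Series:
    """Strip whitespace and upper-case names so both tables join on the same key."""
    return names.str.strip().str.upper()


def add_drug_classes(names: pd.DataFrame, classes: pd.DataFrame,
                     key: str = "drugname_generic") -> pd.DataFrame:
    """Left-join class columns onto the name table; unmatched drugs get NaN."""
    left = names.assign(_key=normalize(names[key]))
    right = (
        classes.assign(_key=normalize(classes[key]))
        .drop(columns=[key])
        .drop_duplicates("_key")  # keep one class row per drug
    )
    # validate raises if the class table still has duplicate keys
    merged = left.merge(right, on="_key", how="left", validate="many_to_one")
    return merged.drop(columns="_key")


extended = add_drug_classes(drugnames, drug_classes)
print(extended)
```

The left join keeps every drug from the original table, so drugs without a match show up with missing classes instead of silently disappearing. With real data, the result could be written back out with `extended.to_feather(...)`, which needs pyarrow installed.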

There are a few hacks in there that we might want to deal with more cleanly in the long run, but I don't have a good enough understanding of the data yet to do that.

Comments and suggestions welcome. :)

mattgawarecki commented 7 years ago

I started reviewing this tonight but I'll need more time to really give it the attention it deserves. Overall I can definitely say it spits out good data that's insanely useful; I'm mostly going over the finer points of some of the more complex parsing logic, making sure it's as efficient as possible.

Great job, @dhuppenkothen! If possible, hit me up on Slack later so we can really get this over the finish line. 🏁

jenniferthompson commented 7 years ago

Oh goodness. This could be amazing. I'll leave the Python details to y'all, but that end result would be fantastic!

dhuppenkothen commented 7 years ago

It is worth noting that the prescription drug profiles have more information that might be really useful. I threw most of it out for the purposes of this project, since I was only interested in the classes, but we should keep it in mind for the future.

dhuppenkothen commented 7 years ago

Updated notebook with results from code review.