Add Jupyter notebooks to pull in and start exploring CMS data

mattgawarecki commented 7 years ago

This PR gives us a few things:

- Two new Jupyter notebooks: part-d_convert_to_feather and part-d_exploration
  - convert_to_feather: downloads CMS data for Medicare Part D spending, extracts it, and saves the data in Feather format as drugnames.feather and spending-{year}.feather
  - exploration: does some very basic exploratory work on the downloaded data set and shows how to read Feather files with Python
- Datasets for Medicare Part D spending 2011-2015, in Feather format
  - drugnames.feather: a list of all the drugs in the data set; corresponds row-wise to the spending-{year}.feather files
  - spending-{year}.feather: spending data for Medicare Part D by year; corresponds row-wise to drugnames.feather
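
Since the files line up row by row, here is a rough sketch of how one year of spending could be stitched back onto the drug names. It is not lifted from the notebooks: the helper names `combine_rowwise` and `load_year` and the `data_dir` argument are made up for illustration, and it assumes pandas with Feather support (pyarrow) is installed.

```python
from pathlib import Path

import pandas as pd


def combine_rowwise(names: pd.DataFrame, spending: pd.DataFrame) -> pd.DataFrame:
    """Attach drug names to spending rows purely by row position.

    Hypothetical helper: relies on the two frames having the same row order.
    """
    if len(names) != len(spending):
        raise ValueError(
            f"row counts differ: {len(names)} names vs {len(spending)} spending rows"
        )
    # Drop both indexes so concat pairs rows by position, not by index label.
    return pd.concat(
        [names.reset_index(drop=True), spending.reset_index(drop=True)],
        axis=1,
    )


def load_year(data_dir: Path, year: int) -> pd.DataFrame:
    """Read drugnames.feather and spending-{year}.feather from data_dir and combine them."""
    names = pd.read_feather(data_dir / "drugnames.feather")
    spending = pd.read_feather(data_dir / f"spending-{year}.feather")
    return combine_rowwise(names, spending)
```

Something like `load_year(Path("."), 2015)` would then give one frame per year. The length check is there because a positional join fails silently if the files ever drift out of sync.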

zachmueller commented 7 years ago

Thanks for the extra .gitignore commit there lol.

Would you mind updating the notebook code and re-running it so the data lands in a subfolder (data/ perhaps)? That way we avoid having too much stuff in the top-level folder. Long-term, I think it'll be best for us to move the data out to external storage (e.g., S3) to avoid filling up the repo itself too quickly, but I already screwed that up with my initial commit lol.
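
For what it's worth, a minimal sketch of what the path handling might look like, assuming the notebook builds its output paths in one place. `DATA_DIR` and `feather_path` are hypothetical names, not anything that exists in the notebook today.

```python
from pathlib import Path

# Assumed location for downloaded and converted data, relative to the repo root.
DATA_DIR = Path("data")


def feather_path(name: str, data_dir: Path = DATA_DIR) -> Path:
    """Return data_dir/<name>.feather, creating data_dir if it does not exist yet."""
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir / f"{name}.feather"
```

Calls such as `feather_path("drugnames")` or `feather_path(f"spending-{year}")` would then keep everything out of the top-level folder, and switching to another location later would only mean changing `DATA_DIR`.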

mattgawarecki commented 7 years ago

Still need to have the raw data downloaded to data/. Stand by.

zachmueller commented 7 years ago

Awesome, looks good! Let me know if there are any other pending changes; otherwise I'll merge this in. I'll also try to ping Jonathon to see if he can get you direct access to the Data For Democracy org so you can make future commits to the main repo directly.

Data4Democracy / drug-spending

Add Jupyter notebooks to pull in and start exploring CMS data #1