dhuppenkothen closed this 7 years ago.
Added @mattgawarecki as a reviewer because Python, but this sounds fantastic! Thanks, @dhuppenkothen!
This PR was getting to be quite large -- I see some small places where we could improve, but I'm not going to let "perfect" get in the way of "great" here. Merged with great pleasure! 👍
I took the code from the notebook that got merged today, along with this notebook, and made a script that can be called from the command line. Of course, the functions can also all be imported and called from within Python.
It allows the user to decide which data to download (including all of it), and it also includes a helper function to avoid duplicating a lot of code. Examples:
Get help:
python read_data.py -h
Download all data into a specified directory:
python read_data.py -a -d "/path/to/data/directory/"
Download Part D data only into the default directory "../data/":
python read_data.py --download-partd
Don't download data, but make a file that associates drug names with classes and IDs:
python read_data.py --make-drug-table
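For readers curious how a command-line front end like this can be wired up, here is a minimal sketch using the standard-library argparse module. It is not the code in this PR. Only the flags -a, -d, --download-partd and --make-drug-table and the default directory "../data/" come from the examples above; the long names --all and --directory, the functions download_partd, make_drug_table, run and main, and the DOWNLOADERS registry are hypothetical placeholders, and the download and table-building bodies are stubs. The registry stands in for the shared helper: "-a" and the single-dataset flags go through one code path.

```python
import argparse

# Default location taken from the PR description.
DEFAULT_DIR = "../data/"


def download_partd(directory):
    """Hypothetical stub: the real Part D download logic is not shown here."""
    print(f"[sketch] would download Part D data into {directory}")


def make_drug_table(directory):
    """Hypothetical stub for the drug name -> class/ID table."""
    print(f"[sketch] would write the drug table into {directory}")


# Hypothetical registry: one entry per data set, so "-a" and the
# individual flags share a single code path instead of duplicating logic.
DOWNLOADERS = {"partd": download_partd}


def build_parser():
    """Build a parser for the flags shown in the examples (-h is automatic)."""
    parser = argparse.ArgumentParser(
        description="Download data sets and/or build a drug lookup table."
    )
    # "-a" is from the examples; the long name "--all" is an assumption.
    parser.add_argument("-a", "--all", action="store_true",
                        help="download all data sets")
    # "-d" is from the examples; the long name "--directory" is an assumption.
    parser.add_argument("-d", "--directory", default=DEFAULT_DIR,
                        help="directory to store the data in")
    parser.add_argument("--download-partd", action="store_true",
                        help="download the Part D data only")
    parser.add_argument("--make-drug-table", action="store_true",
                        help="write a table linking drug names to classes and IDs")
    return parser


def run(args):
    """Run the actions selected by the parsed arguments; return their names."""
    selected = list(DOWNLOADERS) if args.all else []
    if args.download_partd and "partd" not in selected:
        selected.append("partd")
    for name in selected:
        DOWNLOADERS[name](args.directory)
    if args.make_drug_table:
        make_drug_table(args.directory)
        selected.append("drug_table")
    return selected


def main(argv=None):
    """Entry point; argv=None reads sys.argv, a list lets Python code call it."""
    return run(build_parser().parse_args(argv))


if __name__ == "__main__":
    main()
```

Because argument parsing lives in main(argv) behind the `__name__` guard, the same functions stay importable, e.g. `from read_data import main; main(["--download-partd"])`, assuming the module keeps the read_data.py name used above.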
I also made a small change to the notebook referenced above to remove the dependency on openpyxl, which is unnecessary given that we're importing pandas anyway.
Comments/suggestions welcome. :)