Data4Democracy / drug-spending

Project to understand pharmaceutical spending, currently focused on US government programs.

issue #76 - Join Medicare Part D spending data to ATC Classification System #78

Closed Anandkarthick closed 6 years ago

Anandkarthick commented 6 years ago

Output files and Jupyter notebook for Issue 76.

darya-akimova commented 6 years ago

Hey Anandkarthick, thanks for all your work, but after reviewing the repo, there are a number of issues that need to be resolved before it can be merged.

If you'd like to keep working on this, I have a few recommendations. After realizing there are drugs missing from each of the year files, I created spending_part_d_2011to2015_tidy.csv and imported it onto data.world. Please set yourself up to use the data.world library for Python ( https://data.world/integrations/python ) and then use SQL to pull down only the name columns from either spending_part_d_2011to2015_tidy.csv or the original .xlsx file Medicare_Drug_Spending_PartD_All_Drugs_YTD_2015_12_06_2016.xlsx, and then select only the unique rows. Working with only these columns should keep all output files down in size. I will also change the issue to make this recommendation.
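As a rough illustration of the "name columns only, unique rows" recommendation, here is a minimal sketch that runs a SELECT DISTINCT against an in-memory SQLite table. The table name `spending`, the column names `brand_name` and `generic_name`, and the sample rows are all assumptions made for this example and are not taken from the actual files. With the data.world Python library, a similar query would presumably be sent through its query function against the real dataset instead.

```python
import sqlite3

# Hypothetical table and column names; the real tidy file may use different ones.
NAME_QUERY = """
    SELECT DISTINCT brand_name, generic_name
    FROM spending
    ORDER BY brand_name, generic_name
"""


def unique_name_pairs(rows):
    """Return the distinct (brand_name, generic_name) pairs found in rows.

    Each row is (brand_name, generic_name, year, total_spending).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE spending ("
            "brand_name TEXT, generic_name TEXT, year INTEGER, total_spending REAL)"
        )
        conn.executemany("INSERT INTO spending VALUES (?, ?, ?, ?)", rows)
        # Only the two name columns come back, with repeats across years collapsed.
        return conn.execute(NAME_QUERY).fetchall()
    finally:
        conn.close()


# Placeholder rows for illustration only, not real Medicare figures.
sample_rows = [
    ("BRAND_A", "generic_a", 2011, 100.0),
    ("BRAND_A", "generic_a", 2012, 120.0),
    ("BRAND_B", "generic_b", 2011, 50.0),
]
name_pairs = unique_name_pairs(sample_rows)
```

Because the year and spending columns are dropped before the rows are deduplicated, the result has one row per drug name pair rather than one per drug per year, which is what keeps the output files small.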

Anandkarthick commented 6 years ago

Thank you for the comments and recommendation.

spending_data_merged.csv - I think it makes sense to remove this. I overlooked the data sets in data.world and created this merged file as part of the effort. I'll remove this file.

spending-2011.csv & spending_data_match.csv.zip - this is my mistake. I used only the generic name to create a match. I just completed the Python integration with data.world. I'll modify the notebook for the spending_2011.csv dataset and create a pull request again.

darya-akimova commented 6 years ago

Great, thanks again for your continued effort on this project! Please feel free to ask me any questions as you go along.

darya-akimova commented 6 years ago

I'm going to close this pull request without merging because there doesn't seem to have been any progress and the ATC merging is on hold. If there's more progress on this, please feel free to submit another pull request in the future.