Data4Democracy / drug-spending

Project to understand pharmaceutical spending, currently focused on US government programs.

Part B Spend Clean #20

Status: Closed (sgalletta213 closed this 7 years ago)

sgalletta213 commented 7 years ago

This commit contains the R project I made to clean the raw Part B spend .xlsx file from CMS. It contains:
1) the R project file
2) the script
3) the clean data file
4) the data definition file
5) a .gitignore for R-specific files, which also overrides the higher-level .gitignore that was omitting my .csv files
6) an extra .gitignore in /data to preserve the file hierarchy that the script looks for

Note: This may be overkill for a data cleaning task. If so, let me know, and in the future I'll commit something lighter :)

jenniferthompson commented 7 years ago

@sgalletta213 Wow - awesome! I'm working on some other tasks tonight but will check it out ASAP. In general - I think an R script + final CSV would be plenty, but I'm also a big fan of thoroughness. :) Thank you!

jenniferthompson commented 7 years ago

@sgalletta213 This looks awesome!! Thank you so much for doing it! The only thing I noticed: do you think it would be straightforward to remove the trailing whitespace from the drug names within your script? That would be helpful when trying to merge with other data sources, I think.
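
To illustrate why trailing whitespace gets in the way of merging, here is a minimal sketch in Python (the script discussed in this thread is written in R and is not shown here, so this is not the author's code). The column names, sample rows, and lookup table are all hypothetical and exist only to show the effect.

```python
import csv
import io


def strip_trailing_whitespace(rows, column):
    """Return copies of the rows with trailing whitespace removed from one column."""
    return [{**row, column: row[column].rstrip()} for row in rows]


# Hypothetical sample standing in for the cleaned CSV; note the trailing
# spaces and the trailing tab after the drug names.
raw_csv = "drug_name,total_spending\nAbatacept   ,100\nRituximab\t,200\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Hypothetical second data source keyed by drug name without padding.
other_source = {"Abatacept": "code_A", "Rituximab": "code_B"}

# Before trimming, exact-match lookups find nothing.
matched_before = [r["drug_name"] for r in rows if r["drug_name"] in other_source]

# After trimming, every name lines up with the other source.
cleaned = strip_trailing_whitespace(rows, "drug_name")
matched_after = [r["drug_name"] for r in cleaned if r["drug_name"] in other_source]

print(matched_before)
print(matched_after)
```

Only trailing whitespace is removed, which matches the request above; leading or internal whitespace would need separate handling if it turned out to be present in the data.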

sgalletta213 commented 7 years ago

Closing this pull request to remove the trailing whitespace and fix data_definitions.

jenniferthompson commented 7 years ago

Awesome!