Data4Democracy / drug-spending

Project to understand pharmaceutical spending, currently focused on US government programs.

USP Drug Classification data dictionary + tidying #33

Closed cduvallet closed 7 years ago

cduvallet commented 7 years ago

Continuing on issue #14, finalize the USP Drug Classification data dictionary, etc. Raw and tidy data are on data.world.

This data may or may not be useful - it has non-Medicare Part D medications and their respective classes/categories. The classes and categories are pretty self-explanatory (e.g. Antidepressants, Antiparkinson Agents, Sleep Disorder Agents) and can likely easily be tied to usage (depending on how we decide to define usage...).

Some follow-up tasks, if we decide to use this data:

jenniferthompson commented 7 years ago

@cduvallet! The data summary and data dictionary are SO helpful! I've asked Matt or Daniela to review it because I'm not a Python user, but whether or not the data relates to what we're doing immediately, having all this documented so well is fantastic. Thank you!

dhuppenkothen commented 7 years ago

I can review this today, unless @mattgawarecki is on it already.

dhuppenkothen commented 7 years ago

I can also check tonight if these classifications work for the Part D data that I've been playing around with.

cduvallet commented 7 years ago

@dhuppenkothen I made the changes you recommended, and it's much nicer now. I wasn't sure of the best way to interface with read_data.py (so I just rewrote the download data wrapper...)

Also, it seems that there are currently two ways we're keeping track of, downloading, and tidying data:

  1. The script/read_data.py script has an individual function for each dataset that downloads and tidies it, and
  2. The data/ folder has individual data dictionaries and corresponding tidying scripts, one for each individual dataset.

From what I understood from @mattgawarecki, I think we're going with option 2? But let me know if not, and I can incorporate this into the read_data.py script.
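To make the two layouts concrete, here is a minimal sketch of how the same tidying function could serve either one. Everything in it is an assumption for illustration: the names `tidy_usp`, `read_dataset`, and `TIDIERS` are hypothetical, and the sample columns and rows are invented rather than taken from the actual USP file or from read_data.py.

```python
import csv
import io

# Hypothetical raw extract: the column names and rows below are invented for
# illustration and are not taken from the actual USP file.
RAW_CSV = """USP Category , USP Class,Drug Name
Antidepressants,SSRIs , Drug A
,,
Sleep Disorder Agents,Sedatives,Drug B
"""


def tidy_usp(raw_text):
    """Return a list of dicts with snake_case keys and trimmed values."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [name.strip().lower().replace(" ", "_") for name in next(reader)]
    rows = []
    for record in reader:
        values = [value.strip() for value in record]
        if not any(values):  # skip rows that are entirely blank
            continue
        rows.append(dict(zip(header, values)))
    return rows


# Option 1 style: one shared module exposes a function per dataset,
# looked up by name.
TIDIERS = {"usp": tidy_usp}


def read_dataset(name, raw_text):
    return TIDIERS[name](raw_text)


# Option 2 style: this file sits alongside the dataset's data dictionary and
# is run directly as that dataset's tidying script.
if __name__ == "__main__":
    for row in tidy_usp(RAW_CSV):
        print(row)
```

Keeping the tidying logic in one plain function means the choice between the two options mostly comes down to where the file lives and who calls it.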

jenniferthompson commented 7 years ago

@cduvallet I'll let @dhuppenkothen speak to read_data.py, but just wanted to jump in and say we had a long discussion today about repo organization, and I just submitted a PR to reflect the updated file structure. Once we get that finalized we'll clean up all the documentation, but the idea will be to have a dictionary (md) in /datadictionaries, and tidying scripts (in your case) in /python/datawrangling/[subfolders if you need them]. Not sure if that answers all your questions, but hopefully it helps! Thanks so much for bearing with us while we get more streamlined - it'll help tremendously in the long run.

jenniferthompson commented 7 years ago

Hey @cduvallet and @dhuppenkothen! Just checking in on the status of this PR. No rush intended on my end, just wanted to make sure there isn't anything blocking either of you that we need to take care of administratively.

cduvallet commented 7 years ago

@jenniferthompson Nope, I was just traveling this weekend so haven't gotten around to finalizing this. Will update if I need anything from y'all! :)

cduvallet commented 7 years ago

Okay, I think we should be ready to merge! @jenniferthompson double-check and let me know if anything needs to change?

jenniferthompson commented 7 years ago

@cduvallet The data-dictionaries branch looks great! Would you mind pushing that to your master branch so it'll show up on master here? I think that should do it!

@dhuppenkothen did you have any further suggestions on the Python code?

cduvallet commented 7 years ago

@jenniferthompson I think I did it! Should be ready to merge if @dhuppenkothen doesn't have other comments.

dhuppenkothen commented 7 years ago

Looks good to me!

mattgawarecki commented 7 years ago

Oops. I'll get this into master instead of data-dictionaries.