Data4Democracy / drug-spending

Project to understand pharmaceutical spending, currently focused on US government programs.
Create keys to join Medicare Part D spending, manufacturer, and lobbying info #37

Closed jenniferthompson closed 6 years ago

jenniferthompson commented 7 years ago

We need to be able to join related datasets (stored at data.world) that currently don't have keys in common. Prime candidates currently include:

- drugdata_clean.csv
- Pharma_Lobby.csv
- all spending-201x.csv files
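
To make the goal concrete, here is a minimal sketch of how a shared key would let two of these files be joined through a crosswalk (a lookup from each file's own label to a common key). The column names, drug names, company names, and key values below are all made up for illustration; the real files on data.world may be structured differently.

```python
# All rows, column names, and key values here are hypothetical placeholders.
spending = [
    {"drug_name": "ALPHADRUG", "total_spending": 100.0},
    {"drug_name": "Betadrug", "total_spending": 250.0},
]
manufacturers = [
    {"brand": "Alphadrug", "company": "Company X"},
    {"brand": "Gammadrug", "company": "Company Y"},
]

# Crosswalks: each dataset's own label -> shared drug key.
# In practice these would come from a keying script or manual review.
spending_to_key = {"ALPHADRUG": 1, "Betadrug": 2}
manufacturer_to_key = {"Alphadrug": 1, "Gammadrug": 3}


def join_on_key(left, left_col, left_map, right, right_col, right_map):
    """Inner-join two lists of dicts through their crosswalk mappings."""
    # Index the right-hand rows by shared key, skipping rows with no key.
    right_by_key = {}
    for row in right:
        key = right_map.get(row[right_col])
        if key is not None:
            right_by_key.setdefault(key, []).append(row)

    # Emit one combined row per left/right pair that shares a key.
    joined = []
    for row in left:
        key = left_map.get(row[left_col])
        for match in right_by_key.get(key, []):
            joined.append({"drug_key": key, **row, **match})
    return joined


joined = join_on_key(
    spending, "drug_name", spending_to_key,
    manufacturers, "brand", manufacturer_to_key,
)
```

Only rows whose labels map to the same key end up in `joined`, so the quality of the join depends entirely on how the crosswalks are built.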

dhuppenkothen commented 7 years ago

Maybe also RxNorm and the CMS drug class data?

skirmer commented 7 years ago

The keying of companies is finished: I submitted PR #41 with the script and will add the data frames to data.world somehow. I could work on the keying of drugs later this week, depending on how busy things get, but if someone else wants to take it on, that is also great.
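
For readers without the PR at hand, the following is only a rough sketch of one way company names could be keyed; it is not the script from PR #41. It assumes that spelling variants differ mainly in case, punctuation, and trailing legal suffixes, and the suffix list, function names, and company names are hypothetical.

```python
import re

# Assumed set of legal suffixes to ignore; extend as needed for the real data.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "plc", "company"}


def normalize_company(name):
    """Lowercase, replace punctuation with spaces, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def assign_company_keys(names):
    """Map each raw name to an integer key shared by all of its spelling variants."""
    keys_by_normalized = {}
    result = {}
    for name in names:
        norm = normalize_company(name)
        # A normalized name seen for the first time gets the next unused key.
        key = keys_by_normalized.setdefault(norm, len(keys_by_normalized) + 1)
        result[name] = key
    return result


# Made-up company names for illustration only.
company_keys = assign_company_keys(
    ["Acme Pharma, Inc.", "ACME PHARMA LLC", "Other Labs Co."]
)
```

Exact matching on a normalized name is deliberately conservative: it will miss abbreviations and renamed subsidiaries, which would need manual review or a fuzzier comparison.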

mattgawarecki commented 7 years ago

Looks like we might be done with this issue! :grin: Any objections or concerns, @jenniferthompson @dhuppenkothen @skirmer ?

jenniferthompson commented 7 years ago

My impression was that we still need one more key - but @skirmer would know best. I don't think anyone else took her up on the suggestion of working on it ;)

jenniferthompson commented 7 years ago

@skirmer Just confirming - we still need some work on this issue to get the drug keys, correct?

skirmer commented 7 years ago

@jenniferthompson Yes. I started a small stub for probabilistic matching of drug names across datasets, but other stuff in life has been taking over and I haven't had time to finish it. The key issue is that our drug spending files and the manufacturer file label drugs very differently (generic name, brand name, etc.), so we need to decide on a schema to match these and assign a unique key to each drug.
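
As a starting point for anyone picking this up, here is a minimal sketch of name matching and key assignment. It is not the stub mentioned above, and it uses a plain string-similarity score from Python's `difflib` as a stand-in for true probabilistic record linkage. The function names, the 0.85 threshold, and the drug names are assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize_drug(name):
    """Lowercase and collapse whitespace so trivial differences don't hurt the score."""
    return " ".join(name.lower().split())


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the most similar candidate.

    The candidate is None when the best score falls below the threshold.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(
            None, normalize_drug(name), normalize_drug(cand)
        ).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


def key_drugs(spending_names, manufacturer_names, threshold=0.85):
    """Give matching names the same key and unmatched spending names a new key."""
    unique = list(dict.fromkeys(manufacturer_names))  # de-duplicate, keep order
    manufacturer_keys = {name: i for i, name in enumerate(unique, start=1)}
    next_key = len(manufacturer_keys) + 1
    spending_keys = {}
    for name in spending_names:
        match, _ = best_match(name, unique, threshold)
        if match is None:
            spending_keys[name] = next_key
            next_key += 1
        else:
            spending_keys[name] = manufacturer_keys[match]
    return spending_keys, manufacturer_keys


# Made-up drug names for illustration only.
spending_keys, manufacturer_keys = key_drugs(
    ["ALPHADRUG  XR", "Betadrug"],
    ["Alphadrug XR", "Gammadrug"],
)
```

String similarity alone cannot link a brand name to its generic name, since the two usually share no characters. That part of the schema would still need an external mapping or manual review.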

skirmer commented 7 years ago

BTW, so sorry for my delayed response! I didn't get any notifications about this thread for some reason.

mattgawarecki commented 7 years ago

@skirmer How's it going? I see you mentioned the need for a schema to be able to uniquely identify drugs; would this be good to talk about in Slack and then update this ticket once we figure it out? Maybe we can schedule some time for anyone interested to talk over Slack, Hangouts, or similar. Thoughts?

darya-akimova commented 6 years ago

The lobbying aspect of the project is currently on hold. Will close this issue and reopen if this seems to be an interesting avenue in the future.