Closed (jenniferthompson closed this issue 6 years ago)
Maybe also RxNorm and the CMS drug class data?
The keying of companies is finished; I submitted PR #41 with the script and will add the dfs to data.world somehow. I could work on keying the drugs later this week, depending on how busy things get, but if someone else wants to, that is also great.
Looks like we might be done with this issue! :grin: Any objections or concerns, @jenniferthompson @dhuppenkothen @skirmer ?
My impression was that we still need one more key - but @skirmer would know best. I don't think anyone else took her up on the suggestion of working on it ;)
@skirmer Just confirming - we still need some work on this issue to get the drug keys, correct?
@jenniferthompson Yes. I started a bit of a stub for probabilistic matching between the drug names across datasets, but other stuff in life has been taking over and I haven't had time to finish it. The core problem is that our drug spending files and the manufacturer file label drugs very differently (generic name, brand name, etc.), and we need to decide on a schema to match these and assign a unique key to each drug.
BTW, so sorry for my delayed response! I didn't get any notifications about this thread for some reason.
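For anyone picking this up, here is a rough sketch of what name matching plus key assignment could look like. It is not the stub mentioned above. It uses plain string similarity from Python's standard library (`difflib`) rather than a true probabilistic record-linkage model. The function names, the `cutoff` value, and the example drug names are all made up for illustration.

```python
import difflib
import re


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def assign_drug_keys(reference_names, other_names, cutoff=0.85):
    """Key the reference names, then map other names onto those keys.

    Hypothetical helper: `cutoff` is an arbitrary similarity threshold
    and would need tuning against the real files.
    """
    # Each distinct normalized reference name gets the next integer key.
    keys = {}
    for raw in reference_names:
        keys.setdefault(normalize(raw), len(keys) + 1)

    candidates = list(keys)
    matches = {}
    for raw in other_names:
        # Best single candidate at or above the cutoff, if any.
        best = difflib.get_close_matches(
            normalize(raw), candidates, n=1, cutoff=cutoff
        )
        # None marks names that need manual review.
        matches[raw] = keys[best[0]] if best else None
    return keys, matches


# Made-up names standing in for the spending and manufacturer files.
keys, matches = assign_drug_keys(
    ["Lipitor", "HUMIRA PEN"],
    ["LIPITOR ", "Humira  Pen", "Lipitorr", "Aspirin"],
)
```

Unmatched names come back as `None` instead of being forced onto the nearest key, so they can be reviewed by hand. Matching a brand name to its generic name would still need a lookup table, since string similarity cannot relate names that share no characters.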
@skirmer How's it going? I see you mentioned the need for a schema to be able to uniquely identify drugs; would this be good to talk about in Slack and then update this ticket once we figure it out? Maybe we can schedule some time for anyone interested to talk over Slack, Hangouts, or similar. Thoughts?
The lobbying aspect of the project is currently on hold. Will close this issue and reopen if this seems to be an interesting avenue in the future.
We need to be able to join related datasets (stored at data.world) that currently don't have keys in common. Prime candidates currently include:
- drugdata_clean.csv
- Pharma_Lobby.csv
- all spending-201x.csv files
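
Once a shared key column exists, the join itself is simple. Below is a minimal standard-library sketch of a left join over rows loaded as dicts (for example via `csv.DictReader`). The column names `company_key`, `spend`, and `lobby` are placeholders, not the actual column names in these files.

```python
def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`; unmatched left rows are kept."""
    # Index the right-hand rows by key so each lookup is a dict access.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left_rows:
        partners = index.get(row[key])
        if not partners:
            joined.append(dict(row))
            continue
        for partner in partners:
            # On a column-name clash, the left-hand value wins.
            joined.append({**partner, **row})
    return joined


# Placeholder rows; the real files have different columns.
spending = [{"company_key": 1, "spend": 10}, {"company_key": 2, "spend": 5}]
lobbying = [{"company_key": 1, "lobby": 7}]
result = left_join(spending, lobbying, "company_key")
```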