Data4Democracy / drug-spending

Project to understand pharmaceutical spending, currently focused on US government programs.

Associate drugs with their therapeutic uses #6

Closed mattgawarecki closed 6 years ago

mattgawarecki commented 7 years ago

Task

We currently have a listing of drug names (both brand and generic) and a separate list of ATC codes. We'd like to find a way to associate these two data sets in such a way that we can look up a given drug's therapeutic uses using its name.

How this will help

If we can establish a link between drug names and their uses, we'll be able to learn a ton about which diseases and conditions Medicare is spending money to treat. Among other things, this also opens the door to comparing cost and popularity of different drugs within the same class over time.

Things you need to know

Based on prior efforts, this task may take some significant effort to complete. Drugs often go by various names (even chemically/generically), so doing a simple text search may not be viable. Expect to have to deal with lots of exceptions and edge cases. We may even need to acquire more comprehensive data, which could require you to solicit other organizations.
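To illustrate why a plain text search falls short, here is a minimal sketch that normalizes a name before looking it up and falls back to approximate matching with Python's standard `difflib`. The tiny `ATC_BY_GENERIC` table, the `SALT_WORDS` list, and the function names are illustrative assumptions, not the project's actual data or code.

```python
import difflib
import re

# Hypothetical sample; a real table would be built from the project's ATC list.
ATC_BY_GENERIC = {
    "adalimumab": "L04AB04",
    "atorvastatin": "C10AA05",
    "metformin": "A10BA02",
}

# Assumed, incomplete list of salt/form words that often trail a generic name.
SALT_WORDS = {"hydrochloride", "hcl", "calcium", "sodium"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop salt-form words."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SALT_WORDS)


def lookup_atc(name, cutoff=0.85):
    """Return an ATC code for a drug name, or None if nothing is close enough."""
    key = normalize(name)
    if key in ATC_BY_GENERIC:
        return ATC_BY_GENERIC[key]
    # Fall back to approximate matching to tolerate small spelling differences.
    close = difflib.get_close_matches(key, list(ATC_BY_GENERIC), n=1, cutoff=cutoff)
    return ATC_BY_GENERIC[close[0]] if close else None
```

Approximate matching is risky for drug names, since distinct drugs can differ by a letter or two, so any fuzzy hit would likely need manual review before being trusted.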

jenniferthompson commented 7 years ago

This might need to be a separate issue, but, inspired by the benzodiazepine conversation on Slack: I'm thinking about adding both (a) therapeutic uses and (b) classes. E.g., I might be interested in all cardiology meds or specifically statins.
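ATC codes already encode this kind of nesting: a full 7-character code contains five levels, so a broad group and a narrow class are both just prefixes of the same code. A minimal sketch, assuming codes are stored as plain strings (the function names are hypothetical):

```python
def atc_levels(code):
    """Split a full 7-character ATC code into its five hierarchy levels."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return {
        "anatomical_group": code[:1],          # e.g. C (cardiovascular system)
        "therapeutic_subgroup": code[:3],      # e.g. C10
        "pharmacological_subgroup": code[:4],  # e.g. C10A
        "chemical_subgroup": code[:5],         # e.g. C10AA (statins)
        "chemical_substance": code,            # e.g. C10AA05 (atorvastatin)
    }


def in_class(code, prefix):
    """True if the ATC code falls under the given class prefix."""
    return code.strip().upper().startswith(prefix.strip().upper())
```

With this, `in_class("C10AA05", "C")` selects all cardiovascular drugs, while `in_class("C10AA05", "C10AA")` narrows to statins.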

jenniferthompson commented 7 years ago

Thinking a first step on this could be to separate generics from brand names in drug_list.json. Some are pretty easy (Humira (adalimumab)), some are quite a bit trickier.
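For the easy cases, a regular expression may be enough. The sketch below assumes entries are plain strings shaped like `Brand (generic)`; the actual layout of drug_list.json may differ, and `split_name` is a hypothetical helper.

```python
import re

# Matches "Brand (generic)" with exactly one parenthesized group at the end.
BRAND_GENERIC = re.compile(
    r"^\s*(?P<brand>[^()]+?)\s*\((?P<generic>[^()]+)\)\s*$"
)


def split_name(entry):
    """Return (brand, generic) for 'Brand (generic)' entries, else None."""
    match = BRAND_GENERIC.match(entry)
    if match is None:
        return None  # trickier entry; leave it for manual handling
    return match.group("brand"), match.group("generic").strip().lower()
```

Entries that return `None` (no parentheses, nested parentheses, multiple ingredients) are the trickier ones and would need separate rules or manual review.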

acutrell commented 7 years ago

I don't want to send you down a rabbit hole, but has anyone explored RxNorm? It may provide a link. The National Drug Codes in the Medicare file are linked to the First Databank MedKnowledge proprietary database. It is one of the source vocabularies (the SAB table) used by RxNorm. ATC is another SAB. Just a thought. https://www.nlm.nih.gov/research/umls/rxnorm/

mattgawarecki commented 7 years ago

@acutrell Thanks for the heads up!

All: Because we've gotten quite a few more leads into drug use data sets, I'm going to create a separate issue for it. I think we've discovered a new need -- consolidating all these sources into a list for later curation and exploration.

pabramowitsch commented 7 years ago

This Dx-to-drug information is commercially available from the company First Databank, which is owned by Hearst Business Media. I work for a sister group within Hearst and am very familiar with FDB's information model. It includes not only therapeutic uses, but a huge amount of ancillary info about cost, packaging, dosing by age/sex, generics, and much more.

jenniferthompson commented 7 years ago

@pabramowitsch Oh wow, that would be really helpful. Do you have a link or know how much it would cost/how open they'd be to working with nonprofits?

pabramowitsch commented 7 years ago

I think a simple license is in the range of 50K per year, but I can ask how it might work for a nonprofit. They are obviously very protective of their data sets, as there is a huge editorial effort behind them (20+ pharmacists and clinicians in addition to IT staff). There would need to be some assurance that the data didn't leak into other uses.

Peter

jenniferthompson commented 7 years ago

@pabramowitsch That definitely makes sense from a business perspective... not sure it would mesh well with the current D4D model of keeping most everything open. Which is a bummer!

amichp commented 7 years ago

Just heard about this project on the Partially Derivative podcast, and am looking around.

I suggest you consider using this repository, and the data sources its author uses, to deal with this issue:

https://github.com/fabkury/ndc_map

dbuijs commented 7 years ago

Have a look at drugbank.ca, as well as RxNorm (https://mor.nlm.nih.gov/RxNav/) and https://open.fda.gov/drug/label/

jenniferthompson commented 7 years ago

Thanks @dbuijs! We've got an issue open for drugbank.ca (#61) if you're interested.

darya-akimova commented 6 years ago

Closing this issue in order to break it down into smaller issues to reduce the scope of the project.