demand-driven-open-data / ddod-intake

"DDOD Intake" tracks DDOD Use Cases using GitHub issues. View the main DDOD site here: http://ddod.us

FDA: Purple Book Data #54

Open chackoge opened 8 years ago

chackoge commented 8 years ago

The FDA Orange Book provides downloadable data on drugs. The FDA Purple Book is available only as a spreadsheet in PDF format and has less data. Providing a machine-readable version that is updated monthly and contains at least as much data as the Orange Book, including listings of relevant patents, would greatly support the kinds of valuable studies exemplified by Williams et al. (2015), Cell.

dportnoy commented 8 years ago

@chackoge, thanks for the request! Could you provide specifics on the fields needed? Orange Book lists the fields for each of its 3 machine-readable files: http://www.fda.gov/Drugs/InformationOnDrugs/ucm129689.htm. Which ones would be particularly applicable to your request for Purple Book?

For those unfamiliar with the Purple Book: it's the Orange Book equivalent for biologics. The descriptive name for the Purple Book is “Lists of Licensed Biological Products with Reference Product Exclusivity and Biosimilarity or Interchangeability Evaluations.”

To elaborate on the use case from the cited paper...

"We propose that data mining and network analysis utilizing public databases can identify and quantify relationships between scientific discoveries and major advances in medicine (cures). Further development of such approaches could help to increase public understanding and governmental support for life science research and could enhance decision making in the quest for cures."

Unfortunately, this article isn't publicly accessible. Do you know what open data sources were used? Any other relevant factors we may want to point out here?

dportnoy commented 8 years ago

BTW, the cited "From Scientific Discovery to Cures" paper was also mentioned in the National Institute of General Medical Sciences (NIGMS) 2017 budget:

Big Data approaches are not merely useful for answering scientific questions; they can also contribute to an understanding and optimization of the scientific process itself. A recent paper by a group of scientists in California used data mining and network analysis of public databases to follow connections between basic research and major advances in medicine. This “cure network informatics” approach used quantitative, analytical methods to trace medical advances back to the broad, diverse foundations on which those advances were built, including scientists in very different fields and very distant locations. This research provides an evidence base that emphasizes the importance of a central goal of NIGMS’ strategic plan: maintaining a diverse portfolio of research, researchers, and institutions.

chackoge commented 8 years ago

@dportnoy I would say all three files are relevant. I am most interested in the product, active ingredient, and patent numbers associated with these biologics, but I expect that the larger community would like to see all the data in the Orange Book downloadable files made equivalently available for the Purple Book. A step further would be to retain historical patent data, i.e., list patents even after they have expired. This may not be as relevant to the Purple Book, since biologics are relatively new, but it is certainly desirable for the Orange Book data.
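
To make the request above more concrete, here is a rough sketch of what one machine-readable record might look like if it carried the product, active ingredient, and patent numbers, and kept expired patents instead of dropping them. Everything in it is an assumption for illustration: the class names, field names, and placeholder values are hypothetical and do not reflect any actual FDA file layout.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Patent:
    # Hypothetical fields; not taken from any FDA schema.
    number: str
    expires: date


@dataclass
class BiologicProduct:
    # Hypothetical record covering the three items named in the request.
    product_name: str
    active_ingredient: str
    # Expired patents stay in this list so historical data is retained.
    patents: List[Patent] = field(default_factory=list)

    def patents_in_force(self, as_of: date) -> List[Patent]:
        """Return only the patents that have not expired by `as_of`."""
        return [p for p in self.patents if p.expires >= as_of]


# Placeholder data only; these are not real products or patent numbers.
example = BiologicProduct(
    product_name="EXAMPLE-PRODUCT",
    active_ingredient="example-ingredient",
    patents=[
        Patent(number="0000001", expires=date(2010, 1, 1)),
        Patent(number="0000002", expires=date(2030, 1, 1)),
    ],
)
```

Keeping the full patent list on the record and filtering at query time (for example, `example.patents_in_force(date(2020, 1, 1))`) is one way to serve both current and historical analyses from the same file.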