biothings / mydisease.info


Load data from FDA Orphan Drug Designations and Approvals #41

Closed: andrewsu closed this issue 10 months ago

andrewsu commented 3 years ago

FDA maintains a database of Orphan Drug Designations and Approvals (which currently has 5851 entries) at https://www.accessdata.fda.gov/scripts/opdlisting/oopd/. Some cleaning and normalization of both diseases and drugs will be required, so perhaps not the simplest data source to ingest...

This is the inverse issue of https://github.com/biothings/mychem.info/issues/96

r76941156 commented 3 years ago

The issue has been addressed; details can be found at: https://github.com/r76941156/fda_orphan_drug/blob/main/FDA_orphan_drug_demo.pdf

Addressed with pull request: https://github.com/biothings/mydisease.info/pull/42

andrewsu commented 2 years ago

@r76941156 understanding that the data cleaning and normalization process (that you depicted in the slide below) was only semi-automated, can you add the code for the automated parts of this process to this repo please? I'm guessing you did some basic text parsing of complex phrases (e.g. "Treatment of <disease name>" followed by matching of <disease name> against some ontology/vocabulary of diseases)? And I'm guessing that many (most?) records followed a pattern like this, but that some percentage of records required manual review?

[image: slide depicting the data cleaning and normalization process]
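The pattern guessed at above works like this: strip a leading "Treatment of <disease name>" phrase, look up the remainder in a disease vocabulary, and route anything unmatched to manual review. It can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the parser in the data_tool subdirectory. The prefix list, the function names `extract_disease` and `normalize`, and the tiny `VOCAB` dictionary with its `EXAMPLE:` IDs are all hypothetical stand-ins for a real ontology lookup.

```python
import re

# Hypothetical mini-vocabulary standing in for a real disease ontology.
# Keys are lowercased names; values are made-up placeholder IDs.
VOCAB = {
    "cystic fibrosis": "EXAMPLE:0001",
    "duchenne muscular dystrophy": "EXAMPLE:0002",
}

# Assumed leading phrases that wrap the disease name in a designation text.
PREFIX = re.compile(r"^\s*(?:treatment|prevention|diagnosis)\s+of\s+", re.IGNORECASE)


def extract_disease(designation: str) -> str:
    """Strip a leading 'Treatment of ...' style phrase and normalize case/whitespace."""
    name = PREFIX.sub("", designation)
    name = re.sub(r"\s+", " ", name).strip().rstrip(".")
    return name.lower()


def normalize(designations):
    """Split designations into vocabulary matches and records needing manual review."""
    matched, needs_review = {}, []
    for text in designations:
        name = extract_disease(text)
        if name in VOCAB:
            matched[text] = VOCAB[name]
        else:
            # No exact match: this is the share of records a human would check.
            needs_review.append(text)
    return matched, needs_review


if __name__ == "__main__":
    records = [
        "Treatment of cystic fibrosis",
        "Treatment of Duchenne muscular dystrophy.",
        "Treatment of patients with advanced soft tissue sarcoma",
    ]
    matched, review = normalize(records)
    print(matched)
    print(review)
```

Exact lookup is used here only to keep the sketch short. A real pipeline would likely match against ontology synonyms or use fuzzy matching, which should shrink (but not eliminate) the manual-review list.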

andrewsu commented 10 months ago

I see the parser in the data_tool subdirectory. I also forked the parser to the biothings org. Closing this issue as complete.