andrewsu closed this issue 10 months ago.
The issue has been addressed, and the details can be found at: https://github.com/r76941156/fda_orphan_drug/blob/main/FDA_orphan_drug_demo.pdf
Addressed with pull request: https://github.com/biothings/mydisease.info/pull/42
@r76941156 I understand that the data cleaning and normalization process (which you depicted in the slide below) was only semi-automated, but can you please add the code for the automated parts of this process to this repo? I'm guessing you did some basic text parsing of complex phrases (e.g., `"Treatment of <disease name>"`) followed by matching of `<disease name>` against some ontology/vocabulary of diseases? And I'm guessing that many (most?) records followed a pattern like this, but that some percentage of records required manual review?
I see the parser in the `data_tool` subdirectory. I also forked the parser to the biothings org. Closing this issue as complete.
The FDA maintains a database of Orphan Drug Designations and Approvals (which currently has 5851 entries) at https://www.accessdata.fda.gov/scripts/opdlisting/oopd/. Some cleaning and normalization of both diseases and drugs will be required, so this is perhaps not the simplest data source to ingest...
This is the inverse issue of https://github.com/biothings/mychem.info/issues/96