biothings / mychem.info

MyChem.info: A BioThings API for chemical/drug annotations
http://mychem.info
Apache License 2.0

Data source: OpenFDA #73

newgene closed this issue 3 years ago

newgene commented 5 years ago

EDIT: 2021-04-09: https://open.fda.gov/data/downloads/ https://open.fda.gov/tools/downloads/

Probably just the "Human Drug" section.

newgene commented 5 years ago

It contains the "Drug Adverse Events".

namespacestd0 commented 5 years ago

Update: The new link is: https://open.fda.gov/downloads/

File list: https://api.fda.gov/download.json
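For reference, a rough sketch of pulling the NDC download URLs out of that file list. The layout assumed here (`results` -> category -> endpoint -> `partitions`, each with a `file` URL) is my reading of the index and should be checked against the live file; the function names are made up for illustration.

```python
import json
from urllib.request import urlopen

DOWNLOAD_INDEX_URL = "https://api.fda.gov/download.json"  # from the comment above


def list_partition_files(index, category="drug", endpoint="ndc"):
    """Return the download URLs listed for one endpoint of the index.

    Assumed layout (not confirmed in this thread):
    index["results"][category][endpoint]["partitions"] is a list of
    objects that each carry a "file" URL.
    """
    partitions = (
        index.get("results", {})
        .get(category, {})
        .get(endpoint, {})
        .get("partitions", [])
    )
    return [p["file"] for p in partitions if "file" in p]


def fetch_index(url=DOWNLOAD_INDEX_URL):
    """Download and parse the index (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)
```

Something like `list_partition_files(fetch_index())` would then give the NDC zip URLs, if the assumed layout holds.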

namespacestd0 commented 5 years ago

NDC Directory:

https://open.fda.gov/apis/drug/ndc/

Each NDC Directory entry consists of these major sections:

- Product data: General information about the product.
- Packaging information: The specific details of the product packaging.
- An openfda section: An annotation with additional product identifiers, such as UNII and UPC, of the drug product, if available.

Product Labeling:

https://open.fda.gov/apis/drug/label/

Each SPL report consists of these major sections:

- Standard SPL fields, including unique identifiers.
- Product-specific fields, the order and contents of which are unique to each product.
- An openfda section: An annotation with additional product identifiers, such as UPC and brand name, of the drug products listed in the labeling.

andrewsu commented 3 years ago

TLDR: out of all data files on openFDA, just load pharm_class info (which is sometimes split between pharm_class_moa and pharm_class_epc) from the National Drug Code Directory file (https://download.open.fda.gov/drug/ndc/drug-ndc-0001-of-0001.json.zip)
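A minimal sketch of what "just load pharm_class" could look like, assuming the zip holds a single JSON document whose `results` key is a list of records. `iter_pharm_classes` is a hypothetical helper, not the existing NDC parser. It only reads the `pharm_class` key; records that split the classes into pharm_class_moa / pharm_class_epc would need extra handling, since where those keys sit is not shown here.

```python
import json
import zipfile


def iter_pharm_classes(zip_path):
    """Yield (product_ndc, pharm_class list) for records that have classes.

    Assumption: the archive contains one .json member shaped like
    {"results": [record, ...]}.
    """
    with zipfile.ZipFile(zip_path) as zf:
        json_names = [n for n in zf.namelist() if n.endswith(".json")]
        if not json_names:
            raise ValueError(f"no .json member found in {zip_path}")
        with zf.open(json_names[0]) as fh:
            data = json.load(fh)
    for record in data.get("results", []):
        classes = record.get("pharm_class", [])
        if classes:
            yield record.get("product_ndc"), list(classes)
```

Note that this reads the whole file into memory, which may or may not be acceptable depending on the file size.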

More details below...

NDC files

example record:

    {
      "product_ndc": "67877-634",
      "generic_name": "imatinib mesylate",
      "labeler_name": "Ascend Laboratories, LLC",
      "brand_name": "Imatinib mesylate",
      "active_ingredients": [
        {
          "name": "IMATINIB MESYLATE",
          "strength": "400 mg/1"
        }
      ],
      "finished": true,
      "packaging": [
        {
          "package_ndc": "67877-634-30",
          "description": "30 TABLET, FILM COATED in 1 BOTTLE (67877-634-30)",
          "marketing_start_date": "20190201",
          "sample": false
        }
      ],
      "listing_expiration_date": "20211231",
      "openfda": {
        "manufacturer_name": [
          "Ascend Laboratories, LLC"
        ],
        "rxcui": [
          "403878",
          "403879"
        ],
        "spl_set_id": [
          "7579620e-3748-46fa-8295-f9b47d0aa5b8"
        ],
        "is_original_packager": [
          true
        ],
        "upc": [
          "0367877634302"
        ],
        "unii": [
          "8A1O1M485B"
        ]
      },
      "marketing_category": "ANDA",
      "dosage_form": "TABLET, FILM COATED",
      "spl_id": "7579620e-3748-46fa-8295-f9b47d0aa5b8",
      "product_type": "HUMAN PRESCRIPTION DRUG",
      "route": [
        "ORAL"
      ],
      "marketing_start_date": "20190201",
      "product_id": "67877-634_7579620e-3748-46fa-8295-f9b47d0aa5b8",
      "application_number": "ANDA208302",
      "brand_name_base": "Imatinib mesylate",
      "pharm_class": [
        "Kinase Inhibitor [EPC]",
        "Protein Kinase Inhibitors [MoA]"
      ]
    }
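To illustrate how the bracketed suffixes in pharm_class could be separated, here is a small sketch that groups the values of one record by tag. The helper name and the "untagged" bucket are my own choices, and the record is trimmed down from the example above.

```python
import re

# Matches a trailing bracketed tag such as "[EPC]" or "[MoA]".
_TAG_RE = re.compile(r"^(?P<name>.*\S)\s*\[(?P<tag>[^\]]+)\]$")


def group_pharm_classes(record):
    """Group a record's pharm_class strings by their bracketed tag."""
    grouped = {}
    for entry in record.get("pharm_class", []):
        match = _TAG_RE.match(entry.strip())
        if match:
            tag, name = match.group("tag"), match.group("name")
        else:
            # Hypothetical fallback bucket for values without a tag.
            tag, name = "untagged", entry.strip()
        grouped.setdefault(tag, []).append(name)
    return grouped


record = {
    "product_ndc": "67877-634",
    "pharm_class": ["Kinase Inhibitor [EPC]", "Protein Kinase Inhibitors [MoA]"],
    "openfda": {"unii": ["8A1O1M485B"], "rxcui": ["403878", "403879"]},
}
print(group_pharm_classes(record))
# prints {'EPC': ['Kinase Inhibitor'], 'MoA': ['Protein Kinase Inhibitors']}
```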

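Looks like we have all the most relevant identifiers (unii, rxcui) already. Perhaps the most relevant unique data would be the pharm_class info (which is sometimes split between pharm_class_moa and pharm_class_epc), which gives some relevant drug groupings. The 20 most frequent pharm_class lines are below. The pipeline counts raw lines, so an entry with a trailing comma is tallied separately from the same entry without one (e.g. "Corticosteroid [EPC]" shows up twice):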

$ egrep '\[MoA\]|\[EPC\]' drug-ndc-0001-of-0001.json | sed 's/^ *//' | sort | uniq -c | sort -k1nr  | head -20
   3715 "Nonsteroidal Anti-inflammatory Drug [EPC]"
   3325 "Corticosteroid Hormone Receptor Agonists [MoA]"
   3010 "Cyclooxygenase Inhibitors [MoA]"
   2058 "Atypical Antipsychotic [EPC]"
   1948 "Corticosteroid [EPC]",
   1945 "Central Nervous System Stimulant [EPC]",
   1783 "Cyclooxygenase Inhibitors [MoA]",
   1737 "beta-Adrenergic Blocker [EPC]"
   1468 "Corticosteroid [EPC]"
   1432 "Hydroxymethylglutaryl-CoA Reductase Inhibitors [MoA]"
   1393 "Serotonin Uptake Inhibitors [MoA]"
   1358 "Cytochrome P450 3A4 Inhibitors [MoA]",
   1317 "Opioid Agonist [EPC]"
   1252 "Angiotensin 2 Receptor Blocker [EPC]"
   1250 "Anti-epileptic Agent [EPC]",
   1231 "Full Opioid Agonists [MoA]",
   1229 "Non-Standardized Pollen Allergenic Extract [EPC]",
   1194 "Angiotensin 2 Receptor Antagonists [MoA]",
   1185 "Cytochrome P450 2C19 Inhibitors [MoA]"
   1171 "Calcium Channel Antagonists [MoA]",
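The grep pipeline counts text lines, so the same class shows up once with a trailing comma and once without (see "Corticosteroid [EPC]" and "Cyclooxygenase Inhibitors [MoA]" above). Counting from the parsed JSON avoids that. A sketch, assuming `records` is the list of NDC records already loaded from the file; the function name is illustrative only.

```python
from collections import Counter


def count_pharm_classes(records, top=20):
    """Count pharm_class values across records, most common first."""
    counts = Counter()
    for record in records:
        # Values come from parsed JSON, so there are no commas or
        # quotes to strip as there are in the raw text lines.
        counts.update(record.get("pharm_class", []))
    return counts.most_common(top)
```

With the real file, the per-class totals would presumably be the sum of the comma and no-comma rows from the grep output, but I have not rerun it.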

Product labeling

The data in these files are mostly unstructured -- just free text organized by top-level keys that correspond to sections in the drug label. For now, absent a compelling use case, I think we should ignore these files...

andrewsu commented 3 years ago

It looks like the information on pharm_class is already being imported through our NDC parser. For example: http://mychem.info/v1/query?q=drugbank.name:imatinib&fields=ndc (screenshot below). Until/unless we find more uniquely useful info in openFDA, closing this issue...

[image: screenshot of the query result]
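For anyone wanting to repeat that check from a script, a minimal sketch using only the standard library. The endpoint and query come from the example URL in this comment; the helper names are hypothetical, and the assumption that results sit under a top-level `hits` list should be verified against an actual response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://mychem.info/v1/query"  # endpoint from the example above


def build_query_url(q, fields="ndc", base_url=BASE_URL):
    """Build a MyChem.info query URL for the given query string and fields."""
    return f"{base_url}?{urlencode({'q': q, 'fields': fields})}"


def fetch_hits(q, fields="ndc"):
    """Run the query and return the list under "hits" (needs network access).

    Assumption: the response is a JSON object with a "hits" list.
    """
    with urlopen(build_query_url(q, fields)) as resp:
        return json.load(resp).get("hits", [])
```

`fetch_hits("drugbank.name:imatinib")` corresponds to the query in the link above, with the colon percent-encoded.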