GenomicMedLab / dgipy

Python client for fast access to the Drug-Gene Interaction Database (DGIDb)
MIT License

Handle failed ANDA/NDA requests #63

Closed jsstevenson closed 3 weeks ago

jsstevenson commented 2 months ago
Should we allow a bulk lookup to continue even if an individual ANDA/NDA request fails? I'm wondering whether a `skip_bad_lines=True`-style option would be helpful for people with big bulk queries. See the example below:

[Screenshot 2024-08-29 at 9 14 35 AM]

_Originally posted by @mcannon068nw in https://github.com/GenomicMedLab/dgipy/pull/61#discussion_r1736186111_

jsstevenson commented 2 months ago

Same question here: should we have a `skip_bad_lines=True`-style option for bulk requests? I could see a clinical workflow with lots of data grinding to a halt because of one bad request. That would probably cause a lot of headaches.
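
For discussion, a minimal sketch of what such an option might look like. This is not dgipy's implementation: `bulk_lookup`, its `fetch` argument, and the `skip_bad_lines` flag are hypothetical names, the application numbers are made up, and a real client would probably catch `requests.exceptions.RequestException` rather than the broad `Exception` used here to keep the example dependency-free.

```python
import logging

_logger = logging.getLogger(__name__)


def bulk_lookup(app_numbers, fetch, skip_bad_lines=False):
    """Call ``fetch`` once per application number (hypothetical helper).

    Returns ``(results, failed)``: a dict of successful lookups and a list
    of the application numbers whose requests raised.
    """
    results = {}
    failed = []
    for app_no in app_numbers:
        try:
            results[app_no] = fetch(app_no)
        except Exception as e:  # assumption: narrow this in real code
            if not skip_bad_lines:
                raise  # default behavior: the first failure stops the run
            _logger.warning("Skipping %s: %s", app_no, e)
            failed.append(app_no)
    return results, failed


def fake_fetch(app_no):
    """Stand-in for a network call; fails for one application number."""
    if app_no == "ANDA078803":
        raise RuntimeError("404 Client Error: Not Found")
    return [("EXAMPLE BRAND", "Prescription")]


results, failed = bulk_lookup(
    ["ANDA000001", "ANDA078803", "ANDA000002"], fake_fetch, skip_bad_lines=True
)
print(sorted(results))  # the lookups that succeeded
print(failed)           # the lookups that were skipped
```

Returning the failed items alongside the results, rather than dropping them silently, would let a pipeline report exactly which applications are missing.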

mcannon068nw commented 3 weeks ago

Commenting in to say this is currently causing headaches in preparing for the analysis pipeline.

Error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[15], line 1
----> 1 openfda_data = dgipy.get_drug_applications(list(filtered_df['drug_name']))

File ~/Documents/repo/git/dgipy/src/dgipy/dgidb.py:380, in get_drug_applications(terms, api_url)
    373 for app in result["drugApplications"]:
    374     application_number = app["appNo"].split(".")[1].replace(":", "").upper()
    375     for (
    376         brand_name,
    377         marketing_status,
    378         dosage_form,
    379         dosage_strength,
--> 380     ) in _get_openfda_data(application_number):
    381         output["drug_name"].append(name)
    382         output["drug_concept_id"].append(concept_id)

File ~/Documents/repo/git/dgipy/src/dgipy/dgidb.py:335, in _get_openfda_data(app_no)
    333 except requests.exceptions.RequestException as e:
    334     _logger.error("Request to %s failed: %s", url, e)
--> 335     raise e
    336 data = response.json()
    337 return [
    338     (
    339         product["brand_name"],
...
   1021     )
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: Not Found for url: https://api.fda.gov/drug/drugsfda.json?search=openfda.application_number:%22ANDA078803%22

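
The traceback shows the 404 from openFDA propagating straight out of `_get_openfda_data`. As far as I know, openFDA answers a search with no matches with a 404, so one option is to read that status as "no records" and raise only on other errors. A rough sketch under that assumption, not necessarily what dgipy does: `parse_openfda_response` is a hypothetical helper, `FakeResponse` is a stand-in exposing the `status_code`, `raise_for_status()`, and `json()` members of `requests.Response`, and the `results`/`products` layout is assumed from the openFDA Drugs@FDA response format.

```python
def parse_openfda_response(response):
    """Return (brand_name, marketing_status, dosage_form) tuples.

    Hypothetical helper: a 404 is read as "no matching application"
    and yields an empty list; any other HTTP error still raises.
    """
    if response.status_code == 404:
        return []
    response.raise_for_status()
    data = response.json()
    return [
        (
            product.get("brand_name"),
            product.get("marketing_status"),
            product.get("dosage_form"),
        )
        for result in data.get("results", [])
        for product in result.get("products", [])
    ]


class FakeResponse:
    """Minimal stand-in for requests.Response, for illustration only."""

    def __init__(self, status_code, payload=None):
        self.status_code = status_code
        self._payload = payload or {}

    def raise_for_status(self):
        if self.status_code >= 400:
            raise RuntimeError(f"{self.status_code} error")

    def json(self):
        return self._payload


not_found = FakeResponse(404)
ok = FakeResponse(
    200,
    {"results": [{"products": [{"brand_name": "EXAMPLE", "dosage_form": "TABLET"}]}]},
)
print(parse_openfda_response(not_found))
print(parse_openfda_response(ok))
```

Whether an empty result or an explicit "not found" marker is the better return value depends on whether downstream code needs to tell "no products" apart from "unknown application number".
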
mcannon068nw commented 3 weeks ago

Code example for reproducing error:


import pandas as pd

import dgipy

# Search for interaction data for genes of interest
genes = ['BRAF','PDGFRA','PDGFRB']
df = pd.DataFrame(dgipy.get_interactions(genes))

# Filter the Data for only Approved Therapeutics
filtered_df = df[df['drug_approved']==True].reset_index(drop=True)

# Workaround: filter out drugs whose ANDA/NDA lookups are known to fail
# TODO: Handle bad ANDA/NDA in middle of bulk requests
bad_andas = ['rxcui:32592','rxcui:20863','rxcui:20863']
filtered_df = filtered_df[~filtered_df['drug_concept_id'].isin(bad_andas)].reset_index(drop=True)

openfda_data = dgipy.get_drug_applications(list(filtered_df['drug_name']))

# HTTPError: 404 Client Error: Not Found for url: https://api.fda.gov/drug/drugsfda.json?search=openfda.application_number:%22ANDA090610%22
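
Until bad application numbers are handled inside the library, a caller-side alternative to hard-coding `bad_andas` is to query one drug at a time and record which names fail. This is a sketch only: `lookup_each` is a hypothetical helper, the drug names and concept IDs in the stand-in are made up, and it assumes the lookup function returns a dict of equal-length column lists (as the `output[...]` appends in the traceback suggest). It also costs one call per drug instead of one bulk request.

```python
def lookup_each(names, lookup):
    """Call ``lookup([name])`` per name and merge the columnar results.

    ``lookup`` would be e.g. ``dgipy.get_drug_applications``; it is passed
    in so the sketch runs without network access.
    """
    merged = {}
    failed = []
    for name in names:
        try:
            columns = lookup([name])
        except Exception as e:  # assumption: narrow to HTTP errors in real use
            failed.append((name, str(e)))
            continue
        for key, values in columns.items():
            merged.setdefault(key, []).extend(values)
    return merged, failed


def fake_lookup(terms):
    """Stand-in for the real lookup; fails for one made-up drug name."""
    if "BADDRUG" in terms:
        raise RuntimeError("404 Client Error: Not Found")
    return {
        "drug_name": list(terms),
        "drug_concept_id": [f"example:{term}" for term in terms],
    }


merged, failed = lookup_each(["DRUGA", "BADDRUG", "DRUGB"], fake_lookup)
print(merged["drug_name"])
print(failed)
```

With the reproduction above, the call would look like `lookup_each(list(filtered_df['drug_name']), dgipy.get_drug_applications)`, and `failed` would list the drugs to investigate.
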
jsstevenson commented 3 weeks ago

Should be resolved in #74