jsstevenson closed this 3 weeks ago.
Same thing here: should we have some kind of skip_bad_lines=True-style option for bulk requests? I could see a clinical workflow with lots of data grinding to a halt because of one bad request, which would probably cause a lot of headaches.
Commenting to say that this is currently causing headaches while preparing the analysis pipeline.
Error:
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
Cell In[15], line 1
----> 1 openfda_data = dgipy.get_drug_applications(list(filtered_df['drug_name']))
File ~/Documents/repo/git/dgipy/src/dgipy/dgidb.py:380, in get_drug_applications(terms, api_url)
    373 for app in result["drugApplications"]:
    374     application_number = app["appNo"].split(".")[1].replace(":", "").upper()
    375     for (
    376         brand_name,
    377         marketing_status,
    378         dosage_form,
    379         dosage_strength,
--> 380     ) in _get_openfda_data(application_number):
    381         output["drug_name"].append(name)
    382         output["drug_concept_id"].append(concept_id)
File ~/Documents/repo/git/dgipy/src/dgipy/dgidb.py:335, in _get_openfda_data(app_no)
    333 except requests.exceptions.RequestException as e:
    334     _logger.error("Request to %s failed: %s", url, e)
--> 335     raise e
    336 data = response.json()
    337 return [
    338     (
    339         product["brand_name"],
...
   1021     )
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)
HTTPError: 404 Client Error: Not Found for url: https://api.fda.gov/drug/drugsfda.json?search=openfda.application_number:%22ANDA078803%22
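The 404 here appears to be how openFDA reports a search that matched no records, rather than a network or server failure, so one option is for the per-application lookup to treat 404 as "no products" instead of re-raising. Below is a minimal sketch assuming that behavior. It swaps in the standard library's `urllib` for `requests` so it runs on its own, and `build_openfda_url` / `fetch_openfda_results` are hypothetical names, not dgipy functions. With `requests`, the equivalent would be checking `response.status_code == 404` and returning early before calling `response.raise_for_status()`.

```python
import json
import logging
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable

_logger = logging.getLogger(__name__)

# Endpoint taken from the failing URL in the traceback.
OPENFDA_DRUGSFDA_URL = "https://api.fda.gov/drug/drugsfda.json"


def build_openfda_url(app_no: str) -> str:
    """Build a drugsfda search URL for one application number (hypothetical helper)."""
    # Keep ':' unescaped so the query has the same form as the URL in the traceback.
    query = urllib.parse.urlencode(
        {"search": f'openfda.application_number:"{app_no}"'}, safe=":"
    )
    return f"{OPENFDA_DRUGSFDA_URL}?{query}"


def fetch_openfda_results(
    app_no: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 30.0,
) -> list[dict[str, Any]]:
    """Return openFDA records for app_no, or [] on a 404 (assumed to mean "no match")."""
    url = build_openfda_url(app_no)
    try:
        with opener(url, timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            # One missing application should not abort the whole bulk lookup.
            _logger.warning("No openFDA record for %s (%s)", app_no, url)
            return []
        raise  # other HTTP errors (e.g. 429, 5xx) still propagate
    return payload.get("results", [])
```

Only 404 is downgraded; rate limiting and server errors still raise, since silently dropping those would hide real outages.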
Code to reproduce the error:
import dgipy  # assumes these functions are exposed at the dgipy top level, as the calls below imply
import pandas as pd
# Search for interaction data for genes of interest
genes = ['BRAF', 'PDGFRA', 'PDGFRB']
df = pd.DataFrame(dgipy.get_interactions(genes))
# Keep only approved therapeutics
filtered_df = df[df['drug_approved'] == True].reset_index(drop=True)
# Filter out drugs with problematic ANDA numbers. TODO: handle bad ANDA/NDA numbers in the middle of bulk requests
bad_andas = ['rxcui:32592', 'rxcui:20863', 'rxcui:20863']
filtered_df = filtered_df[~filtered_df['drug_concept_id'].isin(bad_andas)].reset_index(drop=True)
openfda_data = dgipy.get_drug_applications(list(filtered_df['drug_name']))
# Raises: HTTPError: 404 Client Error: Not Found for url: https://api.fda.gov/drug/drugsfda.json?search=openfda.application_number:%22ANDA090610%22
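The skip_bad_lines=True-style option suggested above could look roughly like this: run each lookup independently, record failures instead of aborting, and hand them back to the caller. This is a minimal sketch assuming lookups can be issued one term at a time; `bulk_lookup`, `BulkResult`, and the `skip_bad_lines` / `catch` parameters are hypothetical and not part of dgipy's API. In the notebook above, `fetch_one` could be something like `lambda name: dgipy.get_drug_applications([name])`, which would replace the hand-maintained `bad_andas` list at the cost of issuing lookups one drug at a time.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass
class BulkResult(Generic[T]):
    """Successful lookups keyed by term, plus the exception raised for each failed term."""

    results: dict[str, T] = field(default_factory=dict)
    failures: dict[str, Exception] = field(default_factory=dict)


def bulk_lookup(
    terms: Iterable[str],
    fetch_one: Callable[[str], T],
    skip_bad_lines: bool = False,
    catch: tuple[type[Exception], ...] = (Exception,),
) -> BulkResult[T]:
    """Look up each term separately (hypothetical helper, not part of dgipy).

    With skip_bad_lines=False the first error propagates, as the bulk call does
    today; with skip_bad_lines=True, errors matching `catch` are recorded and
    the remaining terms are still processed.
    """
    out: BulkResult[T] = BulkResult()
    for term in terms:
        try:
            out.results[term] = fetch_one(term)
        except catch as exc:
            if not skip_bad_lines:
                raise
            out.failures[term] = exc
    return out
```

Returning the failures, rather than only logging them, lets a pipeline report exactly which drugs were dropped instead of silently losing rows. Narrowing `catch` (for example to `requests.exceptions.HTTPError`) keeps genuine bugs from being swallowed.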
Should be resolved in #74
_Originally posted by @mcannon068nw in https://github.com/GenomicMedLab/dgipy/pull/61#discussion_r1736186111_