Closed jankrepl closed 3 years ago
From https://dataguide.nlm.nih.gov/eutilities/utilities.html#esearch:
ESearch (esearch.fcgi) searches a database and returns a list of unique identifiers (UIDs) for records in that database which meet the search criteria. You can specify the search query, sort results, filter results by date, or combine multiple searches with Boolean AND/OR/NOT by adjusting the parameters. Remember, ESearch only returns UIDs, not full records. To retrieve the full records for each of the UIDs in your result set, consider using the EFetch utility.
import requests
term = "neuroscience"
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# Passing the query via `params` lets requests URL-encode multi-word terms
params = {"db": "pubmed", "term": term, "retmax": 100000, "retmode": "json"}
response = requests.get(url, params=params)
response.raise_for_status()  # fail early on HTTP errors
rep = response.json()
n_results = int(rep["esearchresult"]["count"])  # total number of results
first_ids = rep["esearchresult"]["idlist"]  # some or all of the IDs
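A single ESearch call returns at most retmax IDs, so a larger result set has to be requested page by page with retstart. The sketch below only builds the request parameters for each page. The helper name `esearch_pages` and the page size of 10'000 are illustrative assumptions, and whether the server accepts every offset has not been verified here.

```python
def esearch_pages(term, n_results, page_size=10_000):
    """Yield one ESearch parameter dict per page of results (hypothetical helper)."""
    for retstart in range(0, n_results, page_size):
        yield {
            "db": "pubmed",
            "term": term,
            "retstart": retstart,  # offset of the first UID on this page
            "retmax": page_size,   # maximum number of UIDs on this page
            "retmode": "json",
        }


# n_results would normally come from the "count" field of a first request
pages = list(esearch_pages("neuroscience", n_results=483_713))
print(len(pages), pages[-1]["retstart"])
```

Each dict can then be passed as `params` to `requests.get` on the esearch.fcgi URL, and the `idlist` values of the responses concatenated.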
Notes:
- Use retstart and retmax to retrieve all the results (if the number of results is higher than 10'000).
- The brain term gave 2'104'774 results.
- neuron gave 745'445 results.
- neuroscience gave 483'713 results.
From https://dataguide.nlm.nih.gov/eutilities/utilities.html#efetch:
EFetch (efetch.fcgi) returns full data records for a list of unique identifiers (UIDs) in a format specified in the parameters. The list of UIDs is either provided in the parameters, or is retrieved from the History server.
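Since EFetch takes a list of UIDs, several records can be requested at once by joining the IDs with commas in the id parameter. A minimal sketch of splitting a long ID list into batches follows. The helper name `efetch_batches` and the batch size are assumptions chosen for illustration, and the IDs in the example are placeholders.

```python
def efetch_batches(ids, batch_size=200):
    """Yield one EFetch parameter dict per batch of UIDs (hypothetical helper)."""
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # EFetch expects the UIDs as one comma-separated string
        yield {"db": "pubmed", "id": ",".join(batch), "retmode": "xml"}


# Placeholder IDs, only to show how the batches are formed
batches = list(efetch_batches(["11", "22", "33", "44", "55"], batch_size=2))
print([b["id"] for b in batches])
```

Each dict can be sent as `params` to `requests.get` on the efetch.fcgi URL. For long ID lists a POST request may be preferable, since the URL would otherwise grow very long.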
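import requests
from defusedxml import ElementTree
pmid = first_ids[0]  # one of the UIDs returned by ESearch; avoids shadowing the built-in `id`
url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id={pmid}&retmode=xml"
response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors
rep = response.content.decode()
article_set = ElementTree.fromstring(rep)  # parsed XML tree of the returned record(s)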
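Once the XML is parsed, the title and abstract can be pulled out of each record. The sketch below runs on a hand-written, heavily simplified sample rather than a real response; the element names follow the PubMed XML layout, but the sample content and the helper `extract_articles` are made up for illustration. It uses the standard-library parser so it runs without extra dependencies; `defusedxml.ElementTree.fromstring` can be swapped in when parsing real responses.

```python
from xml.etree import ElementTree

# Hand-written, simplified sample imitating the structure of an EFetch response
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>123</PMID>
      <Article>
        <ArticleTitle>A made-up title</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def full_text(element):
    """Return all text inside an element, including nested inline markup."""
    return "".join(element.itertext()).strip() if element is not None else None


def extract_articles(article_set):
    """Collect PMID, title and abstract for every article in the set."""
    records = []
    for article in article_set.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        parts = citation.findall("Article/Abstract/AbstractText")
        records.append({
            "pmid": citation.findtext("PMID"),
            "title": full_text(citation.find("Article/ArticleTitle")),
            # Structured abstracts come in several labelled parts; join them
            "abstract": " ".join(full_text(p) for p in parts),
        })
    return records


records = extract_articles(ElementTree.fromstring(SAMPLE))
print(records)
```

Records without an abstract end up with an empty string in the `abstract` field, which makes them easy to filter out afterwards.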
Notes:
- Change db to pmc and specify a PMC id.
🚀 Feature
Motivation
More articles (just abstracts though)
Pitch
More articles is better
Alternatives
?
Additional context
By PubMed we mean the database of article abstracts + metadata. See https://pubmed.ncbi.nlm.nih.gov/