Closed tomron closed 6 years ago
Hi, the library already batches its requests (250 articles at a time), so there is no separate batching / pagination method.
If you really need it you can do something like this:
from pymed import PubMed
pubmed = PubMed()
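# Example query (any valid PubMed query string can be used here)
query = "Occupational Health[Title]"
# Use the low level API to retrieve the article IDs that are related to the query
article_ids = pubmed._getArticleIds(query=query, max_results=9999999)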
# This is an opportunity to show the number of results
print("The total number of results matching the query is", len(article_ids))
# Use the low level API to retrieve the articles
# NOTE: pubmed._getArticles() already expects a list of article IDs (which will be processed in a single
# call to PubMed). In this sample I'm providing here I'll insert the article IDs one by one but please
# don't do this in your own code!
articles = [list(pubmed._getArticles(article_ids=[article_id]))[0] for article_id in article_ids]
# The preferred way is to make batches and give those batches to pubmed._getArticles() (which is
# what the library does...) like this:
from pymed.helpers import batches
batched_articles = [pubmed._getArticles(article_ids=batch) for batch in batches(article_ids, 250)]
for batch in batched_articles:
    for article in batch:
        # Do something here
        print(article.title)
In the last example each batch is a generator, so the request for the next batch is not made until you're done with the current one.
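For reading results page by page (for example 500 at a time), a minimal sketch along the same lines is shown here. The names `iter_pages`, `fetch` and `fake_fetch` are hypothetical and not part of pymed; the fetch function is passed in so the paging logic can be tried without network access.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def iter_pages(
    article_ids: Sequence[str],
    page_size: int,
    fetch: Callable[[List[str]], Iterable],
) -> Iterator[list]:
    """Yield one list of articles per page, fetching a page only when it is requested."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(article_ids), page_size):
        page_ids = list(article_ids[start:start + page_size])
        # fetch() runs only when the consumer asks for this page.
        yield list(fetch(page_ids))


# Stand-in for pubmed._getArticles so the sketch runs offline (assumption, not pymed code).
def fake_fetch(ids: List[str]) -> Iterable[str]:
    return (f"article-{i}" for i in ids)


pages = iter_pages([str(n) for n in range(1, 6)], page_size=2, fetch=fake_fetch)
first_page = next(pages)   # articles for IDs 1 and 2
second_page = next(pages)  # articles for IDs 3 and 4
```

With pymed, `fetch` could be something like `lambda ids: pubmed._getArticles(article_ids=ids)`, assuming the private method keeps the signature used above.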
I'll try to add some easier helper methods in the next release.
I hope that helps?
Thanks, I think it is a fair enough solution for now, but I would like to have advanced options such as a count without retrieving all the IDs, queries based on a specific field, etc.
I'll take care of the count method ;)
As for the querying... It's possible to enter any PubMed query (also for specific fields). Try for example something like:
((tomron[Author]) AND ("2018/01/01"[Date - Create] : "3000"[Date - Create])) AND PubMed[Title]
(this will get you all articles created from the first of January 2018 until now that are by you and have "PubMed" in the title)
Tip: Use the "advanced" query builder on the PubMed website and copy the query to your code for deeper analysis of the articles.
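If you prefer to assemble such queries in code rather than paste them, a small sketch could look like this. The helpers `tagged`, `date_range` and `all_of` are hypothetical names, not part of pymed, and they do no escaping of the terms. The grouping of parentheses differs slightly from the query above, which should not matter because every clause is joined with AND.

```python
def tagged(term: str, field: str) -> str:
    """Return a term with a PubMed field tag, e.g. tomron[Author]."""
    return f"{term}[{field}]"


def date_range(start: str, end: str, field: str = "Date - Create") -> str:
    """Return a quoted date range on the given field."""
    return f'"{start}"[{field}] : "{end}"[{field}]'


def all_of(*clauses: str) -> str:
    """Join clauses with AND, wrapping each clause in parentheses."""
    return " AND ".join(f"({clause})" for clause in clauses)


query = all_of(
    tagged("tomron", "Author"),
    date_range("2018/01/01", "3000"),
    tagged("PubMed", "Title"),
)
print(query)
```

The resulting string can be passed as the `query` argument in the same way as a hand-written query.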
Super, thanks
Update: I've added a new method for counting the total number of matching articles (without retrieving any). It's now available in pymed version 0.8.1.
pip install pymed==0.8.1
from pymed import PubMed
pm = PubMed()
number_of_articles = pm.getTotalResultsCount(query="Occupational Health[Title]")
print("Number of articles with Occupational Health in the title is", number_of_articles)
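Once the total is known, the page boundaries asked about in the original question (first 500, then 500-1000, and so on) can be worked out up front. This is only a sketch: `page_ranges` is a hypothetical helper, and the value of `total` is a placeholder standing in for whatever `getTotalResultsCount` returns.

```python
from typing import List, Tuple


def page_ranges(total: int, page_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) offsets, end exclusive, that cover `total` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [
        (start, min(start + page_size, total))
        for start in range(0, total, page_size)
    ]


total = 1234  # placeholder; in practice use the count returned by the library
ranges = page_ranges(total, 500)
number_of_pages = len(ranges)
```

Each `(start, end)` pair can then be used to slice a list of article IDs, for example `article_ids[start:end]`.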
Hi, is there a way to get a count of the relevant results for a query and/or to paginate the results, e.g. read the first 500, then 500-1000, etc.?