gijswobben / pymed

PyMed is a Python library that provides access to PubMed.
MIT License

Paginate \ count result #1

Closed tomron closed 6 years ago

tomron commented 6 years ago

Hi, is there a way to get the count of results matching a query, and/or to paginate the results? E.g. read the first 500, then 500-1000, etc.

gijswobben commented 6 years ago

Hi, the library already batches its requests (250 articles at a time), so there is no separate batching / pagination method.

If you really need it you can do something like this:

from pymed import PubMed
pubmed = PubMed()

# Example query (placeholder; substitute your own)
query = "Occupational Health[Title]"

# Use the low level API to retrieve the article IDs that are related to the query
article_ids = pubmed._getArticleIds(query=query, max_results=9999999)

# This is an opportunity to show the number of results
print("The total number of results matching the query is", len(article_ids))

# Use the low level API to retrieve the articles
# NOTE: pubmed._getArticles() expects a list of article IDs (which will be processed in a single
# call to PubMed). In this sample I insert the article IDs one by one, but please
# don't do this in your own code!
articles = [list(pubmed._getArticles(article_ids=[article_id]))[0] for article_id in article_ids]

# The preferred way is to make batches and pass those batches to pubmed._getArticles() (which is
# what the library does...) like this:
from pymed.helpers import batches
batched_articles = [pubmed._getArticles(article_ids=batch) for batch in batches(article_ids, 250)]
for batch in batched_articles:
    for article in batch:
        # Do something here
        print(article.title)

Each entry in batched_articles in the last example is a generator, so the request for a batch is not made until you start iterating over it.
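To illustrate the lazy behaviour described above without touching the network, here is a minimal sketch. It assumes that pubmed._getArticles() behaves like an ordinary Python generator function; `fetch_articles` is a hypothetical stand-in for it, and the local `batches` function is a simplified substitute for pymed.helpers.batches, not the library's own code.

```python
from typing import Iterator, List

calls: List[List[int]] = []  # records every simulated request


def fetch_articles(article_ids: List[int]) -> Iterator[str]:
    """Hypothetical stand-in for pubmed._getArticles(): a generator function."""
    calls.append(list(article_ids))  # only runs once iteration starts
    for article_id in article_ids:
        yield f"article-{article_id}"


def batches(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` items (simplified helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


article_ids = list(range(1, 8))

# Building the list only creates generator objects; no "request" is made yet.
batched = [fetch_articles(batch) for batch in batches(article_ids, 3)]
requests_before = len(calls)

# Consuming the first batch triggers exactly one simulated request.
first_batch = list(batched[0])
requests_after = len(calls)
```

If the real method is a generator as the comment suggests, the same pattern means a batch costs nothing until you loop over it.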

I'll try to add some easier helper methods in the next release.

I hope that helps?

tomron commented 6 years ago

Thanks, I think that is a fair enough solution for now, but I would like to have advanced options such as a count without retrieving all the IDs, queries based on a specific field, etc.

gijswobben commented 6 years ago

I'll take care of the count method ;)

As for the querying... It's possible to enter any PubMed query (also for specific fields). Try for example something like:

((tomron[Author]) AND ("2018/01/01"[Date - Create] : "3000"[Date - Create])) AND PubMed[Title]

(which will get you all articles with a create date on or after the first of January 2018, written by you, with "PubMed" in the title)

Tip: Use the "advanced" query builder on the PubMed website and copy the query to your code for deeper analysis of the articles.
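If you would rather assemble such a query in code than copy it by hand, a small string helper is enough. The sketch below is only an illustration: `build_query` is a hypothetical function, not part of pymed, and it uses just the three field tags that appear in the example query above.

```python
def build_query(author: str, title_word: str,
                created_from: str, created_to: str = "3000") -> str:
    """Compose a PubMed query string from an author, a create-date range and a title word."""
    author_part = f"({author}[Author])"
    date_part = f'("{created_from}"[Date - Create] : "{created_to}"[Date - Create])'
    return f"({author_part} AND {date_part}) AND {title_word}[Title]"


# Reproduces the example query from this comment.
query = build_query("tomron", "PubMed", "2018/01/01")
print(query)
```

The resulting string can be passed as the query argument wherever the library accepts one.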

tomron commented 6 years ago

Super, thanks

gijswobben commented 6 years ago

Update: I've added a new method for counting the total number of matching articles (without retrieving any). It's now available in pymed version 0.8.1.

pip install pymed==0.8.1

from pymed import PubMed
pm = PubMed()
number_of_articles = pm.getTotalResultsCount(query="Occupational Health[Title]")
print("Number of articles with Occupational Health in the title is", number_of_articles)
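With the total known up front, the page boundaries asked about at the top of this thread (first 500, then 500-1000, etc.) can be planned before any articles are fetched. This is a minimal sketch: `page_ranges` is a hypothetical helper, not part of pymed, and the total of 1200 is a made-up value standing in for the result of getTotalResultsCount().

```python
from typing import List, Tuple


def page_ranges(total: int, page_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) index pairs that cover `total` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [(start, min(start + page_size, total))
            for start in range(0, total, page_size)]


# Made-up total; in practice this would come from getTotalResultsCount().
ranges = page_ranges(1200, 500)
print(ranges)
```

Each (start, stop) pair can then be used to slice a list of article IDs, for example article_ids[start:stop], before handing that slice to the batching code shown earlier.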