RBGKew / pykew

Python library for accessing Kew's data services

Retrieving all responses for a search #5

Open alrichardbollans opened 2 years ago

alrichardbollans commented 2 years ago

I'm trying to get all hits for a particular search, but I seem to be limited to the first page and can't work out how to modify the query to return other pages. Is this possible? I realise I can change the number of results per page with the perPage parameter, but this causes a timeout for searches that return a large number of results.

alrichardbollans commented 2 years ago

It's possible to specify the page with the parameter p, and this works when calling the API directly: e.g. resp = requests.get('https://powo.science.kew.org/api/2/search?characteristic=hairy&p=1') returns the second page, as it should.
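For comparison, the raw request above can be built with only the standard library. This is a small sketch, not part of pykew: `page_url` is a hypothetical helper, and treating `p` as a zero-based page index is inferred from the report that `p=1` returns the second page. Note that, per the maintainer's reply later in this thread, page-based pagination is not an officially supported way to use the API.

```python
from urllib.parse import urlencode

# Search endpoint taken from the request quoted above.
SEARCH_URL = "https://powo.science.kew.org/api/2/search"


def page_url(query: dict, page: int) -> str:
    """Hypothetical helper: build a POWO search URL for one results page.

    Assumes `p` is a zero-based page index (p=1 was reported to return
    the second page).
    """
    params = dict(query)  # copy so the caller's dict is not modified
    params["p"] = page
    return f"{SEARCH_URL}?{urlencode(params)}"


print(page_url({"characteristic": "hairy"}, 1))
# https://powo.science.kew.org/api/2/search?characteristic=hairy&p=1
```

The resulting URL can be passed straight to requests.get; equivalently, requests.get(SEARCH_URL, params=...) accepts the same dict and does the encoding itself.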

However, when I try to use the powo.search method, I get an empty response, e.g.:

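```python
from enum import Enum
import pykew.powo as powo
from pykew.powo_terms import Characteristic

class POWOQueryParams(Enum):
    page = "p"  # custom term meant to map to the API's page parameter

query = {Characteristic.characteristic: 'hairy', POWOQueryParams.page: '2'}
result = powo.search(query)
```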

This returns 0 totalResults.

alrichardbollans commented 2 years ago

I've created a pull request to add this functionality.

malcolm-s commented 2 years ago

Hi there!

This package is not actively maintained, but I believe it already contains the functionality you're after. The object returned from .search(query) is iterable, so you can use it in a standard for-loop (see here).

```python
import pykew.powo as powo
from pykew.powo_terms import Name, Geography

query = {Name.genus: 'Poa', Geography.distribution: 'Africa'}
results = powo.search(query)
for result in results:
    print(result)
```
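If the goal is a list rather than a loop, the iterable can be drained with the standard library. This sketch assumes, as the reply above states, that iterating the search result keeps going until all hits have been returned; `collect` is a hypothetical helper, and its optional `limit` stops early so a very large result set need not be pulled in full.

```python
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def collect(results: Iterable[T], limit: Optional[int] = None) -> List[T]:
    """Hypothetical helper: gather hits from any iterable into a list.

    With limit=None every item is consumed; otherwise at most `limit`
    items are taken and iteration stops there.
    """
    return list(islice(results, limit))


# Offline check with a plain iterable standing in for search results:
print(collect(range(5), limit=3))  # [0, 1, 2]

# Intended use (needs pykew installed and network access):
# import pykew.powo as powo
# from pykew.powo_terms import Name, Geography
# hits = collect(powo.search({Name.genus: 'Poa', Geography.distribution: 'Africa'}))
```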

That should get you what you need. I appreciate you opening the PR to add this functionality, but we also deliberately do not enable page-based pagination on our API, for performance and stability reasons. We only use cursor-based pagination, because our underlying search index goes 💥 boom 💥 when paginating very large result sets.
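The difference between cursor-based and page-based pagination can be illustrated with a generic client loop. This is a sketch, not pykew's or POWO's actual code: `fetch_page`, the `(records, next_cursor)` contract, and the in-memory `make_fake_fetch` server are hypothetical stand-ins. The client never asks for page N; it only hands back the cursor from the previous response, and the server resumes from there.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical contract: given a cursor (None for the first request), return
# (records_on_this_page, next_cursor); next_cursor is None once nothing is left.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def iterate_with_cursor(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every record by following cursors instead of page numbers."""
    cursor: Optional[str] = None
    while True:
        records, next_cursor = fetch_page(cursor)
        yield from records
        # Stop when the server signals the end: no cursor, an empty page,
        # or the same cursor handed back (a convention some cursor APIs use).
        if next_cursor is None or not records or next_cursor == cursor:
            break
        cursor = next_cursor


def make_fake_fetch(records: List[dict], page_size: int) -> FetchPage:
    """In-memory stand-in for a server, so the loop above runs offline.

    The cursor is the id of the last record already returned, so each
    request resumes after that key instead of skipping N earlier results.
    page_size must be at least 1.
    """
    ordered = sorted(records, key=lambda r: r["id"])

    def fetch(cursor: Optional[str]) -> Tuple[List[dict], Optional[str]]:
        after = int(cursor) if cursor is not None else None
        remaining = [r for r in ordered if after is None or r["id"] > after]
        page = remaining[:page_size]
        has_more = len(remaining) > page_size
        return page, (str(page[-1]["id"]) if has_more else None)

    return fetch


records = [{"id": i, "name": f"taxon {i}"} for i in range(7)]
ids = [r["id"] for r in iterate_with_cursor(make_fake_fetch(records, page_size=3))]
print(ids)  # [0, 1, 2, 3, 4, 5, 6]
```

The usual motivation for this design is that a deep page number typically forces a search index to collect and discard every earlier hit on each request, whereas a cursor lets it resume directly after the last key it returned.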

Hope that helps!