gbif / pygbif

GBIF Python client
https://pygbif.readthedocs.io/en/latest/
MIT License

Can't get more than 300 records when using occ.search() #80

Open simon-tarr opened 3 years ago

simon-tarr commented 3 years ago

I'm sending what I think is a straightforward call to GBIF using:

occurrences.search(geometry=wkt, limit=10000)

However, no matter what I set the limit to, I never get more than 300 results back. According to the package's paper, pygbif uses internal paging to return more than 300 results (if specified using the limit argument) up to GBIF's limit of 200,000 records.

I have tried passing very large polygons over big stretches of the UK and Europe and never get more than 300 results.

Am I doing something wrong?

I'm using Python v3.8.6 and the latest version of pygbif.

sckott commented 3 years ago

Thanks for the issue @simon-tarr !

That mention of paging in the paper is for the R client, not the Python client.

Right now you have to do the pagination yourself, using the limit and offset params.

Automated pagination has been discussed in #63. It will take some work, and I'm not sure when it will be done. Faster, of course, if someone sends a PR.
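
For anyone landing here, a minimal sketch of what "doing the pagination yourself" could look like. The helper name `fetch_all` and its `max_records` and `page_size` parameters are made up for illustration and are not part of pygbif. It assumes the search callable accepts `limit` and `offset` and returns a dict with `results` and `endOfRecords` keys, as the GBIF occurrence search response does, and that offsets are zero-based.

```python
def fetch_all(search, max_records, page_size=300, **query):
    """Collect up to max_records results by paging with limit/offset.

    `search` is any callable shaped like pygbif's occurrences.search
    (hypothetical wrapper, not a pygbif function): it takes limit= and
    offset= and returns a dict with 'results' and 'endOfRecords'.
    """
    records = []
    offset = 0  # assumption: offsets are zero-based
    while len(records) < max_records:
        # Ask for no more than the per-request cap or what is still needed.
        limit = min(page_size, max_records - len(records))
        page = search(limit=limit, offset=offset, **query)
        results = page.get("results", [])
        records.extend(results)
        # Stop when the server reports the end, or when a page comes back empty.
        if page.get("endOfRecords") or not results:
            break
        offset += len(results)
    return records
```

With pygbif installed, this would be called as `fetch_all(occ.search, 10000, geometry=wkt)` after `from pygbif import occurrences as occ`. Passing the search function in as an argument keeps the loop easy to check against a stub before pointing it at the live API.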

simon-tarr commented 3 years ago

Thanks for the reply @sckott. Totally missed #63, sorry about that. A friend and I will look into this. If we can develop something sensible, we'll raise a PR for you!

simon-tarr commented 3 years ago

Hi @sckott - what's the best way of finding out the total number of records using occ.search() (if it's even possible within that method or pygbif as a whole)? With that information, it sounds like it would be reasonably straightforward to write a loop.

e.g. if there are 650 records, records 1 - 300 come back in the first iteration (offset 0); set offset to 300 to grab results 301 - 600, then set offset to 600 to grab results 601 - 650. Sound reasonable?
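
The arithmetic for that 650-record example can be sketched as below. `page_offsets` is a hypothetical helper, not a pygbif function, and it assumes GBIF's offset is zero-based, so the first record sits at offset 0 and the second page starts at offset 300.

```python
def page_offsets(total, page_size=300):
    """Return (offset, limit) pairs covering `total` records, zero-based."""
    return [
        (offset, min(page_size, total - offset))
        for offset in range(0, total, page_size)
    ]


pages = page_offsets(650)
print(pages)  # [(0, 300), (300, 300), (600, 50)]
```

Each pair would then be passed as the `offset` and `limit` arguments of one search call.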

sckott commented 3 years ago

from pygbif import occurrences as occ
x = occ.search(taxonKey = 3329049)
x['count']

Or with the count API route

occ.count(taxonKey = 3329049)

Note that the search and count methods use different API routes, with potentially different behavior (see https://www.gbif.org/developer/occurrence). GBIF has said they will eventually remove the count API route.