gbif / gbif-api

GBIF API
Apache License 2.0

Registry api /dataset/search sometimes does not include recordCount #52

Open: rukayaj opened this issue 4 years ago

rukayaj commented 4 years ago

Hiya, I was wondering if there's a reason why some results in /dataset/search don't include a recordCount attribute?

An example is the 7th result here: https://api.gbif.org/v1/dataset/search?q=plant&publishingCountry=AR (key = 0ce9ca26-0e89-4f63-94fe-124d47a4451a). The API result doesn't include recordCount, but https://www.gbif.org/dataset/0ce9ca26-0e89-4f63-94fe-124d47a4451a shows that the dataset has 10343 records.
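A minimal Python sketch for spotting such results, assuming the usual GBIF paged response shape (a `results` array plus `offset` and `limit` paging). The helper names `fetch_page` and `keys_missing_record_count` are hypothetical; only the second one is pure and can be checked offline, the first needs network access.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.gbif.org/v1/dataset/search"


def fetch_page(params: dict, offset: int = 0, limit: int = 20) -> dict:
    """Fetch one page of /dataset/search results (hypothetical helper; needs network)."""
    query = urllib.parse.urlencode({**params, "offset": offset, "limit": limit})
    with urllib.request.urlopen(f"{SEARCH_URL}?{query}") as resp:
        return json.load(resp)


def keys_missing_record_count(page: dict) -> list[str]:
    """Return the dataset keys whose search result has no (or a null) recordCount."""
    return [r["key"] for r in page.get("results", []) if r.get("recordCount") is None]
```

For example, `keys_missing_record_count(fetch_page({"q": "plant", "publishingCountry": "AR"}))` would list the keys of any datasets on that first page whose entry omits the field.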

MortenHofft commented 4 years ago

First, some background: recordCount is a confusing field, because it means something different for checklist datasets than for occurrence datasets.

But even knowing that, I agree the numbers look wrong. The field will sometimes be missing (as in your example), and other times it doesn't match the number of records in the checklist. For example:

```json
{
  "key": "6ac09c4d-bf7b-4e47-9c2d-f5abf6e89aa0",
  ...
  "recordCount": 1109
}
```

But the checklist has 1211 records, and 1205 in the source. There might be some filter applied, but I cannot figure out what it would be.
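One way to surface this kind of drift is to compare the search-index `recordCount` with a count obtained elsewhere. The sketch below keeps the comparison pure: `find_discrepancies`, its `tolerance` parameter and the `Discrepancy` type are hypothetical, and where the reference counts come from is an assumption left to the caller.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Discrepancy:
    key: str
    index_count: Optional[int]  # recordCount from /dataset/search (None if absent)
    reference_count: int        # count obtained from some other source


def find_discrepancies(
    search_results: list[dict],
    reference_counts: Mapping[str, int],
    tolerance: int = 0,
) -> list[Discrepancy]:
    """Flag datasets whose indexed recordCount is missing or differs from the reference.

    Datasets without a reference count are skipped, since there is nothing to compare.
    """
    flagged = []
    for result in search_results:
        key = result["key"]
        if key not in reference_counts:
            continue
        indexed = result.get("recordCount")
        expected = reference_counts[key]
        if indexed is None or abs(indexed - expected) > tolerance:
            flagged.append(Discrepancy(key, indexed, expected))
    return flagged
```

With the numbers from this comment, an indexed 1109 against a reference of 1211 is flagged, while a `tolerance` of 10 would accept 1205 vs 1211.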

MattBlissett commented 4 years ago

The counts in the search index are maintained separately from other systems, to allow the registry to be independent of them. Sometimes, updates to the search index fail.

We're about to transition to a new search index (probably on Monday), and the counts should then be current. I'm not sure whether there's been any work to improve reliability beyond rebuilding the search index, though. @fmendezh?
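Because the index counts can lag or go missing when an update fails, a client could prefer the indexed value and fall back to a live count only when it is absent. This is a hypothetical sketch: it assumes that `https://api.gbif.org/v1/occurrence/search?datasetKey=…&limit=0` returns a JSON body with a `count` field, and it only handles missing values, not stale ones. The fallback is injectable so the logic can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

OCCURRENCE_SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def live_occurrence_count(dataset_key: str) -> int:
    """Ask occurrence search for the total count (assumed endpoint behaviour; needs network)."""
    query = urllib.parse.urlencode({"datasetKey": dataset_key, "limit": 0})
    with urllib.request.urlopen(f"{OCCURRENCE_SEARCH_URL}?{query}") as resp:
        return int(json.load(resp)["count"])


def record_count_with_fallback(
    result: dict,
    fallback: Callable[[str], int] = live_occurrence_count,
) -> Optional[int]:
    """Prefer the indexed recordCount; fall back to a live count when it is absent.

    Only occurrence datasets use the fallback: for checklists the indexed number
    counts something else, so an occurrence count would not be comparable.
    """
    indexed = result.get("recordCount")
    if indexed is not None:
        return indexed
    if result.get("type") == "OCCURRENCE":
        return fallback(result["key"])
    return None
```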

rukayaj commented 4 years ago

Perhaps it would be better to exclude it from the API results if it's not reliable, or to add an explanation on the documentation page?

I guess it's related to this issue: https://github.com/gbif/pipelines/issues/245, but I'm not clear on what @muttcg means by "it returns count number".

Edit: I also just noticed https://github.com/gbif/registry/issues/9, which suggests refactoring recordCount into taxonCount and occurrenceCount. That would make the difference between checklists and occurrence datasets more explicit.
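On the client side, the split proposed in gbif/registry#9 could be approximated roughly as below. This is a hypothetical sketch: `taxonCount` and `occurrenceCount` are the names from that issue, mapped here to illustrative snake_case fields, and routing on the dataset `type` values `CHECKLIST` and `OCCURRENCE` is an assumption based on the point above that recordCount means different things for checklists and occurrence datasets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetCounts:
    # Hypothetical split of recordCount, following the names proposed in gbif/registry#9.
    taxon_count: Optional[int] = None       # checklists: number of taxon records
    occurrence_count: Optional[int] = None  # occurrence datasets: number of occurrences


def split_record_count(result: dict) -> DatasetCounts:
    """Interpret the ambiguous recordCount according to the dataset type."""
    count = result.get("recordCount")
    dataset_type = result.get("type")
    if dataset_type == "CHECKLIST":
        return DatasetCounts(taxon_count=count)
    if dataset_type == "OCCURRENCE":
        return DatasetCounts(occurrence_count=count)
    # Other types, or a missing type: leave both unset rather than guess.
    return DatasetCounts()
```

Leaving unknown types unset keeps the ambiguity visible instead of silently assigning a count to the wrong category.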