VertNet / api

Codebase for the new module-based VertNet app. API module

Optimize count returns #3

Open jotegui opened 8 years ago

jotegui commented 8 years ago

Currently, counts are calculated by retrieving the full list of records and then returning just the length of the array. This is highly inefficient (e.g. it took more than 2 h to get the number of records mentioning mvz).
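To make the cost concrete, here is a minimal sketch of the pattern described above next to a variant that keeps only a running total. `fetch_page` and `make_fake_backend` are hypothetical stand-ins for a paged search backend, not the actual VertNet code. Both versions still issue one request per page, so the second only saves memory, not time.

```python
def count_by_materializing(fetch_page):
    """Mirror the approach described above: gather every record, then take len()."""
    records = []
    cursor = None
    while True:
        page, cursor = fetch_page(cursor)
        records.extend(page)  # every record stays in memory until the end
        if cursor is None:
            break
    return len(records)


def count_by_streaming(fetch_page):
    """Same number of page requests, but only a running total is kept."""
    total = 0
    cursor = None
    while True:
        page, cursor = fetch_page(cursor)
        total += len(page)  # the page itself can be discarded right away
        if cursor is None:
            break
    return total


def make_fake_backend(n_records, page_size):
    """Hypothetical in-memory stand-in for a paged search backend."""
    def fetch_page(cursor):
        start = cursor or 0
        end = min(start + page_size, n_records)
        page = list(range(start, end))
        # A cursor of None signals that there are no more pages.
        next_cursor = end if end < n_records else None
        return page, next_cursor
    return fetch_page
```

Either way the work grows linearly with the size of the result set, which matches the slowness reported for large queries.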

tucotuco commented 8 years ago

I do not know of a way to get counts efficiently AND accurately with GAE. However, for the case in question of small record sets, I believe the estimated count is good enough and could be used to make a determination.

jotegui commented 8 years ago

You are right, @tucotuco. I was not familiar with Google's Search API and I guess I was expecting a bit too much, like a count method or so... So it seems the only way of counting records is to actually retrieve them and return the length of the array. sigh

Actually, given this difficulty and the current structure, I have been thinking about dropping this whole issue, and here is why:

  1. There is little (if any) potential use for a method such as count from the users' perspective.
  2. Record counts are actually only useful for direct calls to the download API, since portal downloads come after a search event, where the record count is already calculated. And direct downloads via the portal-web have already been implemented.
  3. If we enable a new parameter in the search API (like format), where users can decide whether to get records in JSON or TXT format, they will be able to download via that method. But that makes the distinction between both methods a bit blurry...
  4. We can use an approach such as GBIF's: put a hard limit on the number of records retrievable via a direct call to the search API, and suggest using the download API for larger searches...
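The hard-limit idea in point 4 could look roughly like the sketch below. The `HARD_LIMIT` value, the `plan_search` helper and the shape of the returned dictionary are all assumptions made for illustration. Neither GBIF's actual limits nor the VertNet API are reproduced here.

```python
HARD_LIMIT = 1000  # assumed cap on records per direct search call


def plan_search(requested_limit, estimated_count, hard_limit=HARD_LIMIT):
    """Clamp the requested page size and point large searches elsewhere.

    `estimated_count` is whatever approximate match count the backend
    reports; no exact count is needed for this decision.
    """
    limit = min(requested_limit, hard_limit)
    response = {
        "limit": limit,
        # True when the search matched more records than this call returns.
        "more_available": estimated_count > limit,
    }
    if estimated_count > hard_limit:
        # Hypothetical wording; the real message would be up to the API.
        response["hint"] = (
            "About %d records match; use the download API for the full set."
            % estimated_count
        )
    return response
```

For example, `plan_search(5000, 200000)` would cap the call at 1000 records and attach the download hint, while `plan_search(50, 20)` would pass through unchanged with no hint.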

Again, just thinking out loud here...

tucotuco commented 8 years ago

I agree with all of these observations.

(In reply to Javier Otegui's comment of Tue, May 24, 2016, 8:01 AM.)