dronefly-garden / dronefly

Red Discord Bot V3 cogs for naturalists.

taxon: search more than the first 30 results for phrase matches #145

Closed: synrg closed this issue 3 years ago

synrg commented 3 years ago

Problem:

My test case for this is ,taxon "guan". Because iNaturalist does not provide exact (phrase) matching itself, we post-filter the result set to keep only results that contain exactly the phrase typed in double quotes. If a taxon with the exact word "guan" in its name is found within the first 30 results, it is shown. In this case, though, "No exact match" is reported, and the user is left scratching their head, because they are sure that there is at least one taxon with "Guan" in the name! As it happens, several results do have exactly "guan" in the name, but they don't show up until the next 30 matches.
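To make the post-filtering step concrete, here is a minimal sketch of a whole-word phrase filter. It is not the cog's actual code: the function names and the record shape (a dict with a "names" list) are assumptions made for illustration only.

```python
import re


def phrase_matches(phrase, name):
    """Return True if `phrase` occurs in `name` as whole words (case-insensitive)."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, name, flags=re.IGNORECASE) is not None


def filter_exact(phrase, records):
    """Keep only records with at least one name containing the exact phrase.

    `records` is a hypothetical shape: a list of dicts, each with a "names"
    list holding the scientific and common names of one taxon.
    """
    return [
        record
        for record in records
        if any(phrase_matches(phrase, name) for name in record["names"])
    ]
```

With a filter like this, "Crested Guan" passes for the phrase "guan", while "Guanaco" and "Iguana" are ruled out. That is how a full page of 30 results can be filtered down to nothing.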

Analysis:

I previously thought that searching more than the first 30 results (the upper limit of the per_page parameter on the /v1/taxa/autocomplete endpoint) was fairly pointless, because then you're getting into really obscure things that are less likely to be relevant. However, any time the results are post-filtered, this can produce "No exact match" even though the "obvious" best match simply hasn't been reached yet.

At the very least, the "No exact match" message leaves a lot unexplained. What it actually should say here is that an exact match couldn't be found with reasonable effort. That said, I don't think searching through only one page of results is a reasonable effort! It's a very poor effort indeed, considering that some other commands can do up to 11 API calls in a row (like the ,me command). We could do far better here, even with the relatively cheap expenditure of 4 API calls in a row, especially now that aiothrottler takes the sting out of it. That is, it will not cost us 4 seconds most of the time; assuming enough capacity is available in the throttler, it will barely take longer than one call!

Proposed fix:

Therefore, I propose we do two things:

  1. Fix the "No exact match" message to state what's really going on, and what the user can do about it if they still don't find a match (probably should suggest they try ,search taxa <whatever-their-search-terms-were-except-without-double-quotes> and just page through the results manually until they find what they were looking for).
  2. Don't give up after one API call returning 30 results. If there was no match in the first 30, and the reason was the post-processor ruled out all the potential matches, then try again, up to four times total (i.e. up to a maximum of 120 results), before finally emitting the new, improved message.
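The two proposed changes could be sketched roughly as follows. This is an illustration under stated assumptions, not the fix that was committed: `fetch_page` and `is_match` are hypothetical callables supplied by the caller, how later pages are actually requested from iNaturalist is deliberately left abstract, and the message wording is only a suggestion. The real bot is asynchronous; the sketch is synchronous to keep it short.

```python
PER_PAGE = 30   # page size cited in this issue
MAX_PAGES = 4   # proposed cap: 4 API calls, i.e. at most 120 results


def find_exact_match(phrase, fetch_page, is_match,
                     max_pages=MAX_PAGES, per_page=PER_PAGE):
    """Return the first record passing the post-filter, or None.

    `fetch_page(page, per_page)` is a hypothetical callable returning a list
    of records for the given 1-based page; `is_match(phrase, record)` is the
    post-filter.
    """
    for page in range(1, max_pages + 1):
        results = fetch_page(page, per_page)
        if not results:
            return None  # nothing was returned, so there is nothing to retry
        for record in results:
            if is_match(phrase, record):
                return record
        if len(results) < per_page:
            return None  # short page: there are no further results to fetch
    return None


def no_match_message(phrase, max_pages=MAX_PAGES, per_page=PER_PAGE):
    """Build an illustrative message; the exact wording is an assumption."""
    return (
        f'No exact match for "{phrase}" in the first {max_pages * per_page} '
        f"results. Try `,search taxa {phrase}` and page through the results."
    )
```

Retrying only when a full page came back and the filter rejected every record keeps the common case at a single API call. Only phrase searches that miss on the first page pay for the extra calls.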

Implementer notes:

synrg commented 3 years ago

Fixed in 80345e65a8465d595fd394a774355b51876a940f