foursquare / fsqio

A monorepo that holds all of Foursquare's open-source projects
Apache License 2.0

Request for more docs on Twofishes autocomplete #20

Open steveha-ziprecruiter opened 8 years ago

steveha-ziprecruiter commented 8 years ago

I have been trying to use the Twofishes autocomplete feature. I am building JSON requests and trying to specify everything I can to tell Twofishes how to respond. In particular, I am providing latitude/longitude and specifying autocompleteBias of LOCAL.

Here's a sample JSON query (pretty-printed for readability):

{
  "autocomplete": true,
  "autocompleteBias": 2,
  "cc": "US",
  "ll": {
    "lat": 34.01945,
    "lng": -118.49119
  },
  "maxInterpretations": 25,
  "query": "Ca",
  "responseIncludes": [7],
  "woeHint": [7],
  "woeRestrict": [22,7,10,9,8,11,12,13]
}

The query above specifies the lat/lng of Santa Monica and attempts to autocomplete the city of Calabasas (less than 20 miles away in a straight line). The results do not include Calabasas but do include results from distant countries.

Even worse, if you send JSON just like the above but with the "query" set to "Sa", Santa Monica isn't in the results, even though the distance from the lat/lng is 0.

Am I doing anything wrong in my query? Is there a better way to do the query? Can you share any documentation on how to best run autocomplete queries?

rahulpratapm commented 8 years ago

Hi Steve,

When I test these queries on our internal twofishes instance, I get back Santa Monica as the top result for [sa] but do not get back Calabasas for [ca]. It does show up as the top result for [cal], though. I'm not sure why you don't see Santa Monica; that depends on what's in your index, but the following explanation might help you understand.

Unfortunately, local bias doesn't work well for short queries. The precomputed autocomplete prefix index stores a list of up to 50 features for each prefix (up to length 5), ordered by global importance/popularity. For such short prefixes, the competition to make it into the top 50 globally is intense, to say the least, and Calabasas does not make the cut for the [ca] prefix. So, although the bias is properly applied during ranking, we don't even retrieve this result for this query in order to bump it up to the top.
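To make the retrieval-then-ranking point concrete, here is a minimal Python sketch. It is not the Twofishes implementation: the feature list, the importance scores, the cap of 3 per prefix (standing in for the real cap of 50), and the function names are all made up for illustration, and plain coordinate distance stands in for whatever the real local bias computes.

```python
from math import hypot

# Hypothetical toy data: (name, global_importance, lat, lng). All values invented.
FEATURES = [
    ("Canada", 100, 56.1, -106.3),
    ("California", 90, 36.8, -119.4),
    ("Cairo", 80, 30.0, 31.2),
    ("Calabasas", 1, 34.14, -118.66),
]

MAX_PER_PREFIX = 3  # toy stand-in for the cap of 50 described above
MAX_PREFIX_LEN = 5  # prefixes longer than this are not precomputed


def build_prefix_index(features):
    """Precompute, per prefix, the globally most important features."""
    index = {}
    # Visit features from most to least important so each bucket fills
    # with the globally top-ranked names first.
    for feat in sorted(features, key=lambda f: -f[1]):
        name = feat[0].lower()
        for n in range(1, min(len(name), MAX_PREFIX_LEN) + 1):
            bucket = index.setdefault(name[:n], [])
            if len(bucket) < MAX_PER_PREFIX:
                bucket.append(feat)
    return index


def autocomplete(index, query, lat, lng):
    """Retrieve candidates from the index, then rank them nearest-first."""
    candidates = index.get(query.lower(), [])
    ranked = sorted(candidates, key=lambda f: hypot(f[2] - lat, f[3] - lng))
    return [f[0] for f in ranked]


index = build_prefix_index(FEATURES)
santa_monica = (34.01945, -118.49119)
short_results = autocomplete(index, "ca", *santa_monica)
longer_results = autocomplete(index, "cal", *santa_monica)
```

In this toy setup, `short_results` cannot contain Calabasas no matter how the ranking is biased, because the [ca] bucket was already full of globally more important features when Calabasas was considered. For [cal] there is less competition, so it is retrieved and the local ranking can move it to the top.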

In general, if you try a longer prefix it should eventually show up. At 6 characters and longer, we no longer use the prefix index but do a live prefix lookup on the full name index, so it is pretty much guaranteed to show up then.
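The switch between the two lookup paths can be sketched roughly as follows. This is only an illustration in Python under simplifying assumptions: the real indexes are not Python dicts or lists, and `lookup`, `live_prefix_lookup`, and the sample data are hypothetical names and values. The only detail taken from the explanation above is the cutoff of 5 characters.

```python
from bisect import bisect_left

PREFIX_INDEX_MAX_LEN = 5  # cutoff taken from the explanation above


def live_prefix_lookup(sorted_names, prefix):
    """Scan a sorted name list for every entry starting with prefix."""
    start = bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted order: no later name can match
        matches.append(name)
    return matches


def lookup(prefix_index, sorted_names, query):
    """Use the capped precomputed index for short queries, a live scan otherwise."""
    q = query.lower()
    if len(q) <= PREFIX_INDEX_MAX_LEN:
        # Short query: only what made it into the precomputed bucket is visible.
        return list(prefix_index.get(q, []))
    # Long query: every matching name in the full index is visible.
    return live_prefix_lookup(sorted_names, q)


# Hypothetical data: the precomputed bucket for "calab" was truncated
# and happens not to include "calabasas".
prefix_index = {"calab": ["calabria"]}
sorted_names = sorted(["calabasas", "calabria", "calais"])

five_chars = lookup(prefix_index, sorted_names, "calab")
six_chars = lookup(prefix_index, sorted_names, "calaba")
```

Here the 5-character query only sees the truncated bucket, while the 6-character query scans the full sorted name list and finds the feature regardless of its global rank.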

Rahul.