Closed janfari closed 1 year ago
Hello! (Janey, right?)
So, I wouldn't say that search is broken so much as it's poorly calibrated! I appreciate your report & your examples, and I'll try to tinker with the settings to make search more intuitive, but I can at least explain why it works this way.
The trip search function uses Postgres Full Text Search (FTS). That is, the database takes multiple bodies of text, assigns relative importance to each type of text, and uses that to build a search index: https://github.com/DavidCain/mitoc-trips/blob/f5e88438bdf0b074d7fa2345e5c083b7132bda7a/ws/models.py#L964-L969
Then, whenever somebody types in a word, Postgres looks in the search space for the word to appear, and assigns a numerical score to how well a trip matches.
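To make the weighting idea concrete, here is a minimal toy sketch in plain Python. It is not Postgres's real ranking function and not the code in `ws/models.py`: the trips, the `score` helper, and the choice of which field gets which label are all made up for illustration. The only borrowed detail is the set of default weights Postgres assigns to the labels A through D.

```python
# Toy model of weighted full-text scoring; NOT Postgres's real ts_rank.
import re

# Postgres labels lexemes A-D; these are its default ranking weights.
WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score(term, weighted_fields):
    """Sum the weight of every exact occurrence of `term` in (label, text) pairs."""
    term = term.lower()
    return sum(
        WEIGHTS[label] * tokenize(text).count(term)
        for label, text in weighted_fields
    )


# Hypothetical trips: the name is labelled "A", the description "B".
hike = [
    ("A", "Mt Pierce via Crawford Path"),
    ("B", "A mellow winter hike above treeline."),
]
social = [
    ("A", "Craft night"),
    ("B", "We will pierce leather and then pierce some canvas."),
]

print(score("pierce", hike))    # one hit in the heavily weighted name
print(score("pierce", social))  # two hits in the lightly weighted description
```

With these made-up weights, a single mention in the name outranks two mentions in the description, which is the kind of trade-off the real index has to be tuned for.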
When you search for "sled," Postgres is clever enough to realize that you're probably also interested in "sledding" since `sled` is the root of `sledding`. But if you search `sledding`, it infers you probably want just the word `sledding`, and not its root(s).
If you search for "sledding," several of the matching trips include the word "sledding" once, but many mentions of "sled." Because only one word matches, the resulting score is pretty low, and the threshold for a matching trip isn't met (thus, it's not shown). Conversely, if you search `sled`, the mentions of `sledding` count and so do the mentions of `sled` (so it's a higher score and a better match).
Because `sleddi` isn't a word, you won't get matches (it's also not the root of `sledding`).
`Pierc` matches `Piercing` and `Pierce` (which is why "Piercing on Pierce" comes up).
Implementing search very well (distinguishing between typos, whether humans want partial word match or exact match, valuing how much the presence of a certain word should matter, etc.) is a fundamentally hard problem. You certainly don't want the search "pierce" to rank trips that talk about ear piercing a couple times above a trip that is going to Mt Pierce, but says the mountain name only once.
What I'll probably do here is just lower the score that's required for a trip to appear on the list. Then, at least, you'll see "sledding" searches produce some results.
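As a rough illustration of what lowering that required score does, here is a small Python sketch. The trip names, the scores, and the `visible_trips` helper are all hypothetical; the real scores come from Postgres and the real cutoff lives in the app's query.

```python
# Toy illustration of a minimum-score cutoff; names and scores are made up.

def visible_trips(scored_trips, min_score):
    """Return the names of trips scoring at least `min_score`, best match first."""
    kept = [(name, s) for name, s in scored_trips.items() if s >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical scores for a "sledding" query.
scores = {
    "Sledding at the golf course": 0.04,
    "Hut trip with sleds": 0.02,
    "Ice climbing": 0.0,
}

print(visible_trips(scores, min_score=0.05))  # strict cutoff hides every trip
print(visible_trips(scores, min_score=0.01))  # lower cutoff lets weak matches through
```

The flip side is that a lower cutoff also lets through trips that only barely mention the term.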
But I don't think searching for non-existent English words will ever work given the strategy (FTS) I've chosen.
Oh that's interesting, I didn't realize there was a dictionary associated with the search! Makes sense then I suppose for certain words to fail weirdly. I don't know anything about postgres but is it simple (or even at all worthwhile) to customize the dictionary being used? Many of the mountain names we might visit as a club probably aren't in that set.
No good results (recent trips have been led to all these): Kearsarge, Jennings peak, Monroe, Moosilauke, Watatic, Purgatory
Searches that return good results (somewhat unexpectedly): Chocorua, Flume, Kinsman, Sherb, Waumbek, Franconia, Wachusett, Passaconaway
Oh I'm sorry to have misled you - I don't mean at all to imply that there's a dictionary of words and that search won't return results if the word isn't in the dictionary. That wouldn't work very well for a large number of the named places in New Hampshire! =)
I was just trying to explain why you'll get results for `sled` but not `sleddi` -- there's a built-in dictionary of English words that can be normalized so that they match other forms of that word -- "lexemes."
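For a feel of what "normalized" means here, this is a deliberately crude Python stand-in. It is not the algorithm Postgres uses (the `normalize` function and its two suffix rules are invented for this example), but it shows how real word forms can collapse to a shared root while a fragment like `sleddi` is left alone.

```python
# Crude stand-in for an English stemmer; NOT the algorithm Postgres uses.

def normalize(word):
    """Reduce a word to a rough root by stripping a couple of common suffixes."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling: "sledd" -> "sled".
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    elif word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


for word in ["sled", "sledding", "sleddi", "pierce", "piercing"]:
    print(word, "->", normalize(word))
```

In this toy, `sled` and `sledding` end up with the same root, `pierce` and `piercing` both reduce to `pierc`, and `sleddi` matches neither rule, so it never lines up with anything.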
The search behavior you're seeing can all be explained by:
And the two levers we can pull are:
In any case, thank you for the report -- this is more than enough information for me to tweak search!
@janfari - I just removed the minimum search score, and the terms which you mention should now be returning results. Let me know if search still seems off to you!
There's a notable downside in that some search queries will now take longer and there may be some false positives, but hopefully this helps for now.
Certain words seem to break the search and I can't figure out exactly what the pattern is. Here are some examples:
- "pierce" returns no results
- "pierc" returns 20+ results with "pierce" in the title
- "skiing" returns nothing, "ski" returns 50+ results with "skiing" in the title
- "sledding" returns nothing, "sleddin" "sleddi" and "sledd" also return nothing.
- "sled" returns many results, 9 of which have the word "sledding" in the title