CFC-Servers / gmodwiki

A slimmed & improved mirror of the Garry's Mod wiki, with self-hosting
https://gmodwiki.com
GNU General Public License v3.0

Search Improvements #5

Open brandonsturgeon opened 9 months ago

brandonsturgeon commented 9 months ago

Currently, we break each page body down into keywords and then perform a keyword lookup at runtime.

This is space/memory efficient, but it's not very robust. For example, ACT will return the correct results, but ACT_ will return nothing.
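
One possible explanation, offered as an assumption because the tokenizer isn't reproduced in this issue: if underscores are stripped or split on at build time but the query is looked up verbatim, then ACT_ is never a stored key. The TypeScript sketch below reproduces that mismatch and shows one way around it, which is to normalize the query with the same function used at build time and then match stored keywords by prefix. Every name here is hypothetical and not taken from the gmodwiki code.

```typescript
// Hypothetical sketch; this is not the actual gmodwiki tokenizer.
type Index = Map<string, number[]>; // keyword -> page IDs

// Assumed build-time normalization: lowercase, split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function buildIndex(pages: string[]): Index {
  const index: Index = new Map();
  pages.forEach((body, pageId) => {
    // De-duplicate so each page ID appears once per keyword.
    new Set(tokenize(body)).forEach((term) => {
      const ids = index.get(term) ?? [];
      ids.push(pageId);
      index.set(term, ids);
    });
  });
  return index;
}

// Exact lookup on the raw query: "act_" is not a key, so nothing is found.
function exactLookup(index: Index, query: string): number[] {
  return index.get(query.toLowerCase()) ?? [];
}

// Normalize the query like the pages, then accept any keyword that starts with a query token.
// Multiple tokens are combined with OR here; AND would need a set intersection instead.
function prefixLookup(index: Index, query: string): number[] {
  const hits = new Set<number>();
  tokenize(query).forEach((token) => {
    index.forEach((ids, term) => {
      if (term.startsWith(token)) ids.forEach((id) => hits.add(id));
    });
  });
  return Array.from(hits).sort((a, b) => a - b);
}

const demoIndex = buildIndex(["ACT_WALK and ACT_RUN enums", "Entity:GetPos"]);
console.log(exactLookup(demoIndex, "ACT"));   // finds page 0
console.log(exactLookup(demoIndex, "ACT_"));  // finds nothing
console.log(prefixLookup(demoIndex, "ACT_")); // finds page 0 again
```

The prefix scan walks every keyword, which is fine for a sketch but would want a sorted list or a trie on a full wiki.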

I need some help with this!

I don't know how to generate a single structure of search terms that I can easily query later. The generated file can't be too big because we have to load it into memory every time we search.
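
One shape that might fit these constraints, sketched in TypeScript under assumptions: store the unique terms as a single sorted array with a parallel array of page IDs, then answer prefix queries with a binary search followed by a short forward scan. The SearchBlob layout, the function names, and the decision to keep underscores inside terms are all hypothetical and not part of the current build.

```typescript
// Hypothetical compact layout: two parallel arrays instead of one object per term.
interface SearchBlob {
  terms: string[];      // unique terms, sorted ascending
  postings: number[][]; // postings[i] = page IDs containing terms[i]
}

function makeBlob(entries: Record<string, number[]>): SearchBlob {
  const terms = Object.keys(entries).sort();
  return { terms, postings: terms.map((t) => entries[t]) };
}

// Binary search for the first index whose term is >= prefix.
function lowerBound(terms: string[], prefix: string): number {
  let lo = 0;
  let hi = terms.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (terms[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Terms sharing a prefix are adjacent in sorted order, so scan forward from the lower bound.
function searchPrefix(blob: SearchBlob, prefix: string): number[] {
  const hits = new Set<number>();
  let i = lowerBound(blob.terms, prefix);
  while (i < blob.terms.length && blob.terms[i].startsWith(prefix)) {
    blob.postings[i].forEach((id) => hits.add(id));
    i++;
  }
  return Array.from(hits).sort((a, b) => a - b);
}

const blob = makeBlob({ act_walk: [3], act_run: [3, 7], angle: [1], activate: [9] });
console.log(searchPrefix(blob, "act_")); // pages for act_run and act_walk only
console.log(searchPrefix(blob, "act"));  // also includes the page for activate
```

Because the blob is plain arrays, it round-trips through JSON.stringify and JSON.parse without any custom deserialization step at search time.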

We should stick to non-Cloudflare solutions so we can maintain compatibility with self-hosting.

This is the entrypoint for the SearchManager, which generates the final JSON blob we use to perform searches. Each scraped page's inner-html content (that is, the stuff that changes as you navigate each page) is passed into this function: https://github.com/CFC-Servers/gmodwiki/blob/main/build/modules/search.ts#L100-L124

My general goal was to strip out any characters that would cause search conflicts or significantly increase the size of the search blob before generating the reverse lookup of terms -> page IDs (and search context).
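
As a rough illustration of that idea, and not a copy of the linked code, a TypeScript sketch of cleaning a page's inner HTML and building a terms -> page IDs lookup with a little context might look like this. The Hit shape, the cleanup rules (drop tags, lowercase, keep letters, digits, and underscores), and the five-term context window are assumptions chosen for the example.

```typescript
// Hypothetical shapes; the real SearchManager in build/modules/search.ts may differ.
interface Hit {
  pageId: number;
  context: string; // a few neighbouring terms around the first occurrence
}
type ReverseLookup = Map<string, Hit[]>;

// Assumed cleanup: drop tags, lowercase, split on anything but letters, digits, and underscores.
function cleanTerms(html: string): string[] {
  const text = html.replace(/<[^>]*>/g, " ").toLowerCase();
  return text.split(/[^a-z0-9_]+/).filter((t) => t.length > 1);
}

function addPage(lookup: ReverseLookup, pageId: number, html: string): void {
  const terms = cleanTerms(html);
  const seen = new Set<string>();
  terms.forEach((term, i) => {
    if (seen.has(term)) return; // one entry per term per page keeps the blob small
    seen.add(term);
    const context = terms.slice(Math.max(0, i - 2), i + 3).join(" ");
    const hits = lookup.get(term) ?? [];
    hits.push({ pageId, context });
    lookup.set(term, hits);
  });
}

const lookup: ReverseLookup = new Map();
addPage(lookup, 0, "<p>Returns the ACT_WALK activity</p>");
addPage(lookup, 1, "<p>Sets the activity of an NPC</p>");

// A Map is not directly JSON-serializable, so emit it as an array of [term, hits] pairs.
const serialized = JSON.stringify(Array.from(lookup.entries()));
console.log(serialized.length, lookup.get("act_walk"));
```

A Map is used instead of a plain object so that terms such as "constructor" cannot collide with inherited object properties.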

The main challenge for searching on Gmodwiki is that we need to pre-compute the search terms. We don't have a database of page entries that we can query at search time, and we can't reasonably shove the full wiki content into a JSON object.

A1steaksa commented 5 months ago

To my mind, a pre-existing solution like https://sphinxsearch.com/ would be the sanest way of handling it.