The main challenge for searching on Gmodwiki is that we need to pre-compute the search terms. We don't have a database of page entries that we can query at search time, and we can't reasonably shove the full wiki content into a JSON object.
Currently, we break each page body down into keywords and then perform a keyword lookup at runtime.
This is space/memory efficient, but it's not very robust. For example, `ACT` will return the correct results, but `ACT_` will return nothing.
I need some help with this!
I don't know how to generate a single structure of search terms that I can easily query later. The product file can't be too big because we have to pull it into memory every time we search.
We should stick to non-Cloudflare solutions so we can maintain compatibility with self-hosting.
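To make the `ACT` / `ACT_` mismatch concrete, here is a minimal sketch of an exact-match keyword lookup. The `tokenize` rule, the index shape, and all function names are assumptions for illustration, not the actual code in `search.ts`. The point is only that if the indexer strips a character (here the underscore) and the query does not go through the same step, the query can never match a key.

```typescript
// Hypothetical index shape: keyword -> IDs of the pages that contain it.
type KeywordIndex = Map<string, Set<string>>;

interface Page {
  id: string;
  body: string;
}

// Assumed tokenizer: lowercase, then split on anything that is not a letter or digit.
// Under this assumption "ACT_IDLE" is indexed as "act" and "idle"; "act_" is never a key.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0);
}

function buildIndex(pages: Page[]): KeywordIndex {
  const index: KeywordIndex = new Map();
  for (const page of pages) {
    for (const term of tokenize(page.body)) {
      let ids = index.get(term);
      if (ids === undefined) {
        ids = new Set<string>();
        index.set(term, ids);
      }
      ids.add(page.id);
    }
  }
  return index;
}

// Exact lookup: the raw query is lowercased but not tokenized, so "ACT_" misses.
function exactLookup(index: KeywordIndex, query: string): string[] {
  const ids = index.get(query.toLowerCase());
  return ids ? Array.from(ids) : [];
}

// Lookup that runs the query through the same tokenizer as the page bodies,
// then keeps only the pages that contain every resulting term.
function normalizedLookup(index: KeywordIndex, query: string): string[] {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  const first = index.get(terms[0]);
  let result: string[] = first ? Array.from(first) : [];
  for (const term of terms.slice(1)) {
    const ids = index.get(term);
    result = result.filter((id) => ids !== undefined && ids.has(id));
  }
  return result;
}
```

With a single page whose body contains `ACT_IDLE`, `exactLookup(index, "ACT_")` finds nothing while `normalizedLookup(index, "ACT_")` finds the page. Normalizing the query costs nothing in blob size, but it does not by itself give prefix or fuzzy matching.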
This is the entrypoint for the SearchManager, which generates the final JSON blob we use to perform searches. Each scraped page's inner-html content (that is, the stuff that changes as you navigate each page) is passed into this function: https://github.com/CFC-Servers/gmodwiki/blob/main/build/modules/search.ts#L100-L124
My general goal was to strip out any characters that would cause search conflicts or significantly increase the size of the search blob, and then generate the reverse lookup of terms -> page IDs (and search context).
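As a rough illustration of that goal, the sketch below cleans each page's HTML and builds a terms -> page lookup that serializes to JSON. It is not the code at the link above: `ScrapedPage`, `SearchBlob`, `cleanText`, `buildSearchBlob`, the character policy, and the minimum term length are all hypothetical. Search context is left out for brevity. Pages are referenced by array position rather than by ID string, which is one assumed way to keep the blob small.

```typescript
// Hypothetical input: one entry per scraped page.
interface ScrapedPage {
  id: string;
  html: string;
}

// Hypothetical blob shape; the real SearchManager output may differ.
interface SearchBlob {
  pages: string[];                 // page IDs, referenced by array position
  terms: Record<string, number[]>; // term -> positions in `pages`
}

// Assumed cleaning policy: drop tags (crudely, via regex), lowercase, and replace
// every character that is not a letter, digit, or whitespace with a space.
function cleanText(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ")
    .toLowerCase()
    .replace(/[^a-z0-9\s]+/g, " ");
}

function buildSearchBlob(pages: ScrapedPage[], minTermLength = 2): SearchBlob {
  // A null-prototype object avoids collisions with inherited keys such as "constructor".
  const blob: SearchBlob = { pages: [], terms: Object.create(null) };

  pages.forEach((page, pageIndex) => {
    blob.pages.push(page.id);
    const seen = new Set<string>(); // record each term at most once per page

    for (const term of cleanText(page.html).split(/\s+/)) {
      if (term.length < minTermLength || seen.has(term)) continue;
      seen.add(term);

      const ids = blob.terms[term];
      if (ids === undefined) {
        blob.terms[term] = [pageIndex];
      } else {
        ids.push(pageIndex);
      }
    }
  });

  return blob;
}
```

The result can be written out with `JSON.stringify(buildSearchBlob(pages))`. Raising `minTermLength` or adding a stop-word list would shrink the blob further, at the cost of dropping short or common terms from search.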