Open c-w opened 2 years ago
Hello!
Boolean search operations aren't something I've touched yet, but I'm certainly happy for them to be in scope for the project. I haven't tidied things up for contributions yet, but if you're happy to delve in deep I can point you in some hopefully-good directions.
I think `cats -"cute dogs"` might be a better syntax to match Google operations, but I'm happy to inherit the operations of any sufficiently large incumbent search platform. Also, they don't have to be implemented now, but it would be good to keep operations like `cats OR dogs` in mind when building this.
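As a rough illustration of the `cats -"cute dogs"` syntax discussed above, here is a minimal sketch of a query splitter. The `Clause` type and `parse_query` function are hypothetical names invented for this example, not part of Pagefind, and the sketch assumes a leading `-` marks an exclusion and double quotes delimit a phrase.

```rust
/// One clause of a parsed query (hypothetical type, not part of Pagefind).
#[derive(Debug, PartialEq)]
enum Clause {
    /// A word or quoted phrase that must be present.
    Include(String),
    /// A word or quoted phrase prefixed with `-` that must be absent.
    Exclude(String),
}

/// Splits a query such as `cats -"cute dogs"` into clauses.
/// Quoted phrases are kept whole; an unterminated quote runs to the end of the input.
fn parse_query(input: &str) -> Vec<Clause> {
    let mut clauses = Vec::new();
    let mut chars = input.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
            continue;
        }

        // A leading `-` marks the clause as an exclusion.
        let negated = c == '-';
        if negated {
            chars.next();
        }

        let mut text = String::new();
        if chars.peek() == Some(&'"') {
            chars.next(); // consume the opening quote
            for ch in chars.by_ref() {
                if ch == '"' {
                    break;
                }
                text.push(ch);
            }
        } else {
            // Read a bare word up to the next whitespace.
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() {
                    break;
                }
                text.push(ch);
                chars.next();
            }
        }

        if text.is_empty() {
            continue; // a lone `-` or an empty pair of quotes
        }
        clauses.push(if negated {
            Clause::Exclude(text)
        } else {
            Clause::Include(text)
        });
    }
    clauses
}
```

Calling `parse_query("cats -\"cute dogs\"")` yields an include clause for `cats` and an exclude clause for `cute dogs`. This sketch does not treat `OR` specially, so `cats OR dogs` comes back as three include clauses; supporting it would need an extra variant or a small expression tree.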
Pagefind is spread across a few Rust crates at the moment: `pagefind`, `pagefind_stem`, `pagefind_ui`, and `pagefind_web`. The main changes you'll need to make are to `pagefind_web`, which is the search WebAssembly module. Inside the `pagefind` crate lives the JS search API, which will also need some tweaks.
I'll give a super quick™ rundown of the things I think you'll need to touch to implement this:
- The `request_indexes` function: https://github.com/CloudCannon/pagefind/blob/5bde67679b0b5720c0274513210b9d21ed2f3c40/pagefind_web/src/lib.rs#L118-L149
  For a query such as `cats -"cute dogs"`, it will ultimately need to return a list of the chunks containing `cats`, `cute`, and `dogs`. The input to this function will depend on whether JS or Rust is handling the boolean parse step.
- The `exact_search` function above that function: we'll need to combine these operations for the boolean world where part of a search might be exact, but that shouldn't be a big challenge.

I don't have time at the moment to cover everything, but if you have a skim of those files/functions and let me know any questions you have, I can help out further. I can also cover the testing/local dev setup tomorrow (if you're game to give this a crack).
Let me know. Hopefully that isn't too daunting an info dump!
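To make the `request_indexes` point above concrete: even excluded words need their index data loaded, because ruling a page out still requires knowing which pages contain those words. The sketch below only gathers the distinct words from a raw query. `words_to_load` is a hypothetical helper, not Pagefind's API, and how words map to index chunks is deliberately left out.

```rust
use std::collections::BTreeSet;

/// Returns every distinct word in the query, lowercased and sorted, with any
/// leading or trailing `-` and `"` characters stripped. Excluded words are kept.
fn words_to_load(query: &str) -> Vec<String> {
    let words: BTreeSet<String> = query
        .split_whitespace()
        .map(|token| {
            token
                .trim_matches(|c: char| c == '-' || c == '"')
                .to_lowercase()
        })
        .filter(|word| !word.is_empty())
        .collect();
    words.into_iter().collect()
}
```

For `cats -"cute dogs"` this returns `cats`, `cute`, and `dogs`. It is intentionally crude: it applies no stemming and ignores which words belong to an excluded phrase, which a real implementation would need to track.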
Thanks for the details! Unfortunately, in the meantime this use case got deprioritized for the integration I'm working on, so I won't have the time to work on this in the near future after all. Anyway, this is great context for someone else to pick it up.
No worries!
If anyone reading this is keen to jump in, drop a comment and I'll finish elaborating the testing setup if it still remains undocumented. Otherwise I'll get around to this feature one day.
A rework of the search query syntax hasn't landed yet, but it's still on the cards.
A point for thought: how should booleans interact with the stemming/lemmatization system?
As a website user: if I'm at the point where I'm using advanced search syntax, I absolutely need `stem NOT stemming` and `EXACT_MATCH "teh qucik brown fox"` to work as written.
Hi there. Thanks for releasing this amazing project! I have a question about more advanced search queries:
Is it possible to exclude certain terms from a search? E.g. I'd like `cats ~"cute dogs"` to return all documents that contain the word "cats" but not the phrase "cute dogs".
If this functionality isn't implemented yet but you'd be happy to have it in the scope of the project, I'm happy to work on it and open a pull request if you could provide me with some guidance on how best to go about this.
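The behaviour asked for here can be sketched over plain in-memory strings. This is only an illustration of the intended semantics, assuming exact, unstemmed word matching and adjacency-based phrase matching; the function names are hypothetical and nothing below reflects how Pagefind's index works.

```rust
/// Splits text into lowercase words, dropping surrounding punctuation.
fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

/// True if `phrase` occurs in `words` as a run of adjacent words.
fn contains_phrase(words: &[String], phrase: &[String]) -> bool {
    // `windows(0)` would panic, so an empty phrase never matches.
    !phrase.is_empty() && words.windows(phrase.len()).any(|window| window == phrase)
}

/// Returns the indices of documents that contain every word of `required`
/// but do not contain `excluded_phrase` as a phrase.
fn search_excluding(docs: &[&str], required: &str, excluded_phrase: &str) -> Vec<usize> {
    let required = tokenize(required);
    let excluded = tokenize(excluded_phrase);
    docs.iter()
        .enumerate()
        .filter(|(_, doc)| {
            let words = tokenize(doc);
            required.iter().all(|r| words.contains(r)) && !contains_phrase(&words, &excluded)
        })
        .map(|(i, _)| i)
        .collect()
}
```

With the documents "Cats are great.", "Cats and cute dogs.", "Dogs can be cute, cats too." and "Only dogs here.", searching for `cats` while excluding the phrase `cute dogs` selects the first and third: the second contains the phrase, and the fourth lacks "cats". The third is kept because "cute" and "dogs" both appear but not adjacently.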