CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Excluding certain terms from search #70

Open c-w opened 2 years ago

c-w commented 2 years ago

Hi there. Thanks for releasing this amazing project! 🙇 I have a question about more advanced search queries:

Is it possible to exclude certain terms from a search? E.g. I'd like cats ~"cute dogs" to return all documents that contain the word "cats" but not the phrase "cute dogs".

If this functionality isn't implemented yet but you'd be happy to have it in the scope of the project, I'm happy to work on it and open a pull request if you could provide me with some guidance on how best to go about this.

bglw commented 2 years ago

Hello!

Boolean search operations aren't something I've touched yet, but I'm certainly happy for them to be in scope for the project. I haven't tidied things up for contributions yet, but if you're happy to delve in deep I can point you in some hopefully-good directions 😁

I think cats -"cute dogs" might be a better syntax to match Google operations, but I'm happy to inherit the operations of any sufficiently large incumbent search platform. Also, they don't have to be implemented now, but it would be good to keep operations like cats OR dogs in mind when building this.
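As a rough illustration of the `-term` / `-"phrase"` syntax discussed above, here is a minimal sketch of how such a query could be split into included and excluded parts. This is not Pagefind code: `ParsedQuery` and `parse_query` are hypothetical names, and `OR` is deliberately left unhandled.

```rust
/// Hypothetical parsed form of a query; not a Pagefind type.
#[derive(Debug, Default, PartialEq)]
pub struct ParsedQuery {
    /// Terms or phrases that must be present.
    pub include: Vec<String>,
    /// Terms or phrases that must be absent (written with a leading `-`).
    pub exclude: Vec<String>,
}

/// Splits a query such as `cats -"cute dogs"` into included and excluded parts.
/// Quoted text is kept together as one phrase; an unterminated quote runs to
/// the end of the input. Tokens are lowercased.
pub fn parse_query(input: &str) -> ParsedQuery {
    let mut parsed = ParsedQuery::default();
    let mut chars = input.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
            continue;
        }

        // A leading `-` marks the next term or phrase as excluded.
        let negated = c == '-';
        if negated {
            chars.next();
        }

        let mut token = String::new();
        if chars.peek() == Some(&'"') {
            chars.next(); // consume the opening quote
            for ch in chars.by_ref() {
                if ch == '"' {
                    break;
                }
                token.push(ch);
            }
        } else {
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() {
                    break;
                }
                token.push(ch);
                chars.next();
            }
        }

        if token.is_empty() {
            continue; // a lone `-` or an empty `""` carries no term
        }
        let token = token.to_lowercase();
        if negated {
            parsed.exclude.push(token);
        } else {
            parsed.include.push(token);
        }
    }
    parsed
}
```

Because only a `-` at the start of a token negates it, a hyphenated word such as `low-bandwidth` stays a single included term.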

An ad-hoc contributing guide

Pagefind sprawls across a few Rust crates at the moment: pagefind, pagefind_stem, pagefind_ui, and pagefind_web. The main changes you'll need to make are to pagefind_web, which is the search webassembly module. Inside the pagefind crate lives the JS search API, which will also need some tweaks.

I'll give a super quick™️ rundown of the things I think you'll need to touch to implement this:

I don't have time at the moment to cover everything, but if you have a skim of those files/functions and let me know any questions you have I can help out further. I can also cover the testing/local dev setup tomorrow (if you're game to give this a crack 🙂)

Let me know, hopefully that isn't too daunting an info dump!

c-w commented 2 years ago

Thanks for the details! Unfortunately in the meantime this use-case got deprioritized for the integration I'm working on, so I won't have the time to work on this in the near future after all. Anyways, great context for someone else to pick it up 👍

bglw commented 2 years ago

No worries!

If anyone reading this is keen to jump in, drop a comment and I'll finish elaborating the testing setup if it still remains undocumented. Otherwise I'll get around to this feature one day 🙂

bglw commented 1 year ago

A rework of the search query syntax hasn't landed yet, but it's still on the cards 🙂

Sailsman63 commented 10 months ago

A point for thought: how should booleans interact with the stemming/lemmatization system?

As a website user: if I'm to the point where I'm using advanced search syntax, I absolutely need stem NOT stemming and EXACT_MATCH "teh qucik brown fox" to work as written.
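To make the concern concrete, here is a small sketch of why `stem NOT stemming` breaks if both words are reduced to the same stem, and how an exact-match term kind would avoid that. Everything here is an assumption for illustration: `Term`, `toy_stem`, `term_matches`, and `doc_matches` are hypothetical names, and `toy_stem` is a crude ASCII-only stand-in, not the stemmer Pagefind uses.

```rust
/// A query term that either matches via its stem or must match as typed.
/// Hypothetical type for illustration; not part of Pagefind.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Stemmed(String),
    Exact(String),
}

/// Toy stemmer: strips a trailing "ing" (collapsing a doubled final letter)
/// or a trailing "s". A crude stand-in for a real stemming algorithm.
pub fn toy_stem(word: &str) -> String {
    let mut w = word.to_lowercase();
    if w.len() > 4 && w.ends_with("ing") {
        w.truncate(w.len() - 3);
        let doubled = {
            let b = w.as_bytes();
            b.len() >= 2 && b[b.len() - 1] == b[b.len() - 2]
        };
        if doubled {
            w.pop(); // "stemm" -> "stem"
        }
    } else if w.len() > 3 && w.ends_with('s') {
        w.pop();
    }
    w
}

/// True if any word of the document satisfies the term.
pub fn term_matches(term: &Term, doc_words: &[&str]) -> bool {
    match term {
        // Exact terms compare surface forms (case-insensitively), no stemming.
        Term::Exact(t) => doc_words.iter().any(|w| w.eq_ignore_ascii_case(t)),
        // Stemmed terms compare stems on both sides.
        Term::Stemmed(t) => {
            let stem = toy_stem(t);
            doc_words.iter().any(|w| toy_stem(w) == stem)
        }
    }
}

/// A document matches when every included term matches and no excluded term does.
pub fn doc_matches(include: &[Term], exclude: &[Term], doc_words: &[&str]) -> bool {
    include.iter().all(|t| term_matches(t, doc_words))
        && !exclude.iter().any(|t| term_matches(t, doc_words))
}
```

With a document containing the word "stem", including `Stemmed("stem")` while excluding `Stemmed("stemming")` rejects the document, because both reduce to the same stem. Using `Exact` for both terms accepts it, which is the behaviour asked for above.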