CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Boolean search operators / search syntax documentation #329

Open chrisspen opened 1 year ago

chrisspen commented 1 year ago

Is there any formal documentation on the search syntax supported?

Like, is an "AND" operator supported? If I search for "term1 term2", pagefind seems to treat all the terms as ORed, so a result will contain at least one term, and maybe others if I'm lucky.

How would I tell pagefind to only return results that contain all the keywords?

bglw commented 1 year ago

No formal syntax has been implemented yet — it's something I'm hoping to do before a 1.0 release but I can't guarantee I'll get to it. There's a small conversation about this in #70 but no work has been started.

For some context on the current state:

The current search strategy could be thought of as "best effort". Specifically in your case, term1 term2 will be treated as term1 AND term2 if both words exist in the corpus — so Pagefind will bias to showing only the most specific pages in the case that it recognizes both words.

If one of the two words isn't found anywhere in the search index, then that word will be ignored. So in this case if term2 doesn't exist anywhere on the site being indexed, then Pagefind will execute the search as simply term1. In this sense it's biased toward returning some results, rather than none.

There shouldn't be a case where you see term1 term2 returning ORed results — let me know if this is definitely happening. I can't see a way this would be getting through the current search function, though. The excerpts generated sometimes aren't the best, and won't contain both words, so sometimes the matches might look worse than reality. Another explanation is that Pagefind does search all word extensions, so term1 term2 will also return a page containing term1 and term22.

Hopefully that context helps! In summary, to answer your question:

"How would I tell pagefind to only return results that contain all the keywords?"

As long as both keywords exist (and aren't common prefixes) then this is the current behaviour. But I am keen on supporting a more formal search documentation 🙂
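
The behaviour described in this reply can be modelled with a small toy sketch. This is not Pagefind's actual search code; the function name `bestEffortSearch`, the `pages` object, and the URLs are made up for illustration, and words are assumed to be lowercased already. It only mirrors the three rules stated above: terms that match nothing in the corpus are ignored, the remaining terms are ANDed, and a term also matches longer words that start with it.

```javascript
// Toy model of the "best effort" strategy described above.
// NOT Pagefind's implementation; all names here are hypothetical.
function bestEffortSearch(pages, query) {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    const entries = Object.entries(pages);

    // A term matches a word when the word starts with the term ("word extensions").
    const pageHasTerm = (words, term) => words.some((w) => w.startsWith(term));

    // Ignore terms that do not match anything anywhere in the corpus.
    const knownTerms = terms.filter((term) =>
        entries.some(([, words]) => pageHasTerm(words, term))
    );
    if (knownTerms.length === 0) return [];

    // The remaining terms are ANDed: a page must match every one of them.
    return entries
        .filter(([, words]) => knownTerms.every((term) => pageHasTerm(words, term)))
        .map(([url]) => url);
}

const pages = {
    "/a/": ["term1", "term2"],
    "/b/": ["term1", "term22"],
    "/c/": ["term1"],
};

console.log(bestEffortSearch(pages, "term1 term2"));   // [ '/a/', '/b/' ]
console.log(bestEffortSearch(pages, "term1 missing")); // [ '/a/', '/b/', '/c/' ]
```

In the first call `/b/` is returned because `term22` extends `term2`, which is the prefix effect mentioned above. In the second call `missing` is not in the corpus, so the search falls back to `term1` alone.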

eklausme commented 11 months ago

First of all, I would like to thank the authors of Pagefind for this really easy to use search-tool!

I stumbled upon this issue because I also thought that Pagefind does not have an AND condition -- this perception is obviously wrong, as illustrated by the above answer from bglw.

What is "missing", though, is to specify word groups, i.e., a sequence of two or more words to search for and require that they be found together. For example, for the famous sentence in Shakespeare's Hamlet:

To be, or not to be, that is the question

it is difficult to find "to" and "be". It is the combination of those two words that makes them stand out. So what might be needed is searching for something like to+be, or that+is+the+question.

Also see Pagefind: Searching in Static Sites. As stated there, it is not a pressing issue, and mostly not important for technical blogs.

bglw commented 11 months ago

👋 Hey @eklausme!

Yes, that kind of adjacency would be great! Ideally, I would like Pagefind to take that into account by default. Given a plain search for to be, pages where those words are close or adjacent should rank higher than pages where those words are paragraphs apart.

That data does already exist when searching — if you search for "to be" in quotes you'll see only pages with those words adjacent are in the results. To do the better generic ranking, it's just a matter of finding a good algorithm to calculate that ranking, given Pagefind's available data, without blowing out the search performance.

Not something I have had time for yet, but hopefully will one day! 🙂
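
To make the adjacency idea concrete, here is a toy sketch of the two notions discussed in this comment: an exact phrase match (what the quoted search does) and a word distance that a generic ranking could take into account. It is not Pagefind's algorithm; `containsPhrase` and `minDistance` are hypothetical helpers, and pages are assumed to be already split into lowercase words.

```javascript
// Toy sketch only; NOT Pagefind's ranking code. All names are hypothetical.

// True when the words of `phrase` appear consecutively in `words`.
function containsPhrase(words, phrase) {
    if (phrase.length === 0) return false;
    for (let i = 0; i + phrase.length <= words.length; i++) {
        if (phrase.every((term, offset) => words[i + offset] === term)) return true;
    }
    return false;
}

// Smallest gap (in word positions) between any occurrence of `a` and `b`.
// Returns Infinity when either word is missing.
function minDistance(words, a, b) {
    let lastA = -1;
    let lastB = -1;
    let best = Infinity;
    words.forEach((word, i) => {
        if (word === a) lastA = i;
        if (word === b) lastB = i;
        if (lastA >= 0 && lastB >= 0 && lastA !== lastB) {
            best = Math.min(best, Math.abs(lastA - lastB));
        }
    });
    return best;
}

const hamlet = ["to", "be", "or", "not", "to", "be", "that", "is", "the", "question"];
const other = ["to", "err", "is", "human", "so", "be", "it"];

console.log(containsPhrase(hamlet, ["to", "be"])); // true
console.log(containsPhrase(other, ["to", "be"]));  // false
console.log(minDistance(hamlet, "to", "be"));      // 1
console.log(minDistance(other, "to", "be"));       // 5
```

A ranking along these lines would place the first page above the second for a plain `to be` search, since the smaller distance signals that the words occur together.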

leancept commented 10 months ago

I'm using Pagefind to show a list of related articles using the current article's tags. Problem is, it only shows articles that have exactly the same tags as the one being viewed. I've solved it by reducing the keyword set until Pagefind returns results. A fuzzy search matching, or one based on OR would be great though.
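
The workaround of shrinking the keyword set until something comes back could look roughly like the sketch below. It is an illustration, not the code actually used: `searchWithFallback` is a hypothetical helper, and `search` is any async function that takes a query string and resolves to an array of results (with Pagefind it could be a small wrapper around `pagefind.search` that returns the `results` array). Dropping the last keyword first is an assumption about which tags matter least.

```javascript
// Sketch of the "reduce the keyword set until results appear" workaround.
// `search` is a caller-supplied async function: (query: string) => Promise<Array>.
async function searchWithFallback(search, keywords) {
    const remaining = [...keywords];
    while (remaining.length > 0) {
        const results = await search(remaining.join(" "));
        if (results.length > 0) {
            return { used: remaining, results };
        }
        // Assumption: keywords are ordered by importance, so drop the last one.
        remaining.pop();
    }
    return { used: [], results: [] };
}
```

Each retry costs one more search, so the number of queries grows with the number of keywords in the worst case.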

bglw commented 10 months ago

@leancept if you're showing a list based on a known set of tags, then filtering sounds like a good path that does support this :)

https://pagefind.app/docs/js-api-filtering/#using-compound-filters

You would be able to do something like:

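// Passing null as the search term returns pages based on the filters alone.
const search = await pagefind.search(null, {
    filters: {
        tag: {
            // "any" matches pages that have at least one of these tags
            any: ["tag one", "tag_two", "tag_three"]
        }
    },
});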
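
For the related-articles use case, the filtered search above could be wrapped in a helper like the following sketch. `relatedByTags` and its `limit` parameter are hypothetical; the sketch assumes `pagefind` is the already-loaded Pagefind module, that pages were indexed with a `tag` filter, and that each result exposes an async `data()` method returning at least `url` and `meta.title`.

```javascript
// Hypothetical helper; `pagefind` is passed in by the caller.
async function relatedByTags(pagefind, tags, limit = 5) {
    // null search term: select pages purely by filter, matching any of the tags.
    const search = await pagefind.search(null, {
        filters: { tag: { any: tags } },
    });

    // Result data is loaded on demand, so only fetch what will be shown.
    const pagesData = await Promise.all(
        search.results.slice(0, limit).map((result) => result.data())
    );

    return pagesData.map((data) => ({ url: data.url, title: data.meta?.title }));
}
```

A caller would typically also drop the current article from the returned list before rendering it.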