CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License
3.22k stars, 97 forks

Do not download content/rawContent when loading search result data #601

Open ulfschneider opened 2 months ago

ulfschneider commented 2 months ago

When loading the data for a search result, content/rawContent is always part of the dataset. In my case the excerpt data alone would be enough. To save bandwidth and increase speed, I would like to avoid downloading content/rawContent. Do you think it's possible to configure Pagefind so that content/rawContent is not sent to the user when loading the search result data?

Best regards, Ulf

nhoizey commented 2 months ago

It would be nice indeed! 👍

bglw commented 2 months ago

👋 @ulfschneider

In the current architecture there isn't a simple toggle — the main barrier is that the excerpt is calculated in the browser, by snipping out a section of the content that contains the search term. So it isn't possible to get the excerpt you're currently using without downloading the content field.
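To illustrate why the excerpt depends on the content field, here is a minimal sketch of snipping a window of words around the first match. It is not Pagefind's actual implementation: `buildExcerpt` and its `radius` parameter are hypothetical names used only for illustration.

```typescript
// Hypothetical helper for illustration only; not Pagefind's real excerpt code.
// Returns a window of words around the first word containing the search term.
function buildExcerpt(content: string, term: string, radius = 5): string {
  const words = content.split(/\s+/).filter((w) => w.length > 0);
  const needle = term.toLowerCase();

  // Find the first word that contains the search term (case-insensitive).
  const hit = words.findIndex((w) => w.toLowerCase().includes(needle));

  if (hit === -1) {
    // No match: fall back to the opening words of the page.
    return words.slice(0, radius * 2 + 1).join(" ");
  }

  // Keep `radius` words on each side of the match, clamped to the text bounds.
  const start = Math.max(0, hit - radius);
  const end = Math.min(words.length, hit + radius + 1);
  return words.slice(start, end).join(" ");
}
```

Because the window is cut around wherever the term occurs, the full text has to be available in the browser before any excerpt of this kind can be produced.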

There could possibly be an option to manually tag an excerpt for a page using metadata, which would make that the only excerpt used no matter what was searched, which would let you opt out of loading the full content. It can make the results a bit harder to understand though, when you can't see why your search term matches on a given page.

Keen to hear thoughts on that.

(And just in case there's an ambiguity, content is only loaded once, and rawContent is a duplicate of it. Plus this is only loaded when displaying the final result, so this data doesn't live in the search index itself.)

ulfschneider commented 2 months ago

Hi, ah yes, because you have to prepare the excerpt on the client I think I understand the difficulty. The dynamic excerpts require the full content and only a static excerpt can be prepared without downloading the full content.

Sending certain static content immediately with the search result (like the document title or a static excerpt) makes sense to me. The static content to carry immediately could be configurable when building the index.

At least having the document title immediately would be of great value in my view. A UI that indicates search suggestions with only the document title could be very fast in that case.

Thank you for pointing out how content is loaded and rawContent being a duplicate of it. I was not aware.

ulfschneider commented 2 months ago

To describe what I do with Pagefind, I've put together a document: Searching a Jamstack site with Pagefind. When you search on that website, you'll see that I first offer a suggestion list of up to seven document titles (titles only), from which you can jump into the contents of a document or, alternatively, submit a search. This is a use case where I think getting the document titles immediately with the search result would be an improvement.
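
The titles-only suggestion list described above could be sketched roughly as follows. This assumes each search result exposes an async `data()` loader whose payload carries a `url` and a `meta.title`; the `ResultData` and `LazyResult` interfaces and the `titleSuggestions` function are hypothetical stand-ins written for this example, not types taken from Pagefind.

```typescript
// Assumed shape of a loaded result; only the fields used here are modelled.
interface ResultData {
  url: string;
  meta: { title?: string };
}

// Assumed shape of a search result whose data is fetched on demand.
interface LazyResult {
  data: () => Promise<ResultData>;
}

// Load at most `limit` results and return their titles,
// falling back to the URL when a page has no title.
async function titleSuggestions(
  results: LazyResult[],
  limit = 7
): Promise<string[]> {
  // Only the first `limit` loaders are invoked; the rest are never fetched.
  const loaded = await Promise.all(
    results.slice(0, limit).map((r) => r.data())
  );
  return loaded.map((d) => d.meta.title ?? d.url);
}
```

In this sketch each suggestion still triggers a full data load for its page, which is exactly the cost that shipping titles together with the search result would avoid.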