CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Support Text Fragments #506

Open itsmatteomanf opened 8 months ago

itsmatteomanf commented 8 months ago

I have seen the new v1.0.4 release supporting text highlighting.

There is a standard browser feature that supports this (except in Firefox, for some reason): text fragments.

This doesn't require any JS, which would be great. However, it is not a query parameter; it goes after the #, so the value would become #:~:text=<highlight> instead of the current ?highlight=<highlight>.
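As a rough illustration of the URL shape described above, here is a minimal sketch of building a text fragment link. The helper name buildTextFragmentUrl is hypothetical and not part of Pagefind; the only assumption about the format is the #:~:text= syntax, where "-", "," and "&" have special meaning inside the directive and therefore need percent-encoding.

```javascript
// Hypothetical helper, not part of Pagefind's API.
function buildTextFragmentUrl(pageUrl, highlight) {
  // encodeURIComponent escapes "," and "&" but leaves "-" alone,
  // and "-" is syntax inside a text directive, so escape it by hand.
  const encoded = encodeURIComponent(highlight).replace(/-/g, "%2D");
  // Drop any existing hash so the directive is not appended to a stale one.
  const [base] = pageUrl.split("#");
  return `${base}#:~:text=${encoded}`;
}

console.log(buildTextFragmentUrl("/docs/search/", "low-bandwidth search"));
// prints "/docs/search/#:~:text=low%2Dbandwidth%20search"
```

Unlike ?highlight=, the part after the # is never sent to the server, so this variant works on fully static hosting with no script on the target page.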

bglw commented 7 months ago

This was considered, but didn't win out over the JavaScript implementation for now, both for browser compatibility reasons and because the JavaScript implementation provides more benefits, now and down the line. A good example is that the highlighting is aware of any attributes you have used, like data-pagefind-body and data-pagefind-ignore 🙂

The other issue is that the spec only supports highlighting the first result, rather than all 😔

In saying that, the implementation did land in a good place to have this as an option as well! So the next time I'm in and around that file I can bundle this in with some changes. If anyone particularly wants this feature and is keen to jump ahead...

Notes for me later (or someone else sooner)

Outside of passing the requisite option around, the logic for URL handling is entirely contained in this function: https://github.com/CloudCannon/pagefind/blob/971186cdd062995c14d2cc2bc711b09c850d086f/pagefind_web_js/lib/coupled_search.ts#L287-L311

Searching the codebase for highlightParam will illuminate all of the places a new config option (say, highlightFragment: bool) would need to be piped, and all of the relevant tests.

Additionally, the sub-result handling will need to be made aware, as currently that will bowl over any hash that was output by the processedUrl function above: https://github.com/CloudCannon/pagefind/blob/971186cdd062995c14d2cc2bc711b09c850d086f/pagefind_web_js/lib/sub_results.ts#L63-L77

NEWESTERS commented 7 months ago

I've implemented text fragment support on my website by parsing Pagefind's result with a regular expression:

// match() returns null when the excerpt has no <mark>, so fall back to an empty array
const [, textFragment] = data.excerpt.match(/<mark>(.+?)<\/mark>/) ?? [];

This works fine for cross-page navigation, but the usage notes in the MDN article say:

Text fragments are invoked only on full (non-same-page), user-initiated navigations.

As a result, the user can't scroll to a text fragment from a search result that points to the page they are already on.

So I have another feature request to provide a fallback for this case: include the nearest heading in the metadata.
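To make the same-page limitation concrete, here is a minimal sketch that decides which kind of link to generate for a result. Both function names are hypothetical, and treating "same page" as "same origin, path and query string" is an assumption made for this example rather than the exact rule browsers apply.

```javascript
// Hypothetical helpers, not part of Pagefind.
function isSamePage(resultUrl, currentUrl) {
  // Resolve the (possibly relative) result URL against the current page.
  const target = new URL(resultUrl, currentUrl);
  const current = new URL(currentUrl);
  return (
    target.origin === current.origin &&
    target.pathname === current.pathname &&
    target.search === current.search
  );
}

function pickLinkStrategy(resultUrl, currentUrl) {
  // Text fragments only fire on full navigations, so a same-page
  // result needs a regular element anchor instead.
  return isSamePage(resultUrl, currentUrl) ? "element-anchor" : "text-fragment";
}

console.log(pickLinkStrategy("/url-of-the-page/", "https://example.com/url-of-the-page/"));
console.log(pickLinkStrategy("/another-page/", "https://example.com/url-of-the-page/"));
```

In a browser, currentUrl would typically be window.location.href.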

Example

Imagine this page:

<article>
  <h1 id="main">Main page title</h1>
  <p>Interesting fact about JS</p>
  <h2 id="subtitle">Subtitle</h2>
  <p>Interesting fact about CSS</p>
</article>

Result of pagefind.search("CSS") could return:

{
  /* ... other result keys ... */
  "url": "/url-of-the-page/",
  "excerpt": "Interesting fact about <mark>CSS</mark>.",
  "meta": {
    "title": "Main page title",
    "nearest-title-id": "subtitle" /* <- feature request */
  }
}

With this information I can generate a link with a normal anchor, /url-of-the-page/#subtitle, to navigate the user within the page.

UPDATE

I've realized that this behavior can be achieved with the sub_results property.
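A minimal sketch of that approach, for anyone landing here later. The result object below is hand-written and simplified: it assumes each entry in sub_results carries a url that already includes the hash of the nearest heading and an excerpt with <mark> tags, so check the shape against the Pagefind docs for your version. bestSubResultUrl is a hypothetical helper name.

```javascript
// Simplified, hand-written stand-in for the object returned by result.data().
const data = {
  url: "/url-of-the-page/",
  sub_results: [
    { title: "Main page title", url: "/url-of-the-page/", excerpt: "Interesting fact about JS" },
    { title: "Subtitle", url: "/url-of-the-page/#subtitle", excerpt: "Interesting fact about <mark>CSS</mark>" },
  ],
};

// Hypothetical helper: return the URL of the first sub-result whose excerpt
// contains a highlighted match, falling back to the page URL.
function bestSubResultUrl(result) {
  const hit = (result.sub_results ?? []).find((sub) => sub.excerpt.includes("<mark>"));
  return hit ? hit.url : result.url;
}

console.log(bestSubResultUrl(data)); // prints "/url-of-the-page/#subtitle"
```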

zopieux commented 6 months ago

Note to whoever implements this: Firefox still doesn't support text fragments, but the spec allows mixing both normal element anchors and text fragments, so please expose both, as long as a heading ID is available.
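A sketch of what mixing the two could look like, with the element ID placed before the :~: delimiter. The function name is hypothetical and this is not Pagefind's implementation. How a browser without text fragment support treats the combined hash is worth verifying in the browsers you target.

```javascript
// Hypothetical helper, not part of Pagefind.
function withAnchorAndTextFragment(pageUrl, headingId, highlight) {
  // Discard any existing hash, then rebuild it from its two parts.
  const [base] = pageUrl.split("#");
  // "-" is syntax inside a text directive and encodeURIComponent leaves it alone.
  const encoded = encodeURIComponent(highlight).replace(/-/g, "%2D");
  // The element anchor is optional: without a heading ID only the directive is emitted.
  const anchor = headingId ? encodeURIComponent(headingId) : "";
  return `${base}#${anchor}:~:text=${encoded}`;
}

console.log(withAnchorAndTextFragment("/url-of-the-page/", "subtitle", "CSS"));
// prints "/url-of-the-page/#subtitle:~:text=CSS"
console.log(withAnchorAndTextFragment("/url-of-the-page/", null, "CSS"));
// prints "/url-of-the-page/#:~:text=CSS"
```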