CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Ignored content inside `<header>` element #75

Closed: oscarotero closed this issue 2 years ago

oscarotero commented 2 years ago

Hi. First of all, thanks for this awesome tool. I love it!

I've noticed that the index ignores content inside `<header>` tags. For example, I have the following HTML structure:

```html
<article data-pagefind-body>
  <header>
    <h1>This is the title</h1>
  </header>
  <div>
    This is the content
  </div>
</article>
```

When I search for the article title, I get no results. But if I replace the `<header>` with a `<div>`:

```html
<article data-pagefind-body>
  <div class="header">
    <h1>This is the title</h1>
  </div>
  <div>
    This is the content
  </div>
</article>
```

Then the search returns results. The documentation says `<nav>`, `<script>`, and `<form>` elements are skipped automatically. I'm not sure whether `<header>` is also included, but it shouldn't be.

bglw commented 2 years ago

Hi @oscarotero 👋

I haven't enumerated every ignored element in the docs; the full list is here:

https://github.com/CloudCannon/pagefind/blob/76748ed333d5f4a74305c97c5aa55c90e985f753/pagefind/src/fossick/parser.rs#L26-L29

My goal is to make this configurable, so the chosen defaults should matter less. That said, `header` shouldn't be on this list. I think I added it in the same vein as `footer`, but a header is indeed more relevant to a specific page.

Will remove 👍
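To illustrate why the title disappears, here is a minimal sketch in Python (not Pagefind's actual Rust parser) of a text extractor that drops everything nested inside a configurable set of ignored tags. `DEFAULT_IGNORED`, `BodyTextExtractor`, and `indexed_text` are hypothetical names; the set is an assumption that only mimics the pre-fix behaviour reported in this issue by including `header`, and the sketch does not model `data-pagefind-body` scoping.

```python
from html.parser import HTMLParser

# Hypothetical skip list for illustration only; Pagefind's real list lives in
# parser.rs. "header" is included to mimic the behaviour reported in this issue.
DEFAULT_IGNORED = frozenset({"nav", "script", "form", "footer", "header"})


class BodyTextExtractor(HTMLParser):
    """Collects words from a page, dropping text nested in any ignored tag."""

    def __init__(self, ignored=DEFAULT_IGNORED):
        super().__init__()
        self.ignored = ignored
        self.skip_depth = 0  # number of ignored elements we are currently inside
        self.words = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so the set lookup is case-insensitive.
        if tag in self.ignored:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.ignored and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside an ignored element.
        if self.skip_depth == 0:
            self.words.extend(data.split())


def indexed_text(html, ignored=DEFAULT_IGNORED):
    """Return the whitespace-normalised text that would be indexed."""
    parser = BodyTextExtractor(ignored)
    parser.feed(html)
    parser.close()
    return " ".join(parser.words)


WITH_HEADER = """
<article data-pagefind-body>
  <header>
    <h1>This is the title</h1>
  </header>
  <div>
    This is the content
  </div>
</article>
"""

# Same page with <header> swapped for <div class="header">, as in the report.
WITH_DIV = WITH_HEADER.replace("<header>", '<div class="header">').replace(
    "</header>", "</div>"
)

if __name__ == "__main__":
    print(indexed_text(WITH_HEADER))  # This is the content
    print(indexed_text(WITH_DIV))  # This is the title This is the content
    # Opting "header" back in by overriding the skip list:
    print(indexed_text(WITH_HEADER, DEFAULT_IGNORED - {"header"}))
    # This is the title This is the content
```

Passing the skip list as a parameter reflects the stated goal of making the ignored elements configurable: a site could opt `header` back in (or add its own elements) without changing the defaults for everyone else.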

bglw commented 2 years ago

Hello! This is fixed in the Pagefind v0.9.0 release 🎉