CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Issue with indexing and text that contains underscores #676

Open demetris opened 1 month ago

demetris commented 1 month ago

Hello, @bglw and all the good people at CloudCannon!

I am building a small site for documenting WooCommerce action hooks and filter hooks. The pages are named after the hook they document, and they have titles like this:

So, the site has a page titled woocommerce_init as well as a page titled woocommerce_loaded. But when I search for init or loaded, Pagefind finds nothing:

[Screenshot: 20240804-1-pagefind-woocommerce_loaded-annotated]

When I search using the full title of the page, e.g., woocommerce_init or woocommerce_loaded, Pagefind finds the pages. It also finds the pages when I search using the full title without the underscores, e.g., woocommerce init or woocommerce loaded:

[Screenshot: 20240804-2-pagefind-woocommerce_loaded-annotated]

[Screenshot: 20240804-3-pagefind-woocommerce_loaded-annotated]

If I rename the page titles to use hyphens instead of underscores (e.g., rename woocommerce_loaded to woocommerce-loaded) and reindex the site, Pagefind gives me the results I expect:

[Screenshot: 20240804-4-pagefind-woocommerce-loaded-with-hyphen-annotated]

Do you know why this happens or if it’s something I can fix on my end?

In case it matters, the site is built with Astro. It is live and the pages in my examples can be accessed here:

Cheers!

bglw commented 4 weeks ago

Hi @demetris 👋

Interesting! Going to your link, I can see the same behavior. I'm not sure why yet, so I'll need to look into that.

Searching for loaded should indeed match woocommerce_loaded — and we have integration tests that ensure that — so there must be some confounding factor with this content in particular. I'll take a look soon 👀
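
To make the expected behavior concrete, here is a minimal sketch of a toy indexer that stores both the full compound term and its underscore- or hyphen-separated parts, so that a query for one part still matches. This is an illustration only and not Pagefind's actual implementation; the function names `index_terms` and `matches` are hypothetical.

```python
import re


def index_terms(text):
    """Return the set of terms a toy indexer would store for `text`.

    Hypothetical helper: each word is stored whole and, if it is a
    compound joined by underscores or hyphens, its parts are stored too.
    """
    terms = set()
    for word in re.findall(r"[a-z0-9_-]+", text.lower()):
        terms.add(word)
        # Split compounds such as "woocommerce_loaded" into their parts.
        parts = [p for p in re.split(r"[_-]+", word) if p]
        if len(parts) > 1:
            terms.update(parts)
    return terms


def matches(query, title):
    """Return True if every whitespace-separated query word is indexed."""
    terms = index_terms(title)
    return all(word in terms for word in query.lower().split())


if __name__ == "__main__":
    for query in ("loaded", "woocommerce_loaded", "woocommerce loaded", "init"):
        print(query, "->", matches(query, "woocommerce_loaded"))
```

Under these assumptions, `loaded`, `woocommerce_loaded`, and `woocommerce loaded` all match a page titled `woocommerce_loaded`, while `init` does not.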

demetris commented 3 weeks ago

Thank you, @bglw.

Looking forward to seeing what you find.