Closed by c-w 3 months ago
Ah, good find!
The addCustomRecord() flow is stepping around the function that sets this; using addHTMLFile() would override the language as you're expecting.
Will fix so that it overrides both cases 👍
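For anyone following along, the precedence being described could be sketched roughly like this. It is only an illustration: resolveLanguage is a hypothetical helper and not part of Pagefind's API, and the "unknown" fallback is an assumption for the example.

```javascript
// Hypothetical helper, not Pagefind's code: a top-level forceLanguage wins
// over whatever language an individual record or page declares.
function resolveLanguage(indexConfig, record) {
  // The "unknown" fallback is an assumption made for this sketch.
  return indexConfig.forceLanguage ?? record.language ?? "unknown";
}

// With forceLanguage set, the per-record value is ignored.
console.log(resolveLanguage({ forceLanguage: "en" }, { language: "unknown" }));
// Without it, the record's own language is used.
console.log(resolveLanguage({}, { language: "fr" }));
```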
There's another strange behavior I noticed related to stemming. If we add a few more tests to the example above:
await index.addCustomRecord({
  url: domain + "/c.html",
  content: "industrialist and General Motors co-founder William C. Durant",
  language: "unknown"
});
await index.addCustomRecord({
  url: domain + "/p.html",
  content: "George P. Knapp",
  language: "unknown"
});
Now searching for "poop" or "crap" will match the single-letter tokens P and C, which is quite unexpected to me.
PR created for the language fix + test case: https://github.com/CloudCannon/pagefind/pull/552
Re: the strange behavior, that's currently intentional, though indeed it isn't the most useful here. Pagefind really likes giving some result over nothing. One way it does that is to trim the search term back until it finds a search term that would match, the idea being that if you type generalx it gets trimmed back to general. There's no escape hatch on this though, so it will trim it back to one character if need be.
It's an open area for improvement — hopefully one day getting some better typo tolerance features in place will allow us to ease back on this one to something a little more intuitive 🙂
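To make the trimming behavior concrete, here is a rough sketch of the idea. It is not Pagefind's actual implementation: trimUntilMatch and the indexedWords array are hypothetical stand-ins, and plain prefix matching is an assumption.

```javascript
// Hypothetical sketch of "trim the term back until something matches".
// `indexedWords` stands in for the words known to the index.
function trimUntilMatch(term, indexedWords) {
  let candidate = term.toLowerCase();
  while (candidate.length > 0) {
    // Assumption: the candidate is matched as a prefix of indexed words.
    const matches = indexedWords.filter((word) => word.startsWith(candidate));
    if (matches.length > 0) {
      return { candidate, matches };
    }
    // No hit: drop the last character and try again.
    candidate = candidate.slice(0, -1);
  }
  // Nothing matched, even at one character.
  return { candidate: "", matches: [] };
}

const words = ["general", "motors", "c", "p"];
console.log(trimUntilMatch("generalx", words).candidate); // "general"
console.log(trimUntilMatch("crap", words).candidate); // "c"
```

Because the loop has no minimum length, "crap" ends up trimmed all the way down to "c", which is the single-letter match reported above.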
Thanks for the explanation. For now I'll hack around it by client-side parsing the excerpt and filtering out any matches where the mark is shorter than some threshold.
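A minimal sketch of that workaround, assuming each result's excerpt is an HTML string with the matched text wrapped in mark tags. The helper names, the threshold and the sample excerpts below are hypothetical and only for illustration.

```javascript
// Collect the text inside each <mark>…</mark> in an excerpt string.
function markedTerms(excerpt) {
  return [...excerpt.matchAll(/<mark>(.*?)<\/mark>/g)].map((m) => m[1]);
}

// Keep a result only if at least one marked term has `minLength` or more
// letters/digits (punctuation such as the "." in "C." is not counted).
function hasLongEnoughMark(excerpt, minLength) {
  return markedTerms(excerpt).some(
    (term) => term.replace(/[^\p{L}\p{N}]/gu, "").length >= minLength
  );
}

// Hypothetical excerpts, shaped like the records in this thread.
const results = [
  { url: "/c.html", excerpt: "co-founder William <mark>C.</mark> Durant" },
  { url: "/g.html", excerpt: "industrialist and <mark>General</mark> Motors" },
];

const filtered = results.filter((r) => hasLongEnoughMark(r.excerpt, 3));
console.log(filtered.map((r) => r.url)); // [ '/g.html' ]
```

This keeps a result when any one mark is long enough; requiring every mark to pass would be stricter and is an equally reasonable choice.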
v1.0.5-rc2 has been published with the fix for forceLanguage 🙂
Will leave this issue alive til it hits stable.
This has landed in the v1.1.0 release 🎉
Problem
Using the JS API to create an index, forceLanguage doesn't seem to have any effect.
Repro
Steps:
test.mjs
node test.mjs
Actual behavior:
Expected behavior:
Work-around
Applying the following patch fixes the problem, however, according to the documentation I'd expect to be able to set forceLanguage once on the top-level configuration and not have to do it for every document. Perhaps the documentation should be updated or precedence given to the top-level configuration item instead of the document-level value.
Context
Pagefind version: 1.0.4