Closed derplakatankleber closed 3 years ago
I also noticed this.
In #66, the approach of replacing the word-character definition with:

    lunr.de.wordCharacters = "A-Za-züÜÄäÖöß0-9";

fixes wildcard support.
I also wound up changing approaches. I can dig up my code, but I believe what I did was: ae, ue versions

Here it is: I basically create a mirror search index without international characters, so the user gets a match whether they type ü or u.
    // Receive a search index and add a diacritic-free mirror field to each item.
    // It's a poor man's multilingual search.
    function normalizeText(searchIndex) {
      function replaceCharacters(string) {
        // Fall back to an empty string so missing fields don't throw
        string = string || "";
        // Treat some common international characters as fuzzy English
        string = string.replace(/\u00c4/g, "A"); // Ä
        string = string.replace(/\u00dc/g, "U"); // Ü
        string = string.replace(/\u00d6/g, "O"); // Ö
        string = string.replace(/\u00fc/g, "u"); // ü
        string = string.replace(/\u00e4/g, "a"); // ä
        string = string.replace(/\u00f6/g, "o"); // ö
        string = string.replace(/\u00df/g, "s"); // ß
        // Collapse the transliterated spellings to the same form
        string = string.replace(/ae/g, "a");
        string = string.replace(/ue/g, "u");
        string = string.replace(/oe/g, "o");
        string = string.replace(/ss/g, "s");
        string = string.replace(/á/g, "a");
        return string;
      }

      for (const item in searchIndex) {
        if (Object.prototype.hasOwnProperty.call(searchIndex, item)) {
          // Mirror field: normalized last name, a space, then normalized first name
          searchIndex[item].multilingualAlternate = replaceCharacters(searchIndex[item].lastName);
          searchIndex[item].multilingualAlternate += " " + replaceCharacters(searchIndex[item].firstName);
        }
      }
      return searchIndex;
    }
I'm sure it's terrible for performance, but for our use case the dataset was small enough that it didn't matter.
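For the mirror field to help, the query presumably has to go through the same replacements as the indexed text; otherwise a query containing ü would never match the folded field. Below is a minimal, self-contained sketch of that idea. The names `foldText` and `findByPrefix`, the sample records, and the plain prefix matching are all assumptions made for illustration: the matching stands in for the real search call and is not how lunr works internally.

```javascript
// Same replacements, in the same order, as replaceCharacters above.
const REPLACEMENTS = [
  [/\u00c4/g, "A"], [/\u00dc/g, "U"], [/\u00d6/g, "O"],
  [/\u00fc/g, "u"], [/\u00e4/g, "a"], [/\u00f6/g, "o"],
  [/\u00df/g, "s"],
  [/ae/g, "a"], [/ue/g, "u"], [/oe/g, "o"], [/ss/g, "s"],
  [/\u00e1/g, "a"],
];

// Hypothetical helper: apply every replacement in sequence.
function foldText(text) {
  return REPLACEMENTS.reduce(
    (result, [pattern, replacement]) => result.replace(pattern, replacement),
    text || ""
  );
}

// Hypothetical stand-in for the search step: fold the query the same way,
// then do a case-insensitive prefix match on each word of the mirror field.
function findByPrefix(records, query) {
  const needle = foldText(query).toLowerCase();
  return records.filter((record) =>
    record.multilingualAlternate
      .toLowerCase()
      .split(/\s+/)
      .some((word) => word.startsWith(needle))
  );
}

// Made-up sample data with the same field names as the snippet above.
const records = [
  { lastName: "Müller", firstName: "Jürgen" },
  { lastName: "Mueller", firstName: "Anna" },
  { lastName: "Schmidt", firstName: "Karl" },
].map((record) => ({
  ...record,
  multilingualAlternate:
    foldText(record.lastName) + " " + foldText(record.firstName),
}));

const hitsUmlaut = findByPrefix(records, "Mü");
const hitsPlain = findByPrefix(records, "Mu");
console.log(hitsUmlaut.length, hitsPlain.length);
```

With this sample data, "Mü" and "Mu" return the same two records, because "Müller" and "Mueller" both fold to "Muller". The trade-off is that unrelated words containing ae, ue, oe or ss are folded as well, so the mirror field is only suitable for fuzzy matching.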
@khawkins98 Thank you very much for the quick answer and your new workaround!
I tried your demo site "demo-browser-require.html", but I don't understand the results.
tests:

    console.log('Search for günstige: ', idx.search('günstige')); // expected result size: 1, result: 1
    console.log('Search for günstig*: ', idx.search('günstig*')); // expected result size: 1, result: 0
    console.log('Search for g*nstig*: ', idx.search('g*nstig*')); // expected result size: 1, result: 1

source: https://rawgit.com/MihaiValentin/lunr-languages/master/demos/demo-browser-require.html
Did I misunderstand how to search for words with umlauts, or is it not possible to use wildcards when searching for words with umlauts?