Open abhijeetvramgir opened 8 years ago
I'm using this preprocess function to a) strip HTML, b) transliterate, and c) strip punctuation:
preprocess: function(content) {
  // Replace each character found in the map with its Latin substitute.
  const tr = (str) => {
    const map = {"а":"a" /* truncated for diff */ };
    let new_str = "", char, substitute, n = str.length;
    for (let i = 0; i < n; i++) {
      char = str[i];
      substitute = map[char];
      new_str += substitute ? substitute : char;
    }
    return new_str;
  };
  return tr(
    content.replace(/<[^>]+>/g, ' ') // Strip HTML
  ) // Transliterate foreign characters
    .replace(/[^\w]/g, ' ') // Strip punctuation
  ;
}
That seems to remove the HTML and punctuation from the contents; however, I think some punctuation is still getting through to the index in other fields. Is that right?
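If the punctuation is coming from other fields, one possible approach is to run the same cleanup over every field before the document is handed to the index. The following is only a rough sketch under assumptions: cleanText and cleanDocument are hypothetical helper names, the three-entry transliteration map is a stand-in for the truncated one above, and the whitespace collapsing at the end is an extra step that the original function does not do. It does not call lunr at all.

```javascript
// Hypothetical, abbreviated transliteration map (the real map is truncated above).
const TRANSLIT_MAP = {
  "\u0430": "a", // Cyrillic а
  "\u0431": "b", // Cyrillic б
  "\u0432": "v", // Cyrillic в
};

// Strip HTML tags, transliterate mapped characters, then replace every
// non-word character with a space. Collapsing runs of whitespace and
// trimming is an added assumption, not part of the original function.
function cleanText(content) {
  const noTags = content.replace(/<[^>]+>/g, " ");
  let out = "";
  for (const ch of noTags) {
    out += TRANSLIT_MAP[ch] || ch;
  }
  return out.replace(/[^\w]/g, " ").replace(/\s+/g, " ").trim();
}

// Return a copy of doc with cleanText applied to each listed field that
// holds a string. Fields that are not listed (an id, for example) are kept as is.
function cleanDocument(doc, fieldNames) {
  const cleaned = { ...doc };
  for (const name of fieldNames) {
    if (typeof cleaned[name] === "string") {
      cleaned[name] = cleanText(cleaned[name]);
    }
  }
  return cleaned;
}
```

Note that /[^\w]/ only treats ASCII letters, digits, and underscore as word characters, so any non-ASCII character missing from the map would be replaced by a space. That is why transliteration has to run before the punctuation step.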
This is my lunr snippet from the build file:
How do I strip HTML tags?