Invalid characters indexed

bleroy / lunr-core

A port of LUNR.js to .NET Core

MIT License


Invalid characters indexed #3

Closed xoofx closed 4 years ago

xoofx commented 4 years ago

It seems that characters such as / (e.g. doc/advanc) or ( (e.g. bin(out) are indexed together with the surrounding word characters instead of being treated as separators.

xoofx commented 4 years ago

I see that lunr.js behaves the same way (which makes sense, as this is a close port of it).

Using only spaces and hyphens as word separators seems wrong, but maybe lunr expects the input to be reduced to those separators beforehand, so that any other character has to be replaced by a space before indexing?
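To illustrate the behavior described above, here is a minimal sketch of whitespace/hyphen tokenization. It assumes the separator is the regular expression `/[\s\-]+/`, which is the default in lunr.js as far as I know; the `tokenize` function is a hypothetical stand-in written for this example, not the library's actual code, and it ignores the rest of the pipeline (trimming, stemming).

```typescript
// Assumption: lunr.js splits on runs of whitespace and hyphens by default.
const SEPARATOR = /[\s\-]+/;

// Hypothetical stand-in for the tokenizer: lowercase, split, drop empties.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(SEPARATOR)
    .filter((token) => token.length > 0);
}

// "/" and "(" are not separators, so they stay inside their tokens,
// while the hyphen in "well-known" splits the word in two.
console.log(tokenize("doc/advanced bin(out) well-known"));
```

Under that assumption, a search for "advanced" alone would not match the token built from doc/advanced, which is consistent with what the issue reports.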

bleroy commented 4 years ago

Can you write down a few cases of the string to index, what it gets tokenized as, and what you'd expect?

xoofx commented 4 years ago

Closing, as lunr expects only words separated by spaces or hyphens; I can work around it on my side.
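The thread does not show the workaround itself. One possible shape for it, sketched here as an assumption rather than what was actually done, is to replace every character that is not a letter, digit, whitespace or hyphen with a space before handing the text to the indexer. `normalizeForIndexing` is a hypothetical helper name, and the character class below only covers ASCII letters and digits; text in other scripts would need a broader class.

```typescript
// Hypothetical pre-indexing step (not part of lunr.js or lunr-core).
// Assumption: ASCII letters and digits are the only "word" characters.
function normalizeForIndexing(text: string): string {
  return text
    .replace(/[^A-Za-z0-9\s\-]+/g, " ") // punctuation such as "/" or "(" becomes a space
    .replace(/\s+/g, " ")               // collapse runs of whitespace
    .trim();
}

// Both "/" and the parentheses are turned into word boundaries.
console.log(normalizeForIndexing("doc/advanced bin(out)"));
```

The same normalization would likely need to be applied to query strings as well, so that queries and indexed text are split in the same way.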