Open vividvilla opened 7 years ago
I'd be up to tackling this if anyone had some tips on where to start/best practices. Already familiar w/ metaphone and non-english languages.
I'd like to pick this up with some guidance. To show I'm capable & serious, I've implemented Metaphone3 in Golang: Metaphone3 in Go
@snadrus great, unfortunately the repo you linked to appears to be empty.
So this ticket is requesting support for a bleve Analyzer. Here is what a bleve Analyzer looks like:
https://github.com/blevesearch/bleve/blob/master/analysis/type.go#L74-L80
Essentially, a sequence of character filters is invoked first. Then a single tokenizer is invoked, turning the []byte into a TokenStream ([]Token). Finally, a sequence of token filters is invoked; these can add, modify, or delete individual tokens.
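To make the three stages concrete, here is a rough, self-contained sketch of that pipeline. The types below (Token, TokenStream, CharFilter, Tokenizer, TokenFilter, Analyzer) are simplified stand-ins written for illustration and are not imported from bleve; the real definitions are at the link above. The comma-stripping char filter, the whitespace tokenizer, and the lowercase filter are hypothetical demo components.

```go
package main

import (
	"bytes"
	"fmt"
)

// Token is a simplified stand-in for a bleve token (assumption: the real
// type carries more fields, such as byte offsets and a token type).
type Token struct {
	Term     []byte
	Position int
}

// TokenStream is the output of the tokenizer and of every token filter.
type TokenStream []*Token

// The three stage types, modelled as plain functions for brevity.
type CharFilter func(input []byte) []byte
type Tokenizer func(input []byte) TokenStream
type TokenFilter func(input TokenStream) TokenStream

// Analyzer wires the stages together in the order described above.
type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

// Analyze runs char filters, then the single tokenizer, then token filters.
func (a *Analyzer) Analyze(input []byte) TokenStream {
	for _, cf := range a.CharFilters {
		input = cf(input)
	}
	tokens := a.Tokenizer(input)
	for _, tf := range a.TokenFilters {
		tokens = tf(tokens)
	}
	return tokens
}

// stripCommas is a demo char filter: it replaces commas with spaces.
func stripCommas(input []byte) []byte {
	return bytes.ReplaceAll(input, []byte(","), []byte(" "))
}

// whitespaceTokenizer is a demo tokenizer; positions start at 1.
func whitespaceTokenizer(input []byte) TokenStream {
	var out TokenStream
	for i, field := range bytes.Fields(input) {
		out = append(out, &Token{Term: field, Position: i + 1})
	}
	return out
}

// lowercaseFilter is a demo token filter that modifies tokens in place.
func lowercaseFilter(in TokenStream) TokenStream {
	for _, t := range in {
		t.Term = bytes.ToLower(t.Term)
	}
	return in
}

// newDemoAnalyzer assembles the demo stages into one analyzer.
func newDemoAnalyzer() *Analyzer {
	return &Analyzer{
		CharFilters:  []CharFilter{stripCommas},
		Tokenizer:    whitespaceTokenizer,
		TokenFilters: []TokenFilter{lowercaseFilter},
	}
}

func main() {
	for _, t := range newDemoAnalyzer().Analyze([]byte("Hello, Bleve")) {
		fmt.Printf("%d %s\n", t.Position, t.Term)
	}
}
```

A phonetic filter would slot into the TokenFilters slice, after any filter that normalizes case.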
Since I can't see how your metaphone3 implementation works, my best guess is that you should start by creating a token filter. The token filter will range over each token in the stream. For each token, pass the token text to metaphone3, get the encoding back, and turn that output into either new or modified tokens.
If you have more questions let me know.
Oops, I pushed it now. Thanks for the guidance, but here's my next problem:
m := metaphone3.New()
primary, alternate := m.Encode("choch")
Given this, I now have 3 tokens in order of value: the exact term, the primary encoding, and the alternate encoding.
How can I express that all 3 tokens apply, but apply greater significance to an exact match?
So, there isn't a way to explicitly do that, but I suggest we ignore that for now. Let's get it working first, and then we can review whether it works poorly in some cases and whether there are cases we can improve.
To start, just have the token filter emit all 3 tokens, and use the same "position" as the original term. That should allow phrases to match correctly as well.
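A minimal sketch of that suggestion follows, under some loud assumptions: Token and TokenStream are local stand-ins rather than bleve's types, Encoder is a hypothetical function type shaped like the Encode call shown earlier in the thread, and toyEncoder is a deliberately fake placeholder that is NOT Metaphone. The only point illustrated is that the extra tokens reuse the original token's position and offsets.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a simplified stand-in for a bleve token (assumption).
type Token struct {
	Term     []byte
	Position int // ordinal position of the term in the input
	Start    int // byte offset where the term starts
	End      int // byte offset where the term ends
}

type TokenStream []*Token

// Encoder is a hypothetical signature for a phonetic encoder that returns
// a primary and an alternate code for a word.
type Encoder func(word string) (primary, alternate string)

// NewPhoneticFilter returns a token filter that keeps each original token
// and appends its phonetic codes at the same position and offsets.
func NewPhoneticFilter(encode Encoder) func(TokenStream) TokenStream {
	return func(in TokenStream) TokenStream {
		out := make(TokenStream, 0, len(in))
		for _, tok := range in {
			out = append(out, tok) // the exact term always survives
			primary, alternate := encode(string(tok.Term))
			seen := map[string]bool{string(tok.Term): true}
			for _, code := range []string{primary, alternate} {
				if code == "" || seen[code] {
					continue // skip empty or duplicate codes
				}
				seen[code] = true
				out = append(out, &Token{
					Term:     []byte(code),
					Position: tok.Position, // same position, so phrases still line up
					Start:    tok.Start,
					End:      tok.End,
				})
			}
		}
		return out
	}
}

// toyEncoder is a placeholder, NOT Metaphone: it uppercases the word and
// drops vowels for the primary code, then swaps C for K for the alternate.
func toyEncoder(word string) (primary, alternate string) {
	primary = strings.Map(func(r rune) rune {
		if strings.ContainsRune("AEIOU", r) {
			return -1 // a negative rune drops the character
		}
		return r
	}, strings.ToUpper(word))
	alternate = strings.ReplaceAll(primary, "C", "K")
	return primary, alternate
}

func main() {
	filter := NewPhoneticFilter(toyEncoder)
	in := TokenStream{&Token{Term: []byte("choch"), Position: 1, Start: 0, End: 5}}
	for _, t := range filter(in) {
		fmt.Printf("pos=%d term=%s\n", t.Position, t.Term)
	}
}
```

Because all three tokens share one position, a phrase query over the surrounding terms still sees them as adjacent. Weighting the exact match higher is left out here, as suggested above.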
Was this ever completed?
No, but you're welcome to do so. I no longer need this.
It would be nice to have phonetic token filters. Metaphone and Double Metaphone are the most popular algorithms available now. Here is a list of Go libraries which implement them