OpenSextant / SolrTextTagger

A text tagger based on Lucene / Solr, using FST technology
Apache License 2.0

implementing fuzzy matching #70

Closed navd closed 6 years ago

navd commented 7 years ago

Hi @dsmiley, I am using SolrTextTagger with SODA and am trying to implement fuzzy matching for it. You said earlier that you had some ideas about how to implement this. Could you give me some guidance? I feel a bit lost.

Thanks, Navdeep

dsmiley commented 7 years ago

Hi, can you give me a URL for my earlier response? My memory is hazy :-) I think fuzzy matching is possible but probably very hard. The most likely solution I have in mind is Lucene's NRT Document Suggester, which can do fuzzy matching, among other things. It's internally based on an FST too, so it has the memory characteristics we want. But that amounts to a rewrite of the guts of the tagger, which is already complicated code that requires expert Lucene knowledge. Such a change would definitely warrant a new major version number. Sorry I can't offer much assistance on exactly how to do this.

matthewgertner commented 7 years ago

I wonder if https://docs.rs/fst-levenshtein/0.1.0/fst_levenshtein/struct.Levenshtein.html would be useful.
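For readers new to the thread, here is a minimal sketch of what "fuzzy matching" against a name dictionary means in this context: accept any dictionary entry within a small Levenshtein (edit) distance of the input. It is not the tagger's code and does not use an FST or a Levenshtein automaton; it is a brute-force scan in plain Java, and the class name `FuzzyDictionarySketch`, the method names, and the sample city names are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of SolrTextTagger or Lucene.
class FuzzyDictionarySketch {

    // Levenshtein distance via dynamic programming, keeping two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Brute-force scan: returns every entry within maxEdits of the query,
    // compared case-insensitively. Cost grows linearly with dictionary size.
    static List<String> fuzzyLookup(List<String> dictionary, String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        String q = query.toLowerCase();
        for (String entry : dictionary) {
            if (editDistance(entry.toLowerCase(), q) <= maxEdits) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Boston", "Austin", "Houston");
        System.out.println(fuzzyLookup(names, "Bostan", 1)); // prints [Boston]
    }
}
```

A scan like this touches every dictionary entry for every candidate span of text, which is why the approaches mentioned above (an FST-backed suggester or a Levenshtein automaton intersected with an FST) are attractive: they aim to prune non-matching entries without enumerating the whole dictionary.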