Esukhia / sympound-python

Python version of the SymSpell Compound algorithm
https://pypi.org/project/sympound/
MIT License

Support for multiple tokens? #4

Open mollerhoj opened 6 years ago

mollerhoj commented 6 years ago

As far as I can tell, this does not support lookup of sentences?

from sympound import sympound
from pyxdameraulevenshtein import damerau_levenshtein_distance
distancefun = damerau_levenshtein_distance

ssc = sympound(distancefun=distancefun, maxDictionaryEditDistance=3)

def test():
    print(ssc.load_dictionary("frequency_dictionary_en_82_765.txt", term_index=0, count_index=1))
    print(ssc.lookup_compound(input_string="whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixthgrade and ins pired him", edit_distance_max=3))
test()
# returns:
# True
# wherewith:202893:88

(This is using the example data and sentence from the official SymSpell.cs repo)

eroux commented 6 years ago

The code reproduces the C# code, at least the core system. So either there is a bug somewhere, or this functionality lives in another C# file that hasn't been ported. Taking a closer look, I think the Python code is missing this function. Do you have some experience with Python, and can you make a pull request? This shouldn't be too difficult, I think (the variables are named similarly in the Python and C# code).
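
Until that is ported, a naive per-token fallback can be sketched in plain Python. This is not the SymSpell compound algorithm and does not use sympound's API: `edit_distance`, `correct_token`, `correct_sentence` and the `word_counts` mapping are all hypothetical names introduced for illustration. It only fixes misspellings inside whitespace-separated tokens, so wrongly joined or split words such as "whereis" or "th elove" are left unrepaired.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct_token(token, word_counts, max_distance=2):
    """Return the closest dictionary word, or the token itself if none is close."""
    if token in word_counts:
        return token
    best = None
    for word, count in word_counts.items():
        d = edit_distance(token, word)
        if d <= max_distance:
            # Prefer smaller distance, then higher frequency, then alphabetical order.
            key = (d, -count, word)
            if best is None or key < best:
                best = key
    return best[2] if best else token


def correct_sentence(sentence, word_counts, max_distance=2):
    """Correct each whitespace-separated token independently (linear scan, slow)."""
    return " ".join(correct_token(t, word_counts, max_distance)
                    for t in sentence.split())


# Toy dictionary (made-up counts, for illustration only).
word_counts = {"the": 100, "love": 50, "past": 30, "where": 20}
print(correct_sentence("teh lvoe of the pasr", word_counts))
```

The linear scan over the whole dictionary is exactly the cost that SymSpell's precomputed deletes avoid, so this is only useful as a stopgap on small dictionaries.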

MukhtarShaima commented 5 years ago

Does a dictionary without word-frequency counts work in SymSpell? My dictionary contains only unique values, so I don't have frequency counts. When I try to use this dictionary, lookup_compound doesn't give me any suggestions; it just returns the input unchanged as 'string:0:0'. Here is the code:

from sympound import sympound
from jellyfish import levenshtein_distance

distancefun = levenshtein_distance
ssc = sympound(distancefun=distancefun, maxDictionaryEditDistance=3)

def test():
    print(ssc.load_dictionary("symspelldict.txt", term_index=0, count_index=1))
    print(ssc.lookup_compound(input_string="سعلوچا", edit_distance_max=3))

test()

eroux commented 5 years ago

@MukhtarShaima can you please open a separate issue for that? It doesn't look related to this one...