Esukhia / sympound-python

Python version of the SymSpell Compound algorithm
https://pypi.org/project/sympound/
MIT License

Dictionary without frequency count...!? #8

Closed MukhtarShaima closed 5 years ago

MukhtarShaima commented 5 years ago

Does a dictionary without word-frequency counts work in SymSpell? My dictionary contains only unique values, so I don't have frequency counts. When I try to use this dictionary, lookup_compound does not give me any suggestions; it just returns the same value with 'string:0:0'. Here is the code:

from sympound import sympound
from jellyfish import levenshtein_distance

distancefun = levenshtein_distance
ssc = sympound(distancefun=distancefun, maxDictionaryEditDistance=3)

def test():
    print(ssc.load_dictionary("symspelldict.txt", term_index=0, count_index=1))
    print(ssc.lookup_compound(input_string="سعلوچا", edit_distance_max=3))

test()

ngawangtrinley commented 5 years ago

An easy workaround is to give a placeholder count, such as 1, to every entry in the dictionary.
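A minimal sketch of that workaround, assuming the dictionary file has one term per line and that load_dictionary expects the term and count separated by a space (as the term_index=0, count_index=1 call above suggests). The function name add_placeholder_counts and the file names are hypothetical, not part of sympound:

```python
def add_placeholder_counts(src_path, dst_path, count=1, encoding="utf-8"):
    """Copy src_path to dst_path, appending a placeholder count to each term.

    Assumes one term per line in the source file (an assumption, not a
    documented sympound format). Blank lines are skipped.
    Returns the number of entries written.
    """
    written = 0
    with open(src_path, encoding=encoding) as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            term = line.strip()
            if not term:
                continue  # skip empty lines
            dst.write(f"{term} {count}\n")
            written += 1
    return written
```

For example, add_placeholder_counts("words.txt", "symspelldict.txt") would produce a file that can then be passed to load_dictionary with term_index=0 and count_index=1. If the terms themselves contain spaces, a different separator or column layout would be needed.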

MukhtarShaima commented 5 years ago

I tried that, but it gives a 'UnicodeDecodeError'.

eroux commented 5 years ago

Is the file you're using available somewhere?

MukhtarShaima commented 5 years ago

Book2.txt

eroux commented 5 years ago

Can you try converting it to UTF-8? (It's currently UTF-16.)
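One way to do the conversion in Python itself, as a rough sketch: it assumes the file starts with a byte order mark (which the "utf-16" codec uses to pick the byte order) and is small enough to read into memory. The function name convert_utf16_to_utf8 and the output file name are hypothetical:

```python
def convert_utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 (without a BOM)."""
    # The "utf-16" codec reads the BOM to detect endianness and drops it.
    with open(src_path, encoding="utf-16") as src:
        text = src.read()
    # newline="" writes line endings exactly as they appear in `text`.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Usage would look like convert_utf16_to_utf8("Book2.txt", "Book2_utf8.txt"). Most text editors can also do this through a "Save as" dialog with an encoding option.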

MukhtarShaima commented 5 years ago

Book2.txt converted into utf-8

eroux commented 5 years ago

Does the converted file still give an error?

MukhtarShaima commented 5 years ago

Yes: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3: character maps to <undefined>

Sorry, it is working.

eroux commented 5 years ago

Great! Is it OK if I close the issue?

MukhtarShaima commented 5 years ago

I may have some more questions. My code is running; give me 2 or 3 days.

MukhtarShaima commented 5 years ago

Can anyone explain how the SymSpell algorithm works? For example, I know how to calculate the Levenshtein distance (LD) by hand. How do I calculate the deletions in SymSpell or sympound manually, or what is the procedure for working them out by hand?

Also, I didn't understand the algorithm. I know it uses deletions instead of insertions and the other edit operations, but I don't have a proper understanding of it. I have read the description of the algorithm that the author wrote on Medium.

eroux commented 5 years ago

This repository is a Python version of the algorithm, copied from the original C#. If you want to know more about the theory I think it's best to check the documentation and ask questions on the original repo, by the original author, here
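For readers with the same question, here is a simplified sketch of the symmetric delete idea, written from scratch for illustration. It is not the code of sympound or of the original C# project, it ignores frequency counts and the optimizations the real implementations use, and all function names are hypothetical. The idea: precompute every string reachable from a dictionary term by deleting up to N characters, do the same for the input word at lookup time, and treat any term that shares a delete with the input as a candidate, which is then checked with a real edit distance.

```python
def deletes(word, max_distance):
    """All strings obtained by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for i in range(len(w)):
                next_frontier.add(w[:i] + w[i + 1:])  # drop character i
        results |= next_frontier
        frontier = next_frontier
    return results


def build_index(terms, max_distance):
    """Map each delete variant to the dictionary terms that produce it."""
    index = {}
    for term in terms:
        for variant in deletes(term, max_distance):
            index.setdefault(variant, set()).add(term)
    return index


def levenshtein(a, b):
    """Plain dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lookup(word, index, max_distance):
    """Candidates sharing a delete with `word`, verified by true distance."""
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    scored = ((levenshtein(word, c), c) for c in candidates)
    return sorted((d, c) for d, c in scored if d <= max_distance)


index = build_index(["hello", "help", "world"], max_distance=2)
print(deletes("abc", 1))        # the set {'abc', 'bc', 'ac', 'ab'}, in some order
print(lookup("helo", index, 2))  # [(1, 'hello'), (1, 'help')]
```

Worked by hand: deleting one character from "hello" can give "helo", which equals the input, so "hello" is a candidate; deleting one character from both "help" and "helo" can give "hel", so "help" is a candidate too. No insertions or substitutions are ever generated, only deletes on both sides.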

MukhtarShaima commented 5 years ago

Thanks.