MukhtarShaima closed this issue 5 years ago.
An easy workaround is to give all your entries a placeholder count, such as 1.
I tried that, but it gives a 'UnicodeDecodeError'.
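If the word list has one term per line, the placeholder can be added with a short script. This is a minimal sketch under a few assumptions: the file is UTF-8, it has one term per line, and the loader expects the term and the count separated by a single space (check which separator your loader actually uses). The function names are hypothetical.

```python
def add_placeholder_counts(lines, count=1):
    """Turn one-term-per-line input into 'term count' lines, skipping blanks."""
    out = []
    for line in lines:
        term = line.strip()  # drop surrounding whitespace and the newline
        if term:
            out.append(f"{term} {count}")
    return out


def convert_file(src_path, dst_path, count=1):
    """Write a copy of src_path in which every term has a placeholder count."""
    # "utf-8-sig" reads plain UTF-8 and also strips a UTF-8 BOM if present.
    with open(src_path, encoding="utf-8-sig") as src:
        converted = add_placeholder_counts(src, count)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(converted) + "\n")


if __name__ == "__main__":
    print(add_placeholder_counts(["سعلوچا\n", "\n", "kitab\n"]))
```

The output file can then be loaded with term_index=0 and count_index=1, as in the code further down this thread.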
Is the file you're using available somewhere?
Can you try converting it to UTF-8? (It's currently UTF-16.)
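If you'd rather do the conversion in Python than in a text editor, here is a minimal sketch. The function name and the demo file names are hypothetical. Python's "utf-16" codec uses the byte-order mark (BOM) at the start of the file, if one is present, to detect the byte order.

```python
import os
import tempfile


def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 (written without a BOM)."""
    # newline="" leaves the original line endings untouched.
    with open(src_path, encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    # Demo on throwaway files; replace these paths with your own.
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "Book2_utf16.txt")
    dst = os.path.join(tmp, "Book2_utf8.txt")
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("سعلوچا 1\n")
    utf16_to_utf8(src, dst)
    with open(dst, "rb") as f:
        print(f.read().decode("utf-8"))
```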
I converted Book2.txt into UTF-8.
Does the converted file still give an error?
Yes:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3: character maps to <undefined>
Sorry, it is working.
Great! Is it OK if I close the issue?
I may have some more questions. My code is running; give me 2 or 3 days.
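For anyone who hits the same error: 'charmap' is the codec name Python reports for single-byte Windows code pages such as cp1252, which open() can fall back to when no encoding is passed (the default depends on the system locale). Byte 0x81 is undefined in cp1252, and it is common in UTF-8 Arabic-script text (for example, 'ف' is encoded as the bytes D9 81), so reading a UTF-8 file with that default fails. A minimal sketch, assuming the file really is UTF-8; read_dictionary_text is a hypothetical helper:

```python
def read_dictionary_text(path):
    # Pass the encoding explicitly instead of relying on the locale default,
    # which on many Windows setups is a "charmap" code page such as cp1252.
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    data = "ف".encode("utf-8")  # two bytes: 0xD9 0x81
    try:
        data.decode("cp1252")  # 0x81 has no mapping in cp1252
    except UnicodeDecodeError as err:
        print("cp1252 fails:", err)
    print("utf-8 works:", data.decode("utf-8"))
```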
Can anyone explain how the SymSpell algorithm works? For example, I know how to calculate the Levenshtein distance (LD) by hand. How would I calculate the deletions in SymSpell or SymPound manually? What is the procedure?
Also, I didn't understand the algorithm. I know it uses deletions instead of insertions and so on, but I don't properly understand it. I read the author's description of the algorithm on Medium.
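The core idea (symmetric delete) can be worked through by hand. With a maximum edit distance of 1, the dictionary word "abc" produces the deletes ab, ac and bc (plus abc itself). A misspelled input "abd" produces ab, ad and bd. They share "ab", so "abc" becomes a candidate, and an ordinary Levenshtein computation then confirms that the distance is 1. Only deletes are needed because any edit within distance N can be undone by at most N deletions on each side. Below is a minimal, unoptimised sketch of that idea; it is not the code of this repository or of the original C# SymSpell, which add optimisations (such as indexing only a word prefix) and rank equal-distance candidates by word frequency. This sketch has no counts, so it breaks ties alphabetically. All function names are hypothetical.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """All strings reachable from word by deleting up to max_distance chars."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def build_index(words, max_distance):
    """Precompute: map every delete variant to the words that produce it."""
    index = defaultdict(set)
    for w in words:
        for d in deletes(w, max_distance):
            index[d].add(w)
    return index


def lookup(term, index, max_distance):
    """Return (word, distance) pairs within max_distance, closest first."""
    candidates = set()
    for d in deletes(term, max_distance):
        candidates |= index.get(d, set())
    # Shared deletes only propose candidates; verify with the real distance.
    scored = [(w, levenshtein(term, w)) for w in candidates]
    return sorted((s for s in scored if s[1] <= max_distance),
                  key=lambda s: (s[1], s[0]))


if __name__ == "__main__":
    idx = build_index(["hello", "help", "world"], 2)
    print(lookup("helo", idx, 2))
```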
This repository is a Python version of the algorithm, copied from the original C# code. If you want to know more about the theory, I think it's best to check the documentation and ask questions on the original repo, by the original author, here
Thanks.
Does SymSpell work with a dictionary that has no word-frequency counts? My dictionary has only unique values, so I don't have frequency counts. When I use this dictionary, lookup_compound doesn't give me any suggestions; it just returns the same value with 'string:0:0'. Here is the code:
from sympound import sympound
from jellyfish import levenshtein_distance

distancefun = levenshtein_distance
ssc = sympound(distancefun=distancefun, maxDictionaryEditDistance=3)

def test():
    print(ssc.load_dictionary("symspelldict.txt", term_index=0, count_index=1))
    print(ssc.lookup_compound(input_string="سعلوچا", edit_distance_max=3))

test()
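One plausible cause, not verified against sympound's source: the code above asks the loader for a count in column count_index=1, and if a line has no such column it may be skipped, which would leave the dictionary empty and would be consistent with lookup_compound returning the input unchanged. Before loading, it may help to check which lines lack a usable count. A minimal sketch, assuming whitespace-separated columns; find_bad_lines is a hypothetical helper:

```python
def find_bad_lines(lines, term_index=0, count_index=1):
    """Return (line_number, text) for lines whose count column is missing
    or not a whole number. Columns are assumed to be whitespace-separated."""
    bad = []
    needed = max(term_index, count_index)
    for n, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) <= needed or not parts[count_index].isdigit():
            bad.append((n, line.rstrip("\n")))
    return bad


# Usage (path taken from the code above):
# with open("symspelldict.txt", encoding="utf-8") as f:
#     print(find_bad_lines(f)[:10])
```

An empty result means every line has a count; otherwise the reported lines are the ones the placeholder-count workaround would fix.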