@Balurc thank you for your kind words. I am glad that this project has been helpful!
I built the same dictionary as you, following the same process to build the index and dump it as a JSON object:
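import json

input = "id_full.txt"
output = "id_full.json"
word_frequency = dict()
# Each line is split on whitespace: first field is the word, second its count
with open(input, "r", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        word_frequency[parts[0]] = int(parts[1].strip())
# Dump the word -> count mapping as JSON, keeping non-ASCII characters as-is
with open(output, "w", encoding="utf-8") as f:
    json.dump(word_frequency, f, indent="", sort_keys=True, ensure_ascii=False)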
I then ran your code to see if there was something wrong and I too got nothing printed out.
from spellchecker import SpellChecker
spell = SpellChecker(language=None, local_dictionary=output)
misspelled = spell.unknown(['makn', 'minum', 'sakt', 'kemana', 'gila'])
for word in misspelled:
    print(spell.correction(word))
    print(spell.candidates(word))
So I looked at the JSON object that was built to see what could be wrong, and this is what I found:
{
    "gila": 30302,
    "kemana": 16134,
    "makn": 1,
    "minum": 20376,
    "sakt": 2
}
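So it looks like the words you were testing with, including the misspellings 'makn' and 'sakt', made it into the dictionary from the source as actual words. Because every test word is in the dictionary, spell.unknown() returns an empty set and the loop has nothing to print.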
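The check described above can be reproduced without the library. The following is a minimal sketch, not part of the project: `words_already_known` is a hypothetical helper, and the sample entries are copied from the JSON excerpt shown above rather than loaded from the full file.

```python
import json


def words_already_known(word_frequency, test_words):
    """Return the test words that appear as keys in the frequency dictionary."""
    return sorted(w for w in test_words if w.lower() in word_frequency)


# Sample entries copied from the excerpt above; a real check would load
# the complete id_full.json with json.load() instead.
sample = json.loads(
    '{"gila": 30302, "kemana": 16134, "makn": 1, "minum": 20376, "sakt": 2}'
)
test_words = ['makn', 'minum', 'sakt', 'kemana', 'gila']

# Every test word is present, so none of them would count as unknown.
print(words_already_known(sample, test_words))
```

If this prints all five test words, the empty result from the loop is expected behaviour rather than a loading problem.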
I have built a script to automate the building of the supported dictionaries (scripts/build_dictionary.py). It would be possible to add id to the list of dictionaries, but figuring out how to clean up the source data is always the issue. Any help would be appreciated.
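One possible first pass at that cleanup is to drop entries whose count falls below a minimum, since the misspellings in the excerpt above have counts of 1 and 2. This is only a sketch under that assumption: `drop_rare_words` is a hypothetical helper, the threshold of 5 is an arbitrary choice, and a low count will also remove some legitimate rare words.

```python
def drop_rare_words(word_frequency, min_count=5):
    """Return a copy of the dictionary without words seen fewer than min_count times.

    min_count is an assumed cutoff, not a value taken from the project.
    """
    return {
        word: count
        for word, count in word_frequency.items()
        if count >= min_count
    }


# Sample entries copied from the JSON excerpt earlier in the thread.
sample = {"gila": 30302, "kemana": 16134, "makn": 1, "minum": 20376, "sakt": 2}

cleaned = drop_rare_words(sample, min_count=5)
print(sorted(cleaned))
```

With the rare entries removed, 'makn' and 'sakt' would no longer be treated as known words, so they could be reported as misspelled. Whether useful corrections are then suggested still depends on the rest of the dictionary.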
@Balurc, did this resolve your issue? I am going to close this for now. If you are still having issues, please re-open with new details.
Hi,
Thank you for such great work; it helps me a lot. Would it be possible for you to update the resources with Bahasa Indonesia (id)? I have downloaded the text file from https://github.com/hermitdave/FrequencyWords/blob/master/content/2018/nl/id_full.txt and converted it to JSON.
Then I followed these steps:
from spellchecker import SpellChecker
spell = SpellChecker(language=None)
spell.word_frequency.load_dictionary('id_full.json')
Then I created a list of some misspelled and correctly spelled words in Bahasa Indonesia. Here 'makn' is a misspelling of 'makan' and 'sakt' of 'sakit'; these are common words in Bahasa:
misspelled = spell.unknown(['makn', 'minum', 'sakt', 'kemana', 'gila'])
Then I ran the loop below, and it printed nothing:
for word in misspelled:
    print(spell.correction(word))
    print(spell.candidates(word))
Is there something wrong with any of the steps above?