barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

load_words is not prioritized #123

ledikari closed this issue 2 years ago

ledikari commented 2 years ago

It looks like words added with load_words are not prioritized during spell checking.

from spellchecker import SpellChecker

known_words = ['covid', 'Covid19']

spell = SpellChecker(language='en')
spell.word_frequency.load_words(known_words)

word = 'coved'
# unknown() expects an iterable of words, so wrap the single word in a list
misspelled = spell.unknown([word])
print(spell.correction(word))

The output of this is loved, not covid.

barrust commented 2 years ago

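You are correct. Words are "prioritized" by how many times they appear in the loaded data, because more common words are more likely to be the intended correction (hence the name word frequency). load_words counts each word once per occurrence in the list, so a word loaded a single time still ranks far below common dictionary words. You can boost the newer words by doing something like this: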

from spellchecker import SpellChecker
known_words = ['covid', 'Covid19'] * 1000

spell = SpellChecker(language='en')
spell.word_frequency.load_words(known_words)

Or you could use a different method:

from spellchecker import SpellChecker
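known_words = {'covid': 1000, 'Covid19': 10000}
spell = SpellChecker(language='en')
# load_dictionary() expects a path to a JSON file; for an in-memory dict,
# load_json() is the matching call in recent releases (check your version)
spell.word_frequency.load_json(known_words)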
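
To see why a word loaded once loses to a common dictionary word, here is a minimal standalone sketch of frequency-based ranking. It is not pyspellchecker's actual code: the helper names edits1 and correction are hypothetical, only edit distance 1 is considered, and the counts are invented for illustration.

```python
from collections import Counter
import string


def edits1(word):
    """Return every string one delete, transpose, replace, or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correction(word, freq):
    """Pick the known candidate with the highest count, else return the word."""
    if word in freq:
        return word
    candidates = [w for w in edits1(word) if w in freq]
    return max(candidates, key=freq.__getitem__) if candidates else word


# Toy counts standing in for a real dictionary (assumed values).
freq = Counter({"loved": 500, "cover": 300})

# Adding a word once gives it a count of 1, so common words still win.
freq.update(["covid"])
before = correction("coved", freq)

# Adding it with a large count lets it outrank the other candidates.
freq.update({"covid": 1000})
after = correction("coved", freq)

print(before, after)
```

In this toy setup the first lookup picks the more frequent neighbor and the second picks the boosted word, which mirrors the two workarounds suggested above.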