barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License
713 stars, 164 forks

Correct a sentence #66

Closed by ghost 4 years ago

ghost commented 4 years ago

There are known([words]) and unknown([words]). It would be great to have correct([words]) or correct(sentence). It seems feasible with both, right?

barrust commented 4 years ago

Sure, one can easily define a function to correct a sentence (or, in reality, a string of words). The library doesn't do it for you because of punctuation, string tokenization, and word order.

The library lets you supply your own tokenizer to split a sentence into words and deal with punctuation, etc., which can vary with the text source. See NLTK if you want to learn about or use many different tokenizers.
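As a rough illustration of what such a tokenizer might look like, here is a minimal regex-based sketch. The name simple_tokenizer and the pattern are assumptions made for this example, not part of the library; the only library feature relied on is the tokenizer argument of SpellChecker, shown in a comment so the snippet runs without the package installed.

```python
import re

# Hypothetical pattern: runs of ASCII letters, optionally joined by an
# internal apostrophe or hyphen (e.g. "isn't", "well-fed").
_WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def simple_tokenizer(text):
    """Return the word tokens of `text` in order, dropping punctuation and digits."""
    return _WORD_RE.findall(text)


if __name__ == "__main__":
    print(simple_tokenizer("The dog's well-fed, isn't it?"))
    # With pyspellchecker installed, the function could be passed in like this:
    # from spellchecker import SpellChecker
    # spell = SpellChecker(tokenizer=simple_tokenizer)
    # words = spell.split_words("The dog's well-fed, isn't it?")
```

A pattern like this only covers ASCII letters; text in other scripts, or text where digits matter, would need a different pattern or one of the NLTK tokenizers.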

Below is one way you could implement this functionality:

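from spellchecker import SpellChecker

sentence = "The large brown dog sleeps all day!"
spell = SpellChecker()

# Note that this does not necessarily deal with punctuation unless you provide
# a custom tokenizer
words = spell.split_words(sentence)
# With default settings, unknown() returns the lowercased words that are
# missing from the dictionary
misspelled = spell.unknown(words)
for word in words:
    if word.lower() not in misspelled:
        continue  # already a known word, nothing to correct
    correction = spell.correction(word)
    # Guard against None in case no candidate is returned
    if correction and correction != word:
        print("{}->{}".format(word, correction))
    else:
        print("No correction found for {}".format(word))
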
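If the goal is a corrected sentence rather than a list of suggestions, one possible approach is to substitute words in place so that word order and punctuation are left alone. The sketch below is only an illustration: correct_sentence is a hypothetical helper, not a library function, and it takes the word corrector as an argument so it can be run with a stand-in dictionary. With pyspellchecker installed, SpellChecker().correction could presumably be passed instead.

```python
import re


def correct_sentence(sentence, correct_word):
    """Rebuild `sentence`, passing each alphabetic word through `correct_word`.

    `correct_word` is any callable that maps a lowercase word to its
    correction, or to None when it has no suggestion.
    """
    def fix(match):
        word = match.group(0)
        fixed = correct_word(word.lower())
        if not fixed or fixed == word.lower():
            return word  # keep the original spelling and casing
        # Carry a leading capital over to the replacement
        return fixed.capitalize() if word[0].isupper() else fixed

    # Only the matched words are replaced; spaces and punctuation stay put
    return re.sub(r"[A-Za-z]+(?:'[A-Za-z]+)*", fix, sentence)


if __name__ == "__main__":
    # Stand-in corrector so the sketch runs without the library
    demo = {"lrage": "large", "slepes": "sleeps"}
    print(correct_sentence("The lrage brown dog slepes all day!", demo.get))
```

Because the substitution happens inside the original string, there is no separate list of words to re-join, which avoids the risk of scrambling the order.
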
ghost commented 4 years ago

I tried logic like yours, but somehow scrambled the word order, so congrats. In the end I just correct everything, which is probably heavier on memory. Thanks!