barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

Preserve case of letters after correction #50

Closed: hemanta212 closed this issue 5 years ago

hemanta212 commented 5 years ago

The case of letters changes after checking words with spellchecker.unknown(). In my use case I need to report the line numbers of misspelled words. I use spellchecker.unknown() to find mistyped words and then search for the line numbers of those words, but since

spellchecker.unknown(['Thankk'])

returns thankk (lowercasing the first letter), it is difficult to find the line number. Would it be feasible to preserve the case of letters?

barrust commented 5 years ago

To reduce the complexity of handling upper-case letters, pyspellchecker defaults to lower case. You can force it to keep upper case, but that requires building your own dictionary. If you did, the dictionary would need every form (upper case and lower case) of each word to be able to correct both thankk and Thankk. For example:

Thank
thank
Thanks
thanks
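As a rough illustration of what such a dictionary involves, the helper below expands a word list into the lower-case and capitalized forms shown above. `case_variants` is a hypothetical name, and the sketch covers only these two forms: ALL-CAPS or mixed-case spellings would each need their own entries. How the resulting list is loaded into pyspellchecker, and whether the checker then keeps case, depends on your version and is not shown here.

```python
def case_variants(words):
    """Return the lower-case and Capitalized form of every word.

    Hypothetical helper: builds the kind of word list a case-sensitive
    dictionary would need so that both 'thankk' and 'Thankk' can be corrected.
    Note that str.capitalize() also lower-cases the rest of the word.
    """
    variants = set()
    for word in words:
        variants.add(word.lower())
        variants.add(word.capitalize())
    return variants


print(sorted(case_variants(["thank", "thanks"])))  # ['Thank', 'Thanks', 'thank', 'thanks']
```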

I would recommend tracking the location of each word before submitting it for correction (in your case, before passing it to pyspellchecker), rather than relying on pyspellchecker to return the input in the same case it was provided.
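A minimal sketch of that recommendation, assuming a simple regex tokenizer: every token's line number is recorded first, and the checker's (possibly lower-cased) output is then matched back case-insensitively, so each misspelling is reported in the case the user typed. `locate_unknown`, `toy_unknown` and `KNOWN` are hypothetical names; `toy_unknown` only mimics lower-casing behaviour so the example runs without pyspellchecker installed. With the library you would pass `SpellChecker().unknown` instead.

```python
import re

# Assumed tokenizer: runs of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def locate_unknown(lines, unknown):
    """Map each misspelled word, in its original case, to its 1-based line numbers.

    `unknown` is any callable that takes a list of words and returns the ones
    it does not recognise, possibly lower-cased.
    """
    # Record where every token occurs *before* handing anything to the checker.
    positions = {}  # original token -> list of line numbers
    for line_no, line in enumerate(lines, start=1):
        for token in WORD_RE.findall(line):
            positions.setdefault(token, []).append(line_no)

    # Compare case-insensitively, since the checker may lower-case its output.
    flagged = {word.lower() for word in unknown(list(positions))}
    return {
        token: line_nos
        for token, line_nos in positions.items()
        if token.lower() in flagged
    }


# Hypothetical stand-in for a spell checker's unknown() method.
KNOWN = {"thank", "you", "for", "the", "help"}


def toy_unknown(words):
    return {w.lower() for w in words if w.lower() not in KNOWN}


text = ["Thankk you for the help", "thankk again"]
print(locate_unknown(text, toy_unknown))  # {'Thankk': [1], 'thankk': [2], 'again': [2]}
```

Because positions are keyed on the original token, the checker is free to normalise case however it likes without losing the information needed for reporting.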

hemanta212 commented 5 years ago

After thinking about it, a spelling mistake is a mistake regardless of case, so I think I will match line numbers by converting all words in the text file to lower case instead, although I won't be able to report the errors in the same case the user typed them. Anyway, thanks a lot for the quick response.
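The lower-case matching approach can be sketched as follows, assuming ASCII words and a simple regex tokenizer; `lines_containing` is a hypothetical helper. It finds the line numbers for a word returned by the checker, but as noted it cannot recover the casing the user originally typed.

```python
import re


def lines_containing(lines, word):
    """Return 1-based line numbers whose lower-cased words include `word`.

    Hypothetical helper: compares the lower-cased output of a spell checker
    against a lower-cased copy of each line, so the original casing is lost.
    """
    target = word.lower()
    return [
        line_no
        for line_no, line in enumerate(lines, start=1)
        # Tokenize the lower-cased line; assumes ASCII letters and apostrophes.
        if target in re.findall(r"[a-z']+", line.lower())
    ]


text = ["Thankk you for the help", "thankk again", "no typos here"]
print(lines_containing(text, "thankk"))  # [1, 2]
```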