barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

case_sensitive=True gives unexpected results #122

Closed nschloe closed 2 years ago

nschloe commented 2 years ago

In a case-sensitive dictionary, I would expect 'FBI' to be known and 'fbi' to be unknown. However, both cases give me 'fbi' as known:

from spellchecker import SpellChecker

spell = SpellChecker(case_sensitive=True)

print(spell.known(["FBI"]))
print(spell.known(["fbi"]))

Output:

{'fbi'}
{'fbi'}
akhmerov commented 2 years ago

That's because the language is set (en by default), and case_sensitive is ignored if language is set (as per the docstring).

nschloe commented 2 years ago

Thanks for the reply! Is there a way to get

{'FBI'}
{}

from the above code at all? (If language=None, both seem to be ignored.)

barrust commented 2 years ago

That is likely because you didn't add a dictionary. What dictionary did you add?

Can you try something like this? It should work, though I am not somewhere I can run it myself to check for typos!

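from spellchecker import SpellChecker

# language=None starts with an empty dictionary, so case_sensitive is honored
spell = SpellChecker(language=None, case_sensitive=True)
spell.word_frequency.add("FBI")  # stored with its original casing

print("FBI" in spell)  # exact-case lookup
print("fbi" in spell)  # different casing, so a different key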
nschloe commented 2 years ago

Ah wait, when using language=None, it actually only spellchecks words that I put in manually? That's not good enough for me. Is there no way to use a case-sensitive English dictionary?

barrust commented 2 years ago

To use a case-sensitive dictionary, you will need to build it yourself, as the default dictionaries are case-insensitive. There are lots of ways to build a dictionary, and they do not have to be manual; I only added the word by hand above to make sure there wasn't a bug. You can find the different ways to build a custom dictionary in the documentation on building a new dictionary or in the GitHub Discussion #90.

Either way, there are reasons why the default dictionaries are not capitalized:

Just some thoughts on it; good luck!

nschloe commented 2 years ago

Thanks for the info!