barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

Is it possible to add other language dictionaries? #22

Closed: marchezinixd closed this 6 years ago

marchezinixd commented 6 years ago

Hello,

I'm trying to use this library to fix typos made when people add information about a problem, but that text isn't restricted to English and Spanish; we would also like Portuguese, etc. Is there a way for me to download a dictionary and add it to the library?

marchezinixd commented 6 years ago

Well, I found that I can pass a file via local_dictionary. I tried to pass a .dic file that I downloaded from https://www.karamasoft.com/UltimateSpell/Dictionary.aspx, but I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

barrust commented 6 years ago

You can load a new dictionary, but it is really a word-frequency list rather than a plain list of words. (A plain word list can also be used through the load_text or load_text_file functions.) A dictionary file loaded this way needs to be in JSON format, with each term as a key and an int (its frequency) as the value.
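As a minimal sketch of the JSON format described above, a plain word list can be turned into a file that maps each term to an integer count. The helper build_frequency_json is hypothetical and not part of pyspellchecker; lowercasing the terms is an assumption meant to match a case-insensitive checker.

```python
import json
import os
import tempfile
from collections import Counter


def build_frequency_json(words, path):
    """Write {"term": count, ...} to `path` as UTF-8 JSON and return the dict.

    Hypothetical helper for illustration; not a pyspellchecker API.
    """
    # Assumption: terms are lowercased; blank entries are skipped.
    counts = Counter(w.strip().lower() for w in words if w.strip())
    frequencies = dict(counts)
    # ensure_ascii=False keeps accented characters as-is; the file is UTF-8.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(frequencies, fh, ensure_ascii=False)
    return frequencies


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "pt_frequency.json")
    print(build_frequency_json(["você", "não", "casa", "Casa", "não"], out))
```

The resulting file could then be supplied as a local dictionary, for example SpellChecker(language=None, local_dictionary=out). Check the pyspellchecker documentation for your version before relying on that call.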

A UnicodeDecodeError like this most likely means that you are on Python 2.7. Can you confirm which Python version you are using?
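One way to sidestep the 'ascii' codec default is to decode the file explicitly. A byte 0xef at position 0 is often the start of a UTF-8 byte-order mark, which the "utf-8-sig" codec strips. The sketch below is hypothetical (read_dic_words is not a pyspellchecker function). The .dic layout is also an assumption: one entry per line, Hunspell-style "word/FLAGS" entries, and an optional numeric word-count header line.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_dic_words(path):
    """Return the words in a .dic-style word list, decoded as UTF-8.

    Hypothetical helper; the file layout handled here is an assumption.
    """
    words = []
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark.
    # io.open accepts `encoding` on both Python 2.7 and Python 3.
    with io.open(path, "r", encoding="utf-8-sig") as fh:
        for line in fh:
            # Keep only the part before "/" (assumed affix flags).
            entry = line.strip().split("/", 1)[0]
            # Skip blank lines and a purely numeric count header.
            if entry and not entry.isdigit():
                words.append(entry)
    return words


if __name__ == "__main__":
    # Build a small sample file that starts with a UTF-8 BOM.
    sample = os.path.join(tempfile.mkdtemp(), "sample.dic")
    with io.open(sample, "wb") as fh:
        fh.write(u"3\nvocê/S\nnão\ncasa/SM\n".encode("utf-8-sig"))
    print(read_dic_words(sample))
```

The decoded words could then be loaded into the checker, for instance through its word_frequency.load_words method, if your installed version provides it.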

barrust commented 6 years ago

Also, I am happy to support other dictionaries; PRs are appreciated!

marchezinixd commented 6 years ago

I think we can close this; it's the same problem as in https://github.com/barrust/pyspellchecker/issues/21

barrust commented 6 years ago

Sounds good. I can build a Portuguese dictionary using the files from that site, if that is what you are looking for, and make it a supported language at that point. Let me know by opening a new ticket requesting the language!