filyp / autocorrect

Spelling corrector in python
GNU Lesser General Public License v3.0

Adding a new language does not work #16

Closed: Garve closed this issue 3 years ago

Garve commented 3 years ago

Hi!

I tried to follow the explanation on adding new languages.

count_words('hiwiki-latest-pages-articles.xml', 'hi')

does not work for me. It says

~\Miniconda3\lib\site-packages\autocorrect\word_count.py in count_words(src_filename, lang, encd, out_filename)
     17 def count_words(src_filename, lang, encd=None, out_filename='word_count.json'):
     18     words = get_words(src_filename, lang, encd)
---> 19     counts = Counter(words)
     20     # make output file human readable
     21     counts_list = list(counts.items())

~\Miniconda3\lib\collections\__init__.py in __init__(*args, **kwds)
    566             raise TypeError('expected at most 1 arguments, got %d' % len(args))
    567         super(Counter, self).__init__()
--> 568         self.update(*args, **kwds)
    569 
    570     def __missing__(self, key):

~\Miniconda3\lib\collections\__init__.py in update(*args, **kwds)
    653                     super(Counter, self).update(iterable) # fast path when counter is empty
    654             else:
--> 655                 _count_elements(self, iterable)
    656         if kwds:
    657             self.update(kwds)

~\Miniconda3\lib\site-packages\autocorrect\word_count.py in get_words(filename, lang, encd)
      7 
      8 def get_words(filename, lang, encd):
----> 9     word_regex = word_regexes[lang]
     10     capitalized_regex = r'(\.|^|<|"|\'|\(|\[|\{)\s*' + word_regexes[lang]
     11     with open(filename, encoding=encd) as file:

KeyError: 'hi'

Best regards, Robert

filyp commented 3 years ago

Hi, you first need to add the language's special letters in autocorrect/constants.py.
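For anyone landing here with the same KeyError: the traceback shows get_words() looking up word_regexes[lang], so the lookup fails for any language code that has no entry yet. Below is a minimal, self-contained sketch of that idea. It does not reproduce the library's actual constants.py; the dictionary contents, the find_words helper, and the Devanagari pattern are all assumptions made for illustration.

```python
import re

# Stand-in for the dictionary that get_words() indexes in the traceback.
# The entries here are illustrative, not copied from the library.
word_regexes = {
    'en': r'[A-Za-z]+',
}


def find_words(text, lang):
    """Hypothetical helper: return every match of the language's word regex."""
    if lang not in word_regexes:
        # Same failure mode as in the traceback, with a clearer message.
        raise KeyError(f'no word regex registered for language {lang!r}')
    return re.findall(word_regexes[lang], text)


# Assumed pattern for Hindi: Devanagari letters and combining signs.
# The ranges skip U+0964 to U+0971 (danda punctuation, digits, and the
# abbreviation signs), so they are not counted as parts of words.
word_regexes['hi'] = r'[\u0900-\u0963\u0972-\u097F]+'

print(find_words('नमस्ते दुनिया', 'hi'))  # ['नमस्ते', 'दुनिया']
```

Once the language has an entry, the lookup that raised KeyError: 'hi' succeeds. Whether the library needs further per-language entries beyond the regex is not covered by this sketch, so check constants.py itself.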

Garve commented 3 years ago

Thanks!