datamade / parserator

A toolkit for making domain-specific probabilistic parsers
http://parserator.datamade.us
MIT License

Write labeling file with proper encoding #19

Closed: vierja closed this 9 years ago

vierja commented 9 years ago

When running `parserator label`, the command failed while writing the output file, with the following error:

Done! Yay!
Traceback (most recent call last):
  File "/Users/javier/.virtualenvs/meli_parser/bin/parserator", line 9, in <module>
    load_entry_point('parserator==0.3.6', 'console_scripts', 'parserator')()
  File "/Users/javier/.virtualenvs/meli_parser/lib/python2.7/site-packages/parserator/main.py", line 37, in dispatch
    args.func(args)
  File "/Users/javier/.virtualenvs/meli_parser/lib/python2.7/site-packages/parserator/main.py", line 45, in label
    manual_labeling.label(module, infile_path, outfile_path)
  File "/Users/javier/.virtualenvs/meli_parser/lib/python2.7/site-packages/parserator/manual_labeling.py", line 216, in label
    data_prep_utils.list2file(raw_strings_left, unlabeled_dir+'unlabeled_'+file_slug+'.csv')
  File "/Users/javier/.virtualenvs/meli_parser/lib/python2.7/site-packages/parserator/data_prep_utils.py", line 77, in list2file
    file.write('"%s"\n' % string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 45: ordinal not in range(128)

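I noticed that the input file is decoded with .decode('utf-8') when it is read, but the strings are never encoded when they are written back out. Under Python 2.7, writing a unicode string to a plain file object falls back to the ASCII codec, which fails on any non-ASCII character such as u'\xe1'.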
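For anyone hitting the same error, the fix comes down to encoding on write. Below is a minimal sketch of a UTF-8-safe list2file, assuming it only needs to write each string wrapped in double quotes on its own line (as the failing line in the traceback suggests). It is not necessarily the exact change made for #19. io.open accepts an encoding argument in both Python 2.7 and Python 3, so the same code runs on either.

```python
import io


def list2file(string_list, filepath):
    """Write each string on its own line, wrapped in double quotes.

    Sketch only: opening the file with an explicit UTF-8 encoding means
    unicode strings such as u'\xe1' are encoded on write, instead of
    Python 2 falling back to the ASCII codec.
    """
    with io.open(filepath, 'w', encoding='utf-8') as f:
        for string in string_list:
            # u'' literal keeps the result unicode on Python 2 as well
            f.write(u'"%s"\n' % string)
```

An equivalent option is to keep the plain open() call and write ('"%s"\n' % string).encode('utf-8'); io.open just keeps the encoding in one place instead of repeating it at every write.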

cathydeng commented 9 years ago

thanks for catching this, @vierja! :smiley: