Closed: stevenicholls99 closed this issue 6 years ago
This isn't a part of the codebase that I'm super familiar with, but you might be able to hack together a solution by modifying the generator expression in parserator that's failing (manual_labeling.py, line 207 in the traceback). I'd try something like:
strings = set(row[0].decode('utf-8') for row in reader)
The more correct fix would be to optionally use backports.csv when running on Python 2: https://pypi.python.org/pypi/backports.csv
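For anyone who wants to try the idea outside parserator first, here is a minimal sketch of reading the first column of a UTF-8 CSV with an explicit encoding. It assumes the failure comes from the file being opened with the platform default codec, which is what the cp1252 frame in the traceback suggests. The helper name read_unique_strings is hypothetical and is not part of parserator; on Python 2 it relies on backports.csv being installed.

```python
import io

try:
    from backports import csv  # Python 2: unicode-aware replacement for csv
except ImportError:
    import csv  # Python 3: the standard library module already works on text


def read_unique_strings(path, encoding="utf-8"):
    """Return the set of first-column values from a CSV file.

    Hypothetical helper for illustration only, not parserator's own code.
    """
    # io.open decodes while reading, so the reader yields text, not bytes.
    # newline="" leaves line-ending handling to the csv module.
    with io.open(path, encoding=encoding, newline="") as infile:
        reader = csv.reader(infile)
        # Skip completely empty rows so row[0] cannot raise IndexError.
        return set(row[0] for row in reader if row)
```

If the file was saved with a byte order mark, passing encoding="utf-8-sig" strips it so it doesn't end up glued to the first value.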
I am trying to build a parser for JP, so the training data is saved as UTF-8. However, parserator throws a UnicodeDecodeError. Is there anything I can do to work around this? newaddr.csv attached (saved as .txt): newaddr.txt
parserator label training/newaddr.csv training/newaddr.xml usaddress
Traceback (most recent call last):
File "c:\python27\lib\runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "c:\python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "c:\Python27\scripts\parserator.exe__main__.py", line 9, in
File "c:\python27\lib\site-packages\parserator\main.py", line 58, in dispatch
args.func(args)
File "c:\python27\lib\site-packages\parserator\main.py", line 79, in label
manual_labeling.label(module, infile_path, outfile_path)
File "c:\python27\lib\site-packages\parserator\manual_labeling.py", line 207, in label
strings = set(row[0] for row in reader)
File "c:\python27\lib\site-packages\parserator\manual_labeling.py", line 207, in
strings = set(row[0] for row in reader)
File "c:\python27\lib\site-packages\backports\csv.py", line 394, in next
lineobj = next(self.input_iter)
File "c:\python27\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4: character maps to <undefined>
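The last two frames show the file's bytes being decoded as cp1252 rather than UTF-8. Byte 0x9d is one of the few values cp1252 leaves unassigned, and it turns up often inside multi-byte UTF-8 sequences. The sketch below reproduces the same class of error with a hypothetical sample string; the actual contents of newaddr.csv are not assumed.

```python
# Hypothetical sample text, chosen because its UTF-8 encoding contains 0x9d.
sample = u"東京"
raw = sample.encode("utf-8")  # b'\xe6\x9d\xb1\xe4\xba\xac'

try:
    # Decoding UTF-8 bytes with the Windows default codec fails on 0x9d.
    raw.decode("cp1252")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)

# Decoding the same bytes as UTF-8 round-trips cleanly.
assert raw.decode("utf-8") == sample
```

So the training file itself is probably fine as UTF-8; what needs to change is the encoding used when it is opened.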