jpfairbanks closed 6 years ago
I thought everything was in UTF-8.
I get an error with file.readlines()
Traceback (most recent call last):
  File "reflists.py", line 21, in <module>
    print(json.dumps(wordsets, indent=2))
  File "/usr/local/lib/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/local/lib/python2.7/json/encoder.py", line 209, in encode
    chunks = list(chunks)
  File "/usr/local/lib/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/local/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/local/lib/python2.7/json/encoder.py", line 313, in _iterencode_list
    yield buf + _encoder(value)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xde in position 2: invalid continuation byte
It is latin-1.... 😡
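A rough way to check this diagnosis, as a sketch only: in Latin-1 the emoticon `:-Þ` puts byte 0xde at position 2, which matches the traceback. The sample line below is made up for illustration (only the emoticon comes from this thread; the tab and score are assumptions), and the snippet is written for Python 3.

```python
# Hypothetical sample line: only the ":-Þ" emoticon is taken from the thread,
# the tab-separated score after it is an assumption for illustration.
raw = b":-\xde\t1.0\n"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # 0xde starts a two-byte UTF-8 sequence, but the next byte is not a
    # continuation byte, so strict UTF-8 decoding fails here.
    utf8_error = exc

# Latin-1 maps every byte to a code point, so this decode never raises.
as_latin1 = raw.decode("latin-1")

print(utf8_error.start if utf8_error else None, repr(as_latin1))
```

Note that a successful Latin-1 decode does not prove the file is Latin-1, since that codec accepts any byte sequence; it only shows the bytes are consistent with it.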
This is how you fix it: iconv -f ISO8859-1 -t UTF8
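If iconv is not available, roughly the same conversion can be sketched in Python. The helper name `latin1_to_utf8` and its path arguments are hypothetical, not part of this repo, and the sketch assumes the whole file really is Latin-1.

```python
import io


def latin1_to_utf8(src_path, dst_path):
    """Re-encode a Latin-1 text file as UTF-8 (hypothetical helper)."""
    # newline="" turns off newline translation so line endings are preserved.
    with io.open(src_path, "r", encoding="latin-1", newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Writing to a separate output path rather than overwriting the input keeps the original around in case the source encoding guess turns out to be wrong.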
Corrected in the latest pull, yes? I think we can close this issue, @jpfairbanks, @scottagt
If it's been fixed / committed then sounds good
In the file
ref_lexicons/vader_words
there are some emoticons that have encoding problems. For example:
:-Þ
What encoding is this file in?
cc: @cjhutto