Drekin / win-unicode-console

A Python package to enable Unicode support when running Python from Windows console.
MIT License

UnicodeDecodeError when printing result of glob.glob('xxx') #29

Closed eromoe closed 8 years ago

eromoe commented 8 years ago

I have a function like this:

import glob

def get_test_file_paths(dirs, ext='*.xls*'):
    file_paths = []
    for d in dirs:
        file_paths.extend(glob.glob(d + ext))
    return file_paths

I use glob to get a list of paths, which may contain Chinese characters. Then I got this error:

E:\Project\yq_analyze>python merge_e_files.py
data/new\d1.xls
data/new\d2_no_e.xlsx
data/new\negative_2016-02-02@15-11-59.xls
Traceback (most recent call last):
  File "merge_e_files.py", line 66, in <module>
    adf = merge_e_files(file_paths)
  File "merge_e_files.py", line 16, in merge_e_files
    print path
  File "C:\Python27\lib\site-packages\win_unicode_console\streams.py", line 217, in write
    s = s.decode(self.encoding)
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd6 in position 26: invalid continuation byte

Drekin commented 8 years ago

Can you provide an example of the path object you are trying to print that causes the error? Tell me what it is (e.g. a Python 2 str object containing bytes…) and what it should be (i.e. which Unicode code points it should correspond to).
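One way to answer that question is to look at the raw bytes and try decoding them under different encodings. The sketch below is only an illustration: the file name is hypothetical, and GBK is an assumption (the thread never states which code page the paths were encoded in). It shows that bytes beginning with 0xd6 can be valid GBK while failing to decode as UTF-8, which is the kind of failure reported in the traceback.

```python
# Hypothetical file name: two Chinese characters (U+4E2D, U+6587) plus an extension.
text = u'\u4e2d\u6587.xls'

# Assumption: the system hands the name back as GBK-encoded bytes.
raw = text.encode('gbk')
print(repr(raw))  # inspect the actual bytes behind the name

# Decoding those bytes as UTF-8 fails, as in the reported traceback.
try:
    raw.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding that produced the bytes recovers the text.
decoded = raw.decode('gbk')
```

If decoding with the suspected code page gives back the expected characters, that tells you both what the object is (encoded bytes) and what it should be (the corresponding Unicode text).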

Drekin commented 8 years ago

Basically, the problem is that on Python 2 the glob function returns bytes-based str objects rather than Unicode when it is given a str pattern. If you pass a Unicode pattern, it returns Unicode as well. So use ext=u'*.xls*' and also pass the directory names as Unicode. You can also use from __future__ import unicode_literals, so that 'something' means u'something' by default.
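A minimal sketch of that suggestion, adapted from the function in the original post. The temporary directory and the file name are hypothetical, created only so the example can run on its own; the u'' prefixes are written out explicitly so the same code behaves consistently on Python 2 and Python 3.

```python
import glob
import os
import tempfile


def get_test_file_paths(dirs, ext=u'*.xls*'):
    """Collect paths matching ext under each directory prefix in dirs."""
    file_paths = []
    for d in dirs:
        # glob returns the same string type as its pattern:
        # a Unicode pattern gives Unicode paths, a bytes pattern gives bytes.
        file_paths.extend(glob.glob(d + ext))
    return file_paths


# Hypothetical demo data: a temporary directory holding one file
# whose name contains Chinese characters (U+4E2D, U+6587).
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, u'\u4e2d\u6587.xls'), 'w').close()

# Directory prefixes are passed as Unicode too, with a trailing separator,
# because the function concatenates prefix and pattern directly.
paths = get_test_file_paths([demo_dir + os.sep])
for path in paths:
    print(repr(path))
```

Because every path is already a Unicode string, printing it no longer requires an implicit decode of raw bytes.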

eromoe commented 8 years ago

Thank you, that solved the problem perfectly!