PyFilesystem / pyfilesystem

Python filesystem abstraction layer
http://pyfilesystem.org/
BSD 3-Clause "New" or "Revised" License

OSFS: Issue with non-ascii filenames in listdirinfo() #250

Closed: zopyx closed this issue 8 years ago

zopyx commented 8 years ago

fs 0.5.4, Python 2.7, running on OpenSuse, EXT4

I have a local directory /tmp/test containing one file named 'ä'. listdir() always works, but listdirinfo() always fails, regardless of whether the encoding parameter is utf8 or iso-8859-15:

from fs.osfs import OSFS
import pprint

l = OSFS('/tmp/test', encoding='utf8')
print l.listdir()
print l.listdirinfo()
$ bin/zopepy xx.py
[u'\xe4']
Traceback (most recent call last):
  File "bin/zopepy", line 326, in <module>
    exec(compile(__file__f.read(), __file__, "exec"))
  File "xx.py", line 6, in <module>
    print l.listdirinfo()
  File "/home/ajung/.buildout/eggs/fs-0.5.4-py2.7.egg/fs/base.py", line 540, in listdirinfo
    files_only=files_only)]
  File "/home/ajung/.buildout/eggs/fs-0.5.4-py2.7.egg/fs/base.py", line 530, in getinfo
    return self.getinfo(pathjoin(path, p))
  File "/home/ajung/.buildout/eggs/fs-0.5.4-py2.7.egg/fs/errors.py", line 257, in wrapper
    return func(self,*args,**kwds)
  File "/home/ajung/.buildout/eggs/fs-0.5.4-py2.7.egg/fs/osfs/__init__.py", line 366, in getinfo
    stats = self._stat(path)
  File "/home/ajung/.buildout/eggs/fs-0.5.4-py2.7.egg/fs/osfs/__init__.py", line 360, in _stat
    return _os_stat(sys_path)
  File "/home/ajung/.buildout/eggs/fs-0.5.4-py2.7.egg/fs/errors.py", line 257, in wrapper
    return func(self,*args,**kwds)
  File "/home/ajung/.buildout/eggs/fs-0.5.4-py2.7.egg/fs/osfs/__init__.py", line 47, in _os_stat
    return os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 10: ordinal not in range(128)
lurch commented 8 years ago

Just tried reproducing this with fs 0.5.4 and Python 2.7 on Ubuntu 14.04, and listdir() and listdirinfo() both seem to work equally well. Does it make any difference if you use the 'regular' python interpreter, rather than whatever 'zopepy' is?

#!/usr/bin/env python
# coding=utf8
import os, shutil
from fs.osfs import OSFS

test_dir = '/tmp/test'
test_file = u'ä'

if os.path.exists(test_dir):
    shutil.rmtree(test_dir)
os.mkdir(test_dir)
with open(os.path.join(test_dir, test_file), 'w') as test:
    test.write('test')
print os.listdir(test_dir)

for encoding in (None, 'utf8', 'iso-8859-15'):
    with OSFS(test_dir, encoding=encoding) as l:
        print l.listdir()
        print l.listdirinfo()
$ python xx.py
['\xc3\xa4']
[u'\xe4']
[(u'\xe4', {'st_ctime': 1461767787.88804, 'st_rdev': 0, 'st_mtime': 1461767787.88804, 'st_blocks': 8, 'st_nlink': 1, 'modified_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'st_gid': 1001, 'st_blksize': 4096, 'created_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'st_dev': 2052L, 'st_size': 4, 'st_atime': 1461767787.88804, 'st_uid': 1001, 'st_ino': 13383544, 'st_mode': 33204, 'accessed_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'size': 4})]
[u'\xe4']
[(u'\xe4', {'st_ctime': 1461767787.88804, 'st_rdev': 0, 'st_mtime': 1461767787.88804, 'st_blocks': 8, 'st_nlink': 1, 'modified_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'st_gid': 1001, 'st_blksize': 4096, 'created_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'st_dev': 2052L, 'st_size': 4, 'st_atime': 1461767787.88804, 'st_uid': 1001, 'st_ino': 13383544, 'st_mode': 33204, 'accessed_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'size': 4})]
[u'\xe4']
[(u'\xe4', {'st_ctime': 1461767787.88804, 'st_rdev': 0, 'st_mtime': 1461767787.88804, 'st_blocks': 8, 'st_nlink': 1, 'modified_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'st_gid': 1001, 'st_blksize': 4096, 'created_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'st_dev': 2052L, 'st_size': 4, 'st_atime': 1461767787.88804, 'st_uid': 1001, 'st_ino': 13383544, 'st_mode': 33204, 'accessed_time': datetime.datetime(2016, 4, 27, 15, 36, 27, 888040), 'size': 4})]
zopyx commented 8 years ago

I cannot run your test script on CentOS 7.1:

$ bin/python out.py
Traceback (most recent call last):
  File "out.py", line 12, in <module>
    with open(os.path.join(test_dir, test_file), 'w') as test:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 10: ordinal not in range(128)

...but it works on OpenSuse 42.1:

$ bin/python out.py
['\xc3\xa4']
[u'\xe4']
[(u'\xe4', {'st_ctime': 1463481524.072729, 'st_rdev': 0, 'st_mtime': 1463481524.072729, 'st_blocks': 8, 'st_nlink': 1, 'modified_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'st_gid': 100, 'st_blksize': 4096, 'created_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'st_dev': 52L, 'st_size': 4, 'st_atime': 1463481524.072729, 'st_uid': 1000, 'st_ino': 692846, 'st_mode': 33152, 'accessed_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'size': 4})]
[u'\xe4']
[(u'\xe4', {'st_ctime': 1463481524.072729, 'st_rdev': 0, 'st_mtime': 1463481524.072729, 'st_blocks': 8, 'st_nlink': 1, 'modified_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'st_gid': 100, 'st_blksize': 4096, 'created_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'st_dev': 52L, 'st_size': 4, 'st_atime': 1463481524.072729, 'st_uid': 1000, 'st_ino': 692846, 'st_mode': 33152, 'accessed_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'size': 4})]
[u'\xe4']
[(u'\xe4', {'st_ctime': 1463481524.072729, 'st_rdev': 0, 'st_mtime': 1463481524.072729, 'st_blocks': 8, 'st_nlink': 1, 'modified_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'st_gid': 100, 'st_blksize': 4096, 'created_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'st_dev': 52L, 'st_size': 4, 'st_atime': 1463481524.072729, 'st_uid': 1000, 'st_ino': 692846, 'st_mode': 33152, 'accessed_time': datetime.datetime(2016, 5, 17, 12, 38, 44, 72729), 'size': 4})]
zopyx commented 8 years ago

CentOS 7.1 reports an unusual filesystem encoding (ANSI_X3.4-1968 is another name for ASCII):

>>> import sys
>>> sys.getfilesystemencoding()
'ANSI_X3.4-1968'
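
A minimal sketch of why that encoding matters, assuming (as the tracebacks above suggest) that Python 2 encodes a unicode path with the filesystem encoding before handing it to the OS. The path below is a hypothetical stand-in for the reported file, and try_encode is a helper made up for this illustration; it is not part of fs.

```python
# -*- coding: utf-8 -*-
import codecs

# 'ANSI_X3.4-1968' is a registered alias of Python's ASCII codec.
codec_name = codecs.lookup('ANSI_X3.4-1968').name

# Hypothetical path matching the report: directory /tmp/test, file named 'ä'.
path = u'/tmp/test/\xe4'


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# With the ASCII codec the non-ASCII character cannot be encoded.
ascii_result = try_encode(path, codec_name)
# With UTF-8 the same path encodes to the bytes listed by os.listdir().
utf8_result = try_encode(path, 'utf-8')

print(codec_name)
print(repr(ascii_result))
print(repr(utf8_result))
```

The failing character sits at index 10 of this path, which lines up with the "position 10" in the reported errors.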
zopyx commented 8 years ago

Setting

export LC_ALL=en_US.UTF-8

solves the problem.
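
For anyone hitting the same thing, a script could check for this condition up front. The following is only a sketch: filesystem_encoding_is_ascii is a hypothetical helper, not something fs provides, and the suggested locale is simply the one used above.

```python
import codecs
import sys


def filesystem_encoding_is_ascii():
    """Return True if the interpreter's filesystem encoding resolves to ASCII."""
    # sys.getfilesystemencoding() may return None on some Python 2 setups;
    # treat that the same as ASCII for the purpose of this check.
    encoding = sys.getfilesystemencoding() or 'ascii'
    return codecs.lookup(encoding).name == 'ascii'


if filesystem_encoding_is_ascii():
    sys.stderr.write(
        "Filesystem encoding is ASCII; non-ASCII filenames may fail. "
        "Consider setting LC_ALL, e.g. export LC_ALL=en_US.UTF-8\n"
    )
```

The environment variable has to be set before the interpreter starts, since the filesystem encoding is determined at startup.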

lurch commented 8 years ago

Does setting LC_ALL fix your original problem too? Can this issue be closed?

zopyx commented 8 years ago

Yes, closing...

lurch commented 8 years ago

Thanks for confirming :-)