GoogleCloudPlatform / appengine-gcs-client

App Engine-Cloud Storage custom client library
Apache License 2.0

Inconsistent handling of unicode for open / listbucket / delete #39

Open davidwtbuxton opened 8 years ago

davidwtbuxton commented 8 years ago

Hi,

cloudstorage.listbucket(..) returns GCSFileStat objects and decodes the UTF-8 encoded object names for you, so GCSFileStat.filename is a unicode instance. This is nice.

But passing a unicode instance to the open or delete functions raises a KeyError if the string includes non-ASCII characters:

Traceback (most recent call last):
  File "/base/data/home/apps/e~davidwtbuxton-test/cloudstorage-utf8-bug.394332959138233059/bottle.py", line 862, in _handle
    return route.call(**args)
  File "/base/data/home/apps/e~davidwtbuxton-test/cloudstorage-utf8-bug.394332959138233059/bottle.py", line 1732, in wrapper
    rv = callback(*a, **ka)
  File "/base/data/home/apps/e~davidwtbuxton-test/cloudstorage-utf8-bug.394332959138233059/wsgi.py", line 32, in create_utf8
    return create_file(u'Señor') #.encode('utf-8'))
  File "/base/data/home/apps/e~davidwtbuxton-test/cloudstorage-utf8-bug.394332959138233059/wsgi.py", line 38, in create_file
    with cloudstorage.open(dest, 'w') as fh:
  File "/base/data/home/apps/e~davidwtbuxton-test/cloudstorage-utf8-bug.394332959138233059/cloudstorage/cloudstorage_api.py", line 91, in open
    filename = api_utils._quote_filename(filename)
  File "/base/data/home/apps/e~davidwtbuxton-test/cloudstorage-utf8-bug.394332959138233059/cloudstorage/api_utils.py", line 94, in _quote_filename
    return urllib.quote(filename)
  File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/urllib.py", line 1263, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xf1'

It would be nice if the cloudstorage library automatically encoded unicode object names to UTF-8, as well as decoding them.
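Until the library does this itself, a caller-side workaround might look like the sketch below. The helper name to_gcs_filename is hypothetical (it is not part of cloudstorage), and the sketch assumes object names should always reach the library as UTF-8 byte strings.

```python
def to_gcs_filename(filename):
    """Return filename as a UTF-8 byte string.

    Hypothetical helper, not part of the cloudstorage library.
    Byte strings pass through unchanged; unicode text is encoded so
    that urllib.quote only ever sees bytes.
    """
    if isinstance(filename, bytes):
        return filename
    return filename.encode('utf-8')


# u'\xf1' is the n-with-tilde from the traceback above.
encoded = to_gcs_filename(u'/bucket/Se\xf1or')
```

The result can then be passed on, e.g. cloudstorage.open(to_gcs_filename(dest), 'w') or cloudstorage.delete(to_gcs_filename(name)).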

For example, in this test project, which creates objects with UTF-8 encoded names, the unicode filename returned by listbucket has to be encoded to UTF-8 again before deleting all objects in a bucket.
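A rough sketch of that delete-all pattern follows. The function name delete_all and the gcs parameter are assumptions made for illustration: gcs stands in for the imported cloudstorage module, passed in only so the snippet can be exercised without App Engine.

```python
def delete_all(gcs, bucket):
    """Delete every object that listbucket reports under bucket.

    gcs is expected to behave like the cloudstorage module
    (listbucket and delete); bucket is a path such as '/my-bucket'.
    Returns the list of encoded names that were deleted.
    """
    deleted = []
    for stat in gcs.listbucket(bucket):
        name = stat.filename
        # listbucket hands back unicode names, so re-encode to UTF-8
        # before calling delete to avoid the KeyError in urllib.quote.
        if not isinstance(name, bytes):
            name = name.encode('utf-8')
        gcs.delete(name)
        deleted.append(name)
    return deleted
```

In an App Engine handler this would be called as delete_all(cloudstorage, '/my-bucket'), where the bucket name is a placeholder.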

Thank you,

David B.