Crystal03 / google-docs-fs

Automatically exported from code.google.com/p/google-docs-fs
GNU General Public License v2.0

UnicodeDecodeError: 'ascii' codec can't decode byte x in position y #8

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. gmount /home/user/gdocs <gmail account>
2. cd /home/user/gdocs
3. ls

What is the expected output? What do you see instead?
Expected: my Google Docs listed. Instead I see:

ls: reading directory .: Invalid argument

What version of the product are you using? On what operating system?

google-docs-fs-1.0beta2.tar.gz

os: ubuntu 8.0.4.2 / 9.0.4 (Python 2.5 and gdata 1.3.2)

Please provide any additional information below.

tried to get some debug info:
./gFile.py <gmail account> /home/user/gdocs/ -d
Password: 
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INIT: 7.9
flags=0x0000000b
max_readahead=0x00020000
   INIT: 7.8
   flags=0x00000001
   max_readahead=0x00020000
   max_write=0x00020000
   unique: 1, error: 0 (Success), outsize: 40

# in another shell i cd to my mountpoint...

unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56
   unique: 2, error: 0 (Success), outsize: 112
unique: 3, opcode: GETATTR (3), nodeid: 1, insize: 56
   unique: 3, error: 0 (Success), outsize: 112
unique: 4, opcode: ACCESS (34), nodeid: 1, insize: 48
ACCESS / 01
   unique: 4, error: -38 (Function not implemented), outsize: 16

# then i issue an ls:

unique: 5, opcode: OPENDIR (27), nodeid: 1, insize: 48
   unique: 5, error: 0 (Success), outsize: 32
unique: 6, opcode: GETATTR (3), nodeid: 1, insize: 56
   unique: 6, error: 0 (Success), outsize: 112
unique: 7, opcode: READDIR (28), nodeid: 1, insize: 80
Traceback (most recent call last):
  File "./gFile.py", line 158, in readdir
    self.directories[''].append('%s.%s' % (file.title.text.encode('UTF-8'), self._file_extension(file)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
   unique: 7, error: -22 (Invalid argument), outsize: 16
unique: 8, opcode: RELEASEDIR (29), nodeid: 1, insize: 64
   unique: 8, error: 0 (Success), outsize: 16

# debug info was gathered on ubuntu 8.0.4.2, python 2.5.2, with the repo
# gdata removed and gdata 1.3.2 installed

Original issue reported on code.google.com by simon.en...@gmail.com on 3 Jun 2009 at 3:48

GoogleCodeExporter commented 9 years ago
May I ask which country you are from? I think it's possible that you are using
a character in a document title that isn't in ASCII, or something along those
lines. Are you using accented characters (café for example, with é in it)?
If that is the case, you should be able to work around this issue by not using
non-ASCII characters. I'll see what I can do to improve matters.

Original comment by d38dm8nw81k1ng@gmail.com on 4 Jun 2009 at 2:56

GoogleCodeExporter commented 9 years ago
I'm from the Netherlands. I checked my 187 document titles and they don't
seem to contain non-ASCII characters. No characters with accents. A lot of
my document titles do contain minus signs "-". Maybe Google Docs turns these
into Unicode en dashes?
I will try to remove these characters from my document titles. I will keep you
posted.

Original comment by simon.en...@gmail.com on 4 Jun 2009 at 8:24

GoogleCodeExporter commented 9 years ago
I've checked the dashes in mine and I've had no problem. There's possibly
something really subtle in your document titles causing the problem.

Original comment by d38dm8nw81k1ng@gmail.com on 4 Jun 2009 at 10:12

GoogleCodeExporter commented 9 years ago
I've found the solution to the problem. It is related to Python itself using
ASCII as the default encoding. You need to edit the sitecustomize.py file to
change it to UTF-8, as follows in Ubuntu:

sudo nano /usr/lib/python2.5/sitecustomize.py

then append the following to the bottom:

import sys
sys.setdefaultencoding('UTF-8')

That should force Python to use UTF-8 instead of ASCII and suppress the errors.
Hope that helps,
Scott W

Original comment by d38dm8nw81k1ng@gmail.com on 5 Jun 2009 at 11:14

GoogleCodeExporter commented 9 years ago
I've found the problem document. It's titled:

Here’s a partial list...

If I replace the Unicode U+2019 RIGHT SINGLE QUOTATION MARK with a regular
apostrophe, the problem does not occur and ls results in a file listing
without errors.
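
A minimal sketch of why this one character is enough to trigger the traceback. It assumes (the thread does not confirm this) that `_file_extension` returned a unicode string, so Python 2 promoted the whole `'%s.%s'` result to unicode and decoded the UTF-8 bytes of the title with the default 'ascii' codec. The helper name `implicit_ascii_decode` is hypothetical, and the code is written for Python 3, where that implicit step has to be spelled out:

```python
# The title from this comment, with U+2019 as the fifth character.
title = u"Here\u2019s a partial list..."

# gFile.py encodes the title to UTF-8 bytes before formatting it.
encoded = title.encode("utf-8")


def implicit_ascii_decode(data):
    """Mimic Python 2 coercing a byte string to unicode with the 'ascii' codec."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(encoded)
    error = None
except UnicodeDecodeError as exc:
    error = exc

# U+2019 is encoded as the three bytes e2 80 99, starting at index 4.
print(repr(encoded[4:7]))
print(error)
```

The reported byte (0xe2) and position (4) line up with the first UTF-8 byte of U+2019 in this title.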

I've also tried your workaround, and appended: 

import sys
sys.setdefaultencoding('UTF-8')

to /usr/lib/python2.5/sitecustomize.py

Now an ls results in a complete file listing, but with the following errors
for files that contain slashes:

ls: cannot access http://mywiki.wooledge.org/ProcessMan....doc: No such file or directory
ls: cannot access http://www.debian.org/doc/debian-poli....doc: No such file or directory
ls: cannot access diff zr364xx.c /usr/src/linux-source-....doc: No such file or directory
...

Thanks for your help and nice work btw!

Original comment by simon.en...@gmail.com on 5 Jun 2009 at 12:33

GoogleCodeExporter commented 9 years ago
OK, just to clarify: is http://mywiki....doc the file name of a document that
you have stored on Google Docs?
So, if it was mounted on the directory /google, the path would be:
/google/http://mywiki.wooledge.org/ProcessMan....doc
If that is the case, I think the problem is the file name, because UNIX
systems don't allow a / in a file name: it is the special character that
separates directories. The file system will actually look for the file
ProcessMan....doc in directory mywiki.wooledge.org in directory http:, which
it expects to find in the root directory.

I don't think I can code a workaround for it either, unfortunately. If I did,
I would break functionality for people who have a folder http: with files in
it (unlikely though it is). The same applies to the diff zr... /usr/... title,
because it would simply be impossible for the file system to determine whether
you mean the file to have that name or you were looking in a directory. Using
\ to escape it doesn't work either, because the shell sanitises everything
before the file system is handed the path. Sorry I can't do any more,
Scott W

Original comment by d38dm8nw81k1ng@gmail.com on 5 Jun 2009 at 1:10
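
To make the path splitting described above concrete, here is a small sketch using Python's standard pathlib. The mount point /google and the truncated title are taken from the comment as written. The `sanitize_title` helper is hypothetical and is not part of google-docs-fs: it only shows what a substitution-based approach could look like, and it has its own trade-off, since two titles that differ only in "/" versus "_" would end up with the same name.

```python
from pathlib import PurePosixPath

# Title exactly as shown (truncated) in the ls error output.
title = "http://mywiki.wooledge.org/ProcessMan....doc"

# How a POSIX path with this title embedded in it is broken into components.
parts = PurePosixPath("/google/" + title).parts
print(parts)


def sanitize_title(raw_title, replacement="_"):
    """Hypothetical helper: replace the path separator in a document title."""
    return raw_title.replace("/", replacement)


safe_name = sanitize_title(title)
print(safe_name)
print(PurePosixPath("/google/" + safe_name).parts)
```

The first print shows 'http:', 'mywiki.wooledge.org' and 'ProcessMan....doc' as separate components, which is the lookup described in the comment. With the separator replaced, the whole title stays in one component.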

GoogleCodeExporter commented 9 years ago
I've committed a fix to SVN. You should get the latest revision and undo the
changes you made to your sitecustomize.py. I'll also change the FAQ to remove
that advice.
Thanks for your help =)
Scott W

Original comment by d38dm8nw81k1ng@gmail.com on 6 Jun 2009 at 12:29
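
The thread does not show what the committed change looked like. Purely as an illustration, one common way to avoid the implicit ASCII decode without touching sitecustomize.py is to keep the title and extension as text while building the name, and encode to UTF-8 only once at the end. The function name `build_entry_name` is hypothetical, and the sketch is written for Python 3:

```python
def build_entry_name(title, extension):
    """Join title and extension as text, then encode once to UTF-8 bytes."""
    name = u"%s.%s" % (title, extension)
    return name.encode("utf-8")


# The problem title from earlier in the thread, with U+2019 in it.
entry = build_entry_name(u"Here\u2019s a partial list...", u"doc")
print(repr(entry))
```

Because no byte string is ever mixed with text during formatting, the default codec never comes into play.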

GoogleCodeExporter commented 9 years ago

Original comment by d38dm8nw81k1ng@gmail.com on 23 Jul 2009 at 8:52