Closed GoogleCodeExporter closed 8 years ago
It seems to me that u'\x96' is just not a correct Python Unicode string. Could you have a non-Unicode character in a Unicode string?
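For what it's worth, a quick check suggests u'\x96' is a valid Unicode string, just probably not the intended character. The sketch below uses Python 3 syntax (where the thread's `unicode` type is `str`) and assumes, without evidence from this thread, that the byte 0x96 originally came from Windows-1252 text:

```python
import unicodedata

# "\x96" is a one-character string holding code point U+0096.
s = "\x96"
code_point = ord(s)
category = unicodedata.category(s)  # "Cc" means a control character

# Assumption: the byte 0x96 came from Windows-1252 data. Decoding it with
# that codec gives an en dash (U+2013) rather than a control character.
dash = b"\x96".decode("cp1252")

print(hex(code_point), category, repr(dash))
```

So the string itself is legal; the question is which codec the original bytes should have been decoded with.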
Original comment by matej.c...@gmail.com
on 8 May 2008 at 10:57
Sorry, I don't fully understand this question. What do you mean by "not correct"? Do you have a pointer to something where I can learn more about these non-Unicode characters (in the context of Unicode strings)? We are also interested in getting to the bottom of this with the gbookmark2delicious project. Thanks.
Original comment by yaa...@gmail.com
on 8 May 2008 at 5:06
After a ton of experimentation, I think I've got it all figured out: one must use the 'utf-8' codec instead of the 'iso-8859-1' codec. I advise changing the default codec in DeliciousAPI's constructor.
E.g., if you try to posts_add something with the string '\xf6', then delicious misinterprets that and stores the wrong character (if you query it, it gives you u'\u2298'). If OTOH you send it the utf-8-encoded string '\xc3\xb6', you'll get back the same string.
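To illustrate the difference between the two codecs, here is a minimal sketch in Python 3 syntax (text is `str`, encoded data is `bytes`). It only shows what the bytes look like locally; how delicious handles them server-side is not something this snippet can confirm:

```python
text = "\u00f6"  # the character 'ö'

latin1_bytes = text.encode("iso-8859-1")  # a single byte, 0xf6
utf8_bytes = text.encode("utf-8")         # two bytes, 0xc3 0xb6

# A receiver that assumes UTF-8 cannot decode the lone latin-1 byte,
# which is one way the stored character could end up wrong.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# The UTF-8 bytes round-trip cleanly.
round_trip = utf8_bytes.decode("utf-8")

print(latin1_bytes, utf8_bytes, latin1_is_valid_utf8, round_trip == text)
```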
Original comment by yaa...@gmail.com
on 13 May 2008 at 6:12
Hmm, I *think* the 'encode' is only relevant when someone passes unicode strings instead of plain strings to the DeliciousAPI methods.
yaaang: what is your locale encoding?
But the handling in _call_server is not correct. I think the following would be the right way to ensure we post plain (byte) strings to del.icio.us:
    if isinstance(params[key], unicode):
        params[key] = params[key].encode(self.codec)
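As a self-contained illustration of that check, here is a rough sketch in Python 3 syntax, where the Python 2 `unicode` type corresponds to `str` and plain strings correspond to `bytes`. The helper name `encode_params` and its signature are made up for this example and are not part of pydelicious:

```python
def encode_params(params, codec="utf-8"):
    """Return a copy of params with text values encoded to bytes.

    Hypothetical helper: values that are already bytes pass through untouched.
    """
    encoded = {}
    for key, value in params.items():
        if isinstance(value, str):  # `unicode` in the Python 2 snippet
            value = value.encode(codec)
        encoded[key] = value
    return encoded


params = {"description": "\u2605", "url": b"cid:codec-testing-1@del.icio.us"}
print(encode_params(params))
```

Only text values are touched, so byte strings the caller already encoded are not encoded a second time.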
The thing I am left wondering about is how the server interprets these bytes. Neither the XML nor the HTTP headers indicate an encoding, so presumably it is XML's default, UTF-8. The elementtree XML parsing always seems to return unicode strings for these...
I work in a UTF-8 environment, but what about people using latin-1/ISO-8859-1 encoded strings in their bookmarks?
With the above code, any unicode strings I pass to the instance get handled correctly:
In [231]: da = pydelicious.DeliciousAPI('mpe', passwd, codec='utf-8')
In [232]: da.posts_add('cid:codec-testing-1@del.icio.us', unicode('★', 'utf-8'), replace=True)
Out[232]: {'result': (True, 'done')}
In [233]: da.posts_add('cid:codec-testing-2@del.icio.us', '★', replace=True)
Out[233]: {'result': (True, 'done')}
In [234]: for u in 'cid:codec-testing-1@del.icio.us', 'cid:codec-testing-2@del.icio.us': da.posts_get(url=u)
   .....:
Out[234]:
{'dt': '2008-06-02',
 'posts': [{'description': u'\u2605',
            'hash': '15a97870f0707fb9d33496391eac572f',
            'href': 'cid:codec-testing-1@del.icio.us',
            'others': '',
            'shared': 'no',
            'tag': 'system:unfiled',
            'time': '2008-06-02T15:56:12Z'}],
 'tag': '',
 'user': 'mpe'}
Out[234]:
{'dt': '2008-06-02',
 'posts': [{'description': u'\u2605',
            'hash': '5caff95c3d3ea03a7598f300419a3848',
            'href': 'cid:codec-testing-2@del.icio.us',
            'others': '',
            'shared': 'no',
            'tag': 'system:unfiled',
            'time': '2008-06-02T15:56:25Z'}],
 'tag': '',
 'user': 'mpe'}
So both have the same result and delicious either uses or recognizes UTF-8.
Original comment by berend.v...@gmail.com
on 2 Jun 2008 at 3:58
Err, which is:
- '★' # plain string: '\xe2\x98\x85'
- unicode('★', 'utf-8') # unicode string: u'\u2605'
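The relationship between those two forms can be checked locally. This is a small sketch in Python 3 syntax (`str` for the unicode string, `bytes` for the plain string), not a transcript of the session above:

```python
star_text = "\u2605"                     # unicode string: one code point
star_bytes = star_text.encode("utf-8")   # plain string: three bytes

# Decoding with the right codec restores the single character.
round_trip = star_bytes.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 yields three unrelated characters.
wrong = star_bytes.decode("iso-8859-1")

print(star_bytes, round_trip == star_text, len(wrong))
```

With codec='utf-8', both forms end up as the same three bytes on the wire, which would explain why the two posts come back identical.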
Original comment by berend.v...@gmail.com
on 2 Jun 2008 at 4:01
OK. The encoding issues should now be resolved and committed.
BTW, see tests/test_encodings.py to see encoding/decoding of utf8 and latin1 in action.
Original comment by berend.v...@gmail.com
on 28 Nov 2008 at 3:57
Original issue reported on code.google.com by
yanghate...@gmail.com
on 30 Apr 2008 at 5:37