Closed GoogleCodeExporter closed 9 years ago
For compatability, consider:
def read_shortstr(self, encoding='utf-8'):
"""
Read a short string that's stored in up to 255 bytes. Return
it as a decoded Python unicode object, under the encoding
passed as 'encoding', or as a byte string if 'encoding' is None.
"""
self.bitcount = self.bits = 0
slen = unpack('B', self.input.read(1))[0]
raw = self.input.read(slen)
if encoding:
return raw.decode('utf-8')
return raw
and (a smaller change)
def write_shortstr(self, s, encoding='utf-8'):
"""
Write a string up to 255 bytes long after encoding. If passed
a unicode string, encode as UTF-8 (or the encoding passed as
'encoding').
"""
self._flushbits()
if isinstance(s, unicode):
s = s.encode(encoding)
if len(s) > 255:
raise ValueError('String too long')
self.write_octet(len(s))
self.out.write(s)
Original comment by angrybal...@gmail.com
on 10 Dec 2009 at 3:42
*sigh* raw.decode(encoding)
Original comment by angrybal...@gmail.com
on 10 Dec 2009 at 3:42
Hmm, yes there's definitely a mismatch there. I guess I was sort of thinking
ahead
to Python3 where text is unicode, and wanted to make sure unicode survived
roundtrip.
Does leave non-utf8 8-bit strings out in the dust though.
The compatibility code looks OK, but right now nothing would ever call those
methods
and specify a different encoding.
I suppose it's worth doing to make the serialization.py module a bit cleaner,
but if
you really wanted to make use of alternate encodings for say exchange names,
you'd
have to make more changes up at a higher level.
I think I'll mull this over today and probably put in your suggested change
this evening.
Original comment by barry.pe...@gmail.com
on 10 Dec 2009 at 4:37
I made the change, thanks
Original comment by barry.pe...@gmail.com
on 11 Dec 2009 at 5:33
Original issue reported on code.google.com by
angrybal...@gmail.com
on 9 Dec 2009 at 7:54