AMQP spec says shortstr fields are sequences of octets but amqplib encode/decodes them

ebin123456 / py-amqplib

Automatically exported from code.google.com/p/py-amqplib

GNU Lesser General Public License v2.1

0 stars 0 forks source link

AMQP spec says shortstr fields are sequences of octets but amqplib encode/decodes them #9

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Page 34 of AMQP 0.8: "Short strings, used to hold short text properties. Short 
strings are limited 
to 255 octets and can be parsed with no risk of buffer overflows."

amqplib-0.6.1, transport.py:

    def read_shortstr(self):
        """
        Read a utf-8 encoded string that's stored in up to
        255 bytes.  Return it decoded as a Python unicode object.

        """
        self.bitcount = self.bits = 0
        slen = unpack('B', self.input.read(1))[0]
        return self.input.read(slen).decode('utf-8')

I'm not convinced you can assume shortstrs will be encoded at all -- returning 
the appropriate 
str object might be a better fit for the protocol. write_shortstr does the 
right thing, though, by 
passing str objects through unchanged.

Original issue reported on code.google.com by angrybal...@gmail.com on 9 Dec 2009 at 7:54

GoogleCodeExporter commented 9 years ago

For compatability, consider:

def read_shortstr(self, encoding='utf-8'):
    """
    Read a short string that's stored in up to 255 bytes. Return
    it as a decoded Python unicode object, under the encoding
    passed as 'encoding', or as a byte string if 'encoding' is None.
    """
    self.bitcount = self.bits = 0
    slen = unpack('B', self.input.read(1))[0]
    raw = self.input.read(slen)
    if encoding:
        return raw.decode('utf-8')
    return raw

and (a smaller change)

def write_shortstr(self, s, encoding='utf-8'):
    """
    Write a string up to 255 bytes long after encoding.  If passed
    a unicode string, encode as UTF-8 (or the encoding passed as
    'encoding').

    """
    self._flushbits()
    if isinstance(s, unicode):
        s = s.encode(encoding)
    if len(s) > 255:
        raise ValueError('String too long')
    self.write_octet(len(s))
    self.out.write(s)

Original comment by angrybal...@gmail.com on 10 Dec 2009 at 3:42

GoogleCodeExporter commented 9 years ago

*sigh* raw.decode(encoding)

Original comment by angrybal...@gmail.com on 10 Dec 2009 at 3:42

GoogleCodeExporter commented 9 years ago

Hmm, yes there's definitely a mismatch there.  I guess I was sort of thinking 
ahead
to Python3 where text is unicode, and wanted to make sure unicode survived 
roundtrip.
 Does leave non-utf8 8-bit strings out in the dust though.    

The compatibility code looks OK, but right now nothing would ever call those 
methods
and specify a different encoding.  

I suppose it's worth doing to make the serialization.py module a bit cleaner, 
but if
you really wanted to make use of alternate encodings for say exchange names, 
you'd
have to make more changes up at a higher level.

I think I'll mull this over today and probably put in your suggested change 
this evening.

Original comment by barry.pe...@gmail.com on 10 Dec 2009 at 4:37

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

I made the change, thanks

Original comment by barry.pe...@gmail.com on 11 Dec 2009 at 5:33

Changed state: Fixed