discord / erlpack

High Performance Erlang Term Format Packer
MIT License
217 stars 64 forks source link

[py3] Fix string_ext unpacking #22

Closed jdost closed 4 years ago

jdost commented 4 years ago

Background

The behavior of iterating over a bytestring differs between python2 and python3:

Python 3.7.4 (default, Jul 16 2019, 07:12:58) 
>>> f = b'asdf1234'
>>> [x for x in f]
[97, 115, 100, 102, 49, 50, 51, 52]
Python 2.7.16 (default, Mar 11 2019, 18:59:25) 
>>> f = b'asdf1234'
>>> [x for x in f]
['a', 's', 'd', 'f', '1', '2', '3', '4']

Problem

Reproduced the experienced failure around unpacking the string_ext field type. The failure is due to the above when translating the "packed" bytestring into a list of ints (based on the spec). The reproduction test fails (without the fixes in _unpacker) as:

py/tests/test_string.py:17: 
    def test_string_ext_unpack():
>       assert unpack(b'\x83k\x0b\x00hello world') == \
            [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]

py/erlpack/_unpacker.pyx:30: in erlpack._unpacker.ErlangTermDecoder.loads
    return self.decode_part(bytes, offset + 1)[0]
py/erlpack/_unpacker.pyx:72: in erlpack._unpacker.ErlangTermDecoder.decode_part
    return self.decode_k(bytes, offset + 1)
py/erlpack/_unpacker.pyx:200:
>   st = [ord(x) for x in st]

E   TypeError: ord() expected string of length 1, but int found

Solution

Using a behavior check for determining the handling (byte_elements_are_ints = isinstance(b'a'[0], int)) allows for modifying the translation behavior based on the underlying iterator behavior (so this avoids trying to assert based on system versions). There may be some speed optimizations around this and using this check as a global detector or wrapping the translation into a private function, but this is the least intrusive solution.