The behavior of iterating over a bytestring differs between python2 and python3:
Python 3.7.4 (default, Jul 16 2019, 07:12:58)
>>> f = b'asdf1234'
>>> [x for x in f]
[97, 115, 100, 102, 49, 50, 51, 52]
Python 2.7.16 (default, Mar 11 2019, 18:59:25)
>>> f = b'asdf1234'
>>> [x for x in f]
['a', 's', 'd', 'f', '1', '2', '3', '4']
Problem
Reproduced the experienced failure around unpacking the string_ext field type. The failure is due to the above when translating the "packed" bytestring into a list of ints (based on the spec). The reproduction test fails (without the fixes in _unpacker) as:
py/tests/test_string.py:17:
def test_string_ext_unpack():
> assert unpack(b'\x83k\x0b\x00hello world') == \
[104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
py/erlpack/_unpacker.pyx:30: in erlpack._unpacker.ErlangTermDecoder.loads
return self.decode_part(bytes, offset + 1)[0]
py/erlpack/_unpacker.pyx:72: in erlpack._unpacker.ErlangTermDecoder.decode_part
return self.decode_k(bytes, offset + 1)
py/erlpack/_unpacker.pyx:200:
> st = [ord(x) for x in st]
E TypeError: ord() expected string of length 1, but int found
Solution
Using a behavior check for determining the handling (byte_elements_are_ints = isinstance(b'a'[0], int)) allows for modifying the translation behavior based on the underlying iterator behavior (so this avoids trying to assert based on system versions). There may be some speed optimizations around this and using this check as a global detector or wrapping the translation into a private function, but this is the least intrusive solution.
Background
The behavior of iterating over a bytestring differs between python2 and python3:
Problem
Reproduced the experienced failure around unpacking the
string_ext
field type. The failure is due to the above when translating the "packed" bytestring into a list of ints (based on the spec). The reproduction test fails (without the fixes in _unpacker) as:Solution
Using a behavior check for determining the handling (
byte_elements_are_ints = isinstance(b'a'[0], int)
) allows for modifying the translation behavior based on the underlying iterator behavior (so this avoids trying to assert based on system versions). There may be some speed optimizations around this and using this check as a global detector or wrapping the translation into a private function, but this is the least intrusive solution.