Open georgeee opened 3 years ago
Here is the source of the bug: https://github.com/algorand/go-algorand/blob/3fce706479573bcee6c5cadd5465cd32e7175328/data/basics/teal.go#L46
Need to do some initial investigation on this before we jump into implementing a solution.
Let me chime in to, hopefully, clarify what is happening, at least the way I understand it.
In Go there are two, essentially, equivalent types: string
and []byte
– the only difference between them is that the former is immutable. That is in contrast to other modern programming languages where “string” means a sequence of Unicode codepoints rather than bytes.
I don’t have a lot of experience with Go, but, I think, there is no universal agreement that e.g. string
should only ever be used for valid UTF-8 encodings, although that kind of seems to be the intent of the creators of the language.
Now, in MessagePack, there are two types: String
(and its corresponding format str
) and Binary
(and its corresponding format bin
). In line with the interpretation of higher-level languages, the MessagePack specification requires String
to be a valid UTF-8 encoding of a Unicode string.
The final component of the issue is the go-codec
library, which, by default, maps Go string
to MessagePack str
(and Go []byte
to MessagePack bin
). The first part of this mapping is incorrect in general, since in Go string
can contain arbitrary bytes, but in MessagePack str
cannot.
So, there are two ways to fix this situation:
[]byte
instead of string
at all places where arbitrary bytes, rather than correct UTF-8-encoded codepoints, can be found.string
, but set StringToRaw = true
in the codec configuration (however, if there are places in the protocol where genuine strings are used, they will be serialised to bin
now; not a big deal and, I believe, the only reason to prefer str
over bin
where possible is that its serialisation is more compact).Either way, current serialisation procedure is, unfortunately, not compliant with the specification and will definitely cause problems for standard-compliant decoders in languages that have a “real” string type. E.g. the transaction linked in the first comment cannot be decoded by py-algorand-sdk
, since Python’s msgpack
will fail to decode arbitrary bytes as a string:
>>> req = urllib.request.Request('https://api.betanet.algoexplorer.io/v2/blocks/9414973?format=msgpack', headers={'User-Agent': 'Not Python'})
>>> msgpack.unpack(urllib.request.urlopen(req))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kirelagin/proj/serokell/py-algorand-sdk/venv/lib/python3.8/site-packages/msgpack/fallback.py", line 129, in unpackb
ret = unpacker._unpack()
File "/home/kirelagin/proj/serokell/py-algorand-sdk/venv/lib/python3.8/site-packages/msgpack/fallback.py", line 671, in _unpack
ret[key] = self._unpack(EX_CONSTRUCT)
File "/home/kirelagin/proj/serokell/py-algorand-sdk/venv/lib/python3.8/site-packages/msgpack/fallback.py", line 671, in _unpack
ret[key] = self._unpack(EX_CONSTRUCT)
File "/home/kirelagin/proj/serokell/py-algorand-sdk/venv/lib/python3.8/site-packages/msgpack/fallback.py", line 644, in _unpack
ret.append(self._unpack(EX_CONSTRUCT))
File "/home/kirelagin/proj/serokell/py-algorand-sdk/venv/lib/python3.8/site-packages/msgpack/fallback.py", line 671, in _unpack
ret[key] = self._unpack(EX_CONSTRUCT)
File "/home/kirelagin/proj/serokell/py-algorand-sdk/venv/lib/python3.8/site-packages/msgpack/fallback.py", line 671, in _unpack
ret[key] = self._unpack(EX_CONSTRUCT)
File "/home/kirelagin/proj/serokell/py-algorand-sdk/venv/lib/python3.8/site-packages/msgpack/fallback.py", line 666, in _unpack
raise ValueError(
ValueError: <class 'int'> is not allowed for map key
And either way, fixing this means breaking the wire format compatibility for Algorand.
(This might actually be worth filing an issue in go-crypt
to change StringToRaw
to true
by default, because this way people will, at least, be aware of the caveat and it will not produce potentially incorrect encodings with default settings.)
Subject of the issue
Hi! I attached msgpack encoding of block
9414973
from betanet (retrieved via url). It doesn't meet msgpack specification.It encodes some address field of a transaction as string. However specification of messagepack clearly says that this encoding is for UTF-8 strings, not bytestring.
Attachments
file.zip
Your environment
N/A
Steps to reproduce
Try parsing attached file with some msgpack parser (e.g. this Haskell library).
Expected behaviour
File is parsed.
Actual behaviour
File is not parsed due to wrongly encoded bytestring field.