ethereum / eth-abi

Ethereum ABI utilities for python
MIT License
241 stars 268 forks source link

decode_abi with a string returns a bytes value #81

Closed carver closed 4 years ago

carver commented 6 years ago

What was wrong?

I would expect that decoding an abi value of string would return a str-typed value.

In [1]: from eth_abi import decode_abi

In [2]: decode_abi(['string'], b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05mystr\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
Out[2]: (b'mystr',)

I expected ('mystr',)

How can it be fixed?

Last step of decoding should be to call .decode('utf8') on a bytes value if using a string decoder.

pipermerriam commented 6 years ago

Noting that this will result in string values that cannot be decoded due to containing invalid UTF8 characters, but I think this is ok since the docs for the string type state that it is assumed to be UTF8 encoded.

carver commented 6 years ago

assumed to be UTF8 encoded

Right, although it might be desirable for users to retrieve the bytes data even if it's not valid utf8. (rather than just crash)

Maybe would should use something like .decode("utf-8", "replace"). Although I kind of hate it already, the idea that broken strings won't be noticed early on. I'm just not sure how else to reasonably give people access to malformed data. Maybe we just don't. If you want malformed data, you write a curl call to the json-rpc.

I guess I landed on: crash if not valid utf8.

pipermerriam commented 6 years ago

How about we raise a custom error message that is as informative as possible when this happens.

The returned type for this function is string which is expected to be a UTF8 encoded string of text. The returned value ... could not be decoded as valid UTF8. This is indicative of a broken application which is using incorrect return types for binary data.

dylanjw commented 6 years ago

Im going to work on this because it will simplify the log data filter for web3: https://github.com/ethereum/web3.py/pull/953

Just to recap, to see that Im understanding.

  1. Do return a string type, UTF-8 decoded for abi strings.
  2. If .decode("utf-8") fails raise an exception.
pipermerriam commented 6 years ago

Correct

carver commented 6 years ago

:+1: I think this will require a major version bump.

davesque commented 4 years ago

I suppose this is fixed.