Modelling of trailing zeros in ContextFramesBytesIO

JoranHonig commented 4 years ago

Version: 2.0.0

What was wrong?

The eth-abi decoder models the calldata of a transaction as a finite length datastream. However, the evm models calldata as an infinite length bytestream. Returning zero bytes when the transaction does not specify the bytes requested.

The tests in this repository do seem to indicate that the decoder is intentionally modelled this way. I'm not sure why this is the case. It'd be cool if there is an additional option to model the bytestream as infinite.

How can it be fixed?

I have a local branch where I adapted the ContextFramesBytesIO object to mimic the behaviour of calldata in the evm. I'm happy to submit a pr if this is something that could be added to the repository.

JoranHonig commented 4 years ago

For reference: https://github.com/JoranHonig/eth-abi/blob/master/eth_abi/decoding.py#L111 This link points to function that alters the behaviour. I'm not very familiar with the codebase, so there might be some edgecases that I haven't thought of 😅

davesque commented 4 years ago

Hey @JoranHonig , thanks for getting in touch.

Can you give an example of where this particular issue is causing problems for you? I know there's a conceptual difference between the way the EVM specifies the behavior of call data (as a virtual infinite byte string) and how eth-abi views the data its sending over the wire (as finite byte strings). But I'm having a hard time imagining how this would cause problems.

JoranHonig commented 4 years ago

Hi @davesque, thanks for your response!

I'm trying to decode automatically generated testcases (in the shape of a series of transactions). The automatic generation of these testcases is done by Mythril (on the bytecode). During it's analysis, Mythril will try to generate transaction traces for the different issues it uncovers. However, because we analyze the bytecode we have this issue that calldata is infinite. Because we don't know the abi we don't really know how much trailing zero's are needed for the decoder. This is why we chose to remove any trailing zero's if possible (/ keep the calldatasize as low as possible). In the end, mythril will give transaction traces that would work on the evm, but are not decodable using eth-abi.

Long story, hope it makes sense 😅

I'm also wondering if a similar issue could occur outside of mythril. Wouldn't wallets also benefit from removing trailing zeros? That should reduce gas cost, while not having any negative effects on the execution (since it does not matter from the evm's perspective). These transactions would also not be decodable right?

davesque commented 4 years ago

I think you can probably get away with just appending some number of zeroes based on the call data size. Basically, if you have some string of bytes that you know should represent some ABI-encoded value, you can probably just pad the string at the end with zero-bytes such that the number of bytes in the string is a multiple of 32 and it will probably decode. Although I'm not really sure you should do that! It's conceivable that making a blanket modification to ABI-encodings like that could mask some bugs. In other words, it would be better if you could zero-pad the end only if you know that it makes sense to do so (if there's some value encoded on the end which is normally zero-padded in the ABI fortmat).

I'm not sure we should modify eth-abi to accommodate this situation because this library is designed intentionally to complain when encodings are malformed. That's a feature that's intended to help users pinpoint problems with their app code.

On the matter of representing trailing zeroes implicitly by truncating trailing zeros and taking advantage of the EVM spec, I suppose you could try something like that, but it seems to violate what I'd say is a reasonable expectation that the input data length corresponds to the typical length of the expected input type.

JoranHonig commented 4 years ago

Hi @davesque,

Although I'm not really sure you should do that!

😄 Right, there are probably some uncovered edgecases with this approach

I'm not sure we should modify eth-abi to accommodate this situation because this library is designed intentionally to complain when encodings are malformed. That's a feature that's intended to help users pinpoint problems with their app code.

I understand your argument, and I see how that is the main goal of this repository. Though, would it be possible to add the option for developers to use this EVM like modelling of calldata? Giving the developer the option to do strict checking of ABI conformance, but also allow a more fuzzy approach that accommodates other scenarios.

As you can see in https://github.com/JoranHonig/eth-abi/blob/master/eth_abi/decoding.py#L111 the required change wouldn't be large. Which means it hopefully won't be too difficult to maintain.

davesque commented 4 years ago

I appreciate your looking into this and coming up with a solution. However, it still seems to me like that particular change applies only to this specific problem. Since this is an open source project, we have to consider the long-term cost of making changes and decide if the maintenance and added complexity are worth it.

Is it possible for you to import that class into your project and subclass it with the desired modifications? Maybe it would make more sense for us to change eth-abi so that it's easier for you to use your own custom classes or something along those lines?

JoranHonig commented 4 years ago

Since this is an open source project, we have to consider the long-term cost of making changes and decide if the maintenance and added complexity are worth it.

That makes sense 😅

Is it possible for you to import that class into your project and subclass it with the desired modifications?

Yes 👍 : I've currently gone with an approach where I have a custom subclass for ABICodec that uses another custom subclass ContextFramesBytesIO.

Maybe it would make more sense for us to change eth-abi so that it's easier for you to use your own custom classes or something along those lines?

That would be awesome! My current approach seems to work, but I had to copy paste some code from ABICodec into my custom subclass. It would be awesome if I could dependency inject my Custom ContextFramesBytesIO into the base eth-abi ABICodec. Would that be a feature that you'd add to the library?

davesque commented 4 years ago

It would be awesome if I could dependency inject my Custom ContextFramesBytesIO into the base eth-abi ABICodec. Would that be a feature that you'd add to the library?

Yeah, that sounds like a great approach to fixing this. Here's one way we could do it: #136 . Let me know what you think. It would allow you to avoid copying and pasting code from ABIDecoder and do what you want entirely through the use of subclasses.

JoranHonig commented 4 years ago

@davesque #136 would certainly suffice for this usecase 👍 Thanks!

ethereum / eth-abi

Modelling of trailing zeros in ContextFramesBytesIO #135

What was wrong?

How can it be fixed?