Closed alcuadrado closed 5 days ago
Hi Patricio, thanks for this, I'm not completely through with reading but just to let you know that this won't get lost, generally super valuable to have such broadly-ontaking and deep reaching strategic analysis.
I'm opening this issue to summarise what IMO are the problems with the current design of some libraries and to propose an alternative.
Problems
From my perspective, there are three interconnected problems in the libraries that represent protocol objects (i.e. Account, Block and Transaction):
They are based on
ehereumjs-util
'sdefineProperties
. This function is based on super dynamic aspects of javascript, making the libs hard to adapt to typscript, hard to understand, and hard to maintain. This has been discussed during multiple typescript migrations, most notably duringethereumjs-account
's one.They are designed around the RLP representation of each object. While this isn't necessarily a bad thing, these libraries are also used by smart contracts and dapps developers, that don't have enough understanding of the protocol and have trouble understanding some things. A clear example of this is that
ethereumjs-tx
periodically receives false bug reports about leading zeros being stripped. This is the expected behavior when working with the RLP representation of some of the transactions fields, but totally unexpected for someone using it for another purpose.All of them are mutable. This is an issue, as there are complex requirements/dynamics between them, and being able to mutate each object individually opens the door to getting objects out of sync. An example of this is parsing a block, which will construct a
Block
object and a series ofTransaction
objects, and then changing the block number to 0. This would/should render any EIP155-signed transaction invalid, but it's not clear how this can be implemented with the current design. What will probably happen now is that theBlock
and theTransaction
s will have incompatibleCommon
instances.Proposal
I think (1) can be solved without breaking changes by inlining part of
defineProperties
logic in each library. I implemented a prototype of a similar idea onethereumjs-tx
in this PR. While I consider this an improvement, it will mostly increase the maintainability of the libraries. This change wouldn't make any difference with (2) and (3), which are the points that can lead to confusion and bugs in the users' code.In my opinion, to effectively fix (3) without an intricate design that may not be enough in the future, we should prohibit mutating the domain objects. This means making
Account
,Transaction
andBlock
immutable. And the same should be done forCommon
. If they are immutable, everything can be validated on construction, and objects can't get out of sync.Fixing (2) needs a deeper redesign of the libraries. RLP should be treated just as a serialization strategy, and everything should be modeled with their actual data types instead of everything being a
Buffer
.I think a good approach to design them would be to aknowledge that there are multiple representations of these objects, implement everything as close to the actual domain as possible, and make it easy to go from one representation to the others. At the moment, I think these should be:
Domain objects: All the fields have their actual data type. If something is a number, for example, it should be represented as a
bigint
orBN.js
. These should be normal classes, everything dealing with how to get from other representation to this one should be placed in a factory method.Json objects: this is pretty self-explanatory, with a domain object you should be able to construct a json object, where everything is represented as string.
RLP representation: A factory method should be able to construct a domain object from its RLP serialization, and domain objects should be RLP serializable.
Plain javascript objects: The most common way to represent things like transactions in ethereum today are plain js objects, most of the times even object literals. To adapt to this reality, there should be factory methods that turn these representations into a domain object.
Example implementation
I sketched how
Transaction
would look if reimplemented with this designUnknowns
There are two things that are hard to answer about this proposal:
How do we go from the current design to this one incrementally? It'd be a major implementation and coordination effort.
How would implementing everything with types more abstract than
Buffer
impact on the project's performance? I have a strong belief that it wouldn't be much of an impact, as we are copying buffers and transforming them from/to strings super often now.Previous discussions
Finally, here's a list of interesting links where these things have been discussed:
Previous discussions about the limitations of the design:
https://github.com/ethereumjs/ethereumjs-block/pull/69#issuecomment-515452213 https://github.com/ethereumjs/ethereumjs-account/issues/29 https://github.com/ethereumjs/ethereumjs-tx/issues/151 https://github.com/ethereumjs/ethereumjs-tx/pull/154
Common "bugs" that users report because of the libraries using RLP-based representation.
https://github.com/ethereumjs/ethereumjs-tx/issues/74#issuecomment-318066638 https://github.com/ethereumjs/ethereumjs-tx/issues/112 https://github.com/ethereumjs/ethereumjs-tx/issues/140 https://github.com/ethereumjs/ethereumjs-tx/issues/164 https://github.com/ethereumjs/ethereumjs-util/issues/141