Open cameel opened 3 weeks ago
As pointed out by @ekpyron, we do have some kind of deduplication at assembly level already: https://github.com/ethereum/solidity/blob/5da0f47439340097fe5b8a409caadc4fd3f00752/libevmasm/Assembly.cpp#L1053-L1071
Before fixing this we should check why it doesn't kick in here. Does the generated bytecode end up being different?
It may be that `subRef` in the existing deduplication only extends to one nesting level, so we may not deduplicate a subassembly against an identical subassembly of a subassembly that way. In cases like that we also need to be careful: the outer assembly can reuse the inner one (the subassembly of a subassembly), but we can't change the nested assembly (i.e. the subassembly that expects it as its own subassembly cannot refer to the outer assembly instead), both because it won't have access to it and because of determinism.
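The one-directional reuse described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not how `libevmasm` works: the `Assembly` class, the `assemble` function and the `reuse_nested` flag are all hypothetical, instructions are treated as opaque bytes, and "referring to an existing copy" is modelled simply as not emitting a second copy.

```python
from dataclasses import dataclass, field


@dataclass
class Assembly:
    # Hypothetical stand-in for an assembly: opaque code plus subassemblies.
    code: bytes
    subs: list["Assembly"] = field(default_factory=list)


def assemble(asm: Assembly, reuse_nested: bool = False) -> bytes:
    """Flatten an assembly: its own code followed by its embedded subassemblies."""
    # Every subassembly is flattened on its own first, so its bytes never
    # depend on what the parent contains (the determinism requirement).
    blobs = [assemble(sub, reuse_nested) for sub in asm.subs]
    emitted: list[bytes] = []
    for i, blob in enumerate(blobs):
        in_emitted = any(blob in other for other in emitted)
        in_later = any(blob in other and blob != other for other in blobs[i + 1:])
        if reuse_nested and (in_emitted or in_later):
            # The parent reuses the copy already embedded in a sibling
            # instead of embedding the same bytes again.
            continue
        emitted.append(blob)
    return asm.code + b"".join(emitted)


inner = Assembly(code=b"\x44" * 40)                  # included at two levels
middle = Assembly(code=b"\x02", subs=[inner])        # needs its own copy of inner
outer = Assembly(code=b"\x01", subs=[middle, inner])

plain = assemble(outer)
deduplicated = assemble(outer, reuse_nested=True)
print(len(plain), len(deduplicated))
```

In this model only `outer` changes: `middle` is flattened to the same bytes with or without reuse, which matches the constraint that the nested assembly must keep its own copy.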
Abstract
When a contract deploys another contract (via `new`) or accesses its bytecode (via `.runtimeObject` or `.creationCode`), the compiler embeds that bytecode in the accessing contract. Depending on whether the contract is accessed at creation time or at runtime, its bytecode ends up as a subassembly of the creation or runtime assembly, respectively. However, when it is accessed at both times, it ends up being included in both places. This happens in both the legacy and the IR pipeline.
Details
This behavior is easy to see in the IR codegen, which makes no attempt at deduplication: https://github.com/ethereum/solidity/blob/8a97fa7a1db1ec509221ead6fea6802c684ee887/libsolidity/codegen/ir/IRGenerator.cpp#L184 https://github.com/ethereum/solidity/blob/8a97fa7a1db1ec509221ead6fea6802c684ee887/libsolidity/codegen/ir/IRGenerator.cpp#L206
Motivation
This duplication not only increases bytecode size but also requires the compiler to optimize the same code twice, which is especially problematic for the IR pipeline.
How to reproduce
The bytecode of `C` is clearly included inside the bytecode of `D` twice. The long string included in the source can be seen twice as two long sequences of `0x44`.
The effect is similar with `--via-ir`, though the string is not as easily visible. Still, it can be easily confirmed by removing `D`'s constructor: it cuts the size of `D`'s bytecode almost in half.
Possible solutions
Backwards Compatibility
I can't see any backwards compatibility issues here.