ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org

solc wastes memory when using structs #13475

Closed: nventuro closed this issue 2 years ago

nventuro commented 2 years ago

There seems to be a lot of memory overallocation when dealing with memory structs, depending on the exact construct used to work with them. Apparently the situation improved in v0.7.6, whose changelog lists 'avoid memory allocation for default value if it is not used'.

Prior to that, the following snippet:

struct MyNiceStruct {
    uint256 a;
    uint256 b;
}

function alloc() private pure returns (MyNiceStruct memory) {
    MyNiceStruct memory myStruct = MyNiceStruct({ a: 3, b: 4 });
    return myStruct;
}

results in 192 bytes being allocated, corresponding to 3 (!!!) instances of MyNiceStruct. These allocations are apparently triggered by: a) the declaration of a variable in the function, b) the assignment using the special struct assignment syntax, and c) the fact that there's a struct return value.

If the result of alloc() were to be assigned to a struct at the callsite, that'd result in yet another allocation.
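
For illustration, a hypothetical callsite for the snippet above (the function name caller is made up here):

function caller() private pure returns (uint256) {
    // On pre-0.7.6 compilers, this assignment reportedly added a fourth
    // allocation of MyNiceStruct on top of the three inside alloc().
    MyNiceStruct memory s = alloc();
    return s.a + s.b;
}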


In current versions, all the way up to 0.8.16, the situation is still bad, though not quite as bad. The assignment to the result of alloc() seems to no longer cause an allocation, but I still run into double allocations if I either use the MyNiceStruct({ ... }) initialization syntax, or declare a local struct instead of using a named return value.
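
For concreteness, a sketch of the two variants just described (the function names are made up); on 0.8.16 each of them reportedly still allocates space for two copies of the struct:

function allocNamedInit() private pure returns (MyNiceStruct memory myStruct) {
    // Initialization syntax: one allocation for the struct literal plus one
    // for the zero-initialized named return value.
    myStruct = MyNiceStruct({ a: 3, b: 4 });
}

function allocLocal() private pure returns (MyNiceStruct memory) {
    // Local declaration plus explicit return: again two allocations instead of one.
    MyNiceStruct memory myStruct;
    myStruct.a = 3;
    myStruct.b = 4;
    return myStruct;
}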

The following seems to be the only way to make the compiler allocate space for just one instance of the struct:

function alloc() private pure returns (MyNiceStruct memory myStruct) {
    myStruct.a = 3;
    myStruct.b = 4;
}
hrkrshnn commented 2 years ago

This is a known issue :( Improving memory management is one of the roadmap items and is currently being worked on: https://github.com/orgs/ethereum/projects/20

Closing this for now; see #13320 and other related issues.

nventuro commented 2 years ago

Thanks for the reply @hrkrshnn. I didn't see any mention of structs there, however. Are there any concrete plans for those?

To be honest, structs feel like a neglected feature: they're the only memory type that exhibits this weird auto-allocation behavior, they have strange copy semantics (close to value types, but they're sort of a reference type?), and they're just all-around wasteful to use via the 'standard' language constructs (return statements, initialization with named arguments, etc.). The documentation doesn't even show memory structs at all: it looks as if they're just a hack to get packed storage variables.
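
(As an aside, a minimal sketch of the mixed semantics I mean, assuming the struct and a storage variable named stored live in the same contract: memory-to-memory assignment aliases the same struct, while assigning into storage makes a deep copy.)

MyNiceStruct stored; // hypothetical storage variable

function mixedSemantics() public {
    MyNiceStruct memory x = MyNiceStruct({ a: 1, b: 2 });
    MyNiceStruct memory y = x; // memory-to-memory: y aliases x, no copy is made
    y.a = 42;                  // x.a is now 42 as well
    stored = x;                // memory-to-storage: deep copy (value semantics)
    x.b = 7;                   // does not affect stored.b, which stays 2
}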

All of this makes me avoid structs whenever possible, since I find I cannot predict whether my usage will result in wasted allocation. Is there some plan to address these issues?

hrkrshnn commented 2 years ago

@nventuro Yeah, structs would be part of that. Currently, Solidity barely has the idea of reference semantics: most assignments are indeed deep copies. There is some exploratory work on improving this; you can likely expect a talk about it at Devcon Bogota if you are interested.

Also good point about improving documentation around structs. I'll make an issue about it.

> it looks as if they're just a hack to get packed storage variables.

Storage gets packed even without structs, but I get your point :)
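
(A small illustration of that point, with made-up variable names: adjacent small state variables share a storage slot even without a struct.)

uint128 first;  // storage slot 0, lower 16 bytes
uint128 second; // storage slot 0, upper 16 bytes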

nventuro commented 2 years ago

How much of a priority does this issue have on the roadmap? Is there a rough timeline by which we could expect to see improved memory management?

This is one of the issues that makes me not want to switch to the new codegen pipeline (which automatically moves stack variables to memory), as it doesn't quite feel like solc can be trusted with automatic memory allocations.

leonardoalt commented 2 years ago

Not that it contributes much, but better performance in dynamic struct memory allocation has very little to do with allocation of stack variables in memory, although of course they would need to work together when the dynamic parts are optimized.

nventuro commented 2 years ago

What do you mean by 'dynamic' struct allocation?

My point was rather that basic usage of structs can easily result in excessive overallocation (4x in a trivial case prior to 0.7.6, 2x in the latest versions), and I worry that similar things might happen when letting the compiler automatically promote variables to memory.

leonardoalt commented 2 years ago

Yep, I agree that those cases are pretty bad right now, but moving variables to memory is much simpler, and moving single words likely doesn't have a bad worst case.

ekpyron commented 2 years ago

The compiler only moves variables to memory that can be assigned a globally fixed memory location at compile time; that's why the mechanism doesn't work in recursive functions so far. The allocation mechanism for this is completely decoupled from the free-memory-pointer based mechanism, i.e. the memory offsets are statically assigned by the compiler up front, and the free memory pointer is only ever initialized with the offset just past the memory reserved for the moved variables. So there is little danger of excessive allocation due to variables moved to memory: by design it's one slot per variable, and in all cases where that wouldn't be enough, we simply don't do it.
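
(A rough sketch of what that means for the memory layout, with assumed offsets and a made-up function name; this is not actual compiler output.)

function readFreeMemoryPointer() private pure returns (uint256 fmp) {
    assembly {
        // Standard layout: 0x00-0x3f is scratch space, 0x40 holds the free
        // memory pointer, 0x60 is the zero slot, and dynamic allocation
        // normally starts at 0x80.
        // If, say, two variables were moved to memory, the compiler would
        // (per the description above) pin them to fixed offsets such as 0x80
        // and 0xa0 and initialize the free memory pointer to 0xc0 instead,
        // so free-memory-pointer based allocations never overlap them.
        fmp := mload(0x40)
    }
}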

nventuro commented 2 years ago

> and the free memory pointer is only ever initialized with the offset just past the memory reserved for the moved variables.

Does this mean that if my code features 10 mutually exclusive code paths, each of which results in one value being promoted to memory, the contract will allocate all 10 words in all cases?

ekpyron commented 2 years ago

> and the free memory pointer is only ever initialized with the offset just past the memory reserved for the moved variables.
>
> Does this mean that if my code features 10 mutually exclusive code paths, each of which results in one value being promoted to memory, the contract will allocate all 10 words in all cases?

Depends. The slots are allocated per Yul function after the optimizer runs. Disjoint paths through the call graph of these functions can be assigned shared slots. So if you're lucky and the 10 mutually exclusive code paths end up in separate Yul functions that live on disjoint paths in the call graph, only one word will be allocated.
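
(A hypothetical illustration of the lucky case, with made-up function names: pathA and pathB never appear on the same call-graph path, so a variable moved to memory inside each of them could be assigned the same static slot.)

function pathA() private pure returns (uint256 r) {
    // imagine enough stack pressure here to force a variable into memory
    r = 1;
}

function pathB() private pure returns (uint256 r) {
    // same here
    r = 2;
}

function entry(bool which) private pure returns (uint256) {
    // pathA and pathB live on disjoint call-graph paths, so their
    // memory-moved variables could share one reserved slot.
    return which ? pathA() : pathB();
}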

Whether slots can be shared could be determined at a more fine-grained level than per Yul function, but (as the main topic of this issue correctly points out) the compiler wastes far more memory in free-memory-pointer based allocation, so our priority is to fix that first and only then to fine-tune the assignment of memory slots to variables (if it turns out that's worthwhile).