Discussion: Stack Limits & Spills

pruby commented 7 years ago

On any of my experiments which go beyond very basic wallets, coins, etc, I tend to run very quickly in to the "stack too deep" error. While I understand the reasons for this happening (the hard cap of 16 on SWAP, DUP calls), this seems like a low-level detail which conventional languages deal with automatically and transparently. It would be nice if Solidity could move towards this for more complicated cases.

I've been considering and researching some possible solutions, and I'd like to discuss what people have already tried and possible migration paths towards that. This is the sort of thing I'd be interested in coding if we can come up with a simple enough way of doing it.

I'm going to dump some of my thoughts on how this could work here, but please tell me if things have already been tried, don't work, or if you have further thoughts/analysis before I try to implement anything.

As far as I can tell from reading, stack machine languages normally store local variables in memory rather than on the stack. This raises one possible solution - allocate a stack frame for functions and simply change the compiler convention so that variable assignments result in an MSTORE and use in an MLOAD. This would have a greater impact on code which is currently small enough to do without, and would still leave the possibility of getting this error if we nest function calls deeply enough without storing in a variable.

Another possible solution would be to dynamically detect overflow; and spill entries from the stack (could be produced by any types of operation) only when necessary. This would, however, be very complicated to implement.

We couldn't do this in the optimiser after converting to concrete EVM opcodes as EVM has no way of expressing DUP17 or higher. We'd need to add methods to traverse AST nodes which allowed us to precalculate maximum stack depth, find appropriate spills, record that these nodes are memory-stored, add the store procedure at the end of the node's assembly output, and make DUP opcodes referencing them use a load procedure instead.

Allocating stack frames on EVM would be decidedly non-trivial, as we're quite heavily penalised for leaving a hole in memory. This forces our stack to have a small, fixed length, or share the same compact space as the heap. A stack could be allocated in a segmented manner using the heap allocator, though we would need to store at least one word of metadata on each segment (previous segment, next segment, within-segment usage pointer).

We would need two globals at fixed addresses to store a frame pointer and a segment pointer for a segmented stack. Both of these would need to be saved and restored on calling in to any function which used the stack.

A function preamble would need to check whether there is space in the current segment for our stack frame, and move on to or allocate the next frame if there wasn't. To keep code within the function simple, I would suggest that the function's stack frame should always fit within one segment.

pruby commented 7 years ago

Just sketching out this idea and getting an idea of the costs associated with a segmented stack - preliminary look indicates they may be quite high. None of this is tested or even checked to see whether the stack balances, just trying to work out a cost estimate.

Stack storage / retrieval code could look like:

PUSH (frame pointer)
MLOAD
PUSH (offset in frame)
SUB
MSTORE / MLOAD

All of these are W_{verylow} cost instructions, so cost would be 7 bytes of code & 15 gas per load/store.

The function preamble could look something like:

PUSH (segment pointer)
MLOAD
PUSH (mask for lower bits)
AND (yields the current usage)
PUSH (required usage)
ADD (yields the required frame capacity)
PUSH (segment pointer)
MLOAD
PUSH (offset for current segment capacity)
DIV
PUSH (mask for lower bits)
AND (yields the current frame's capacity).
LT (yields 1 if capacity is less than required)

// Running cost: 41 gas

// We now have one stack entry with whether or not we have space in the current frame

PUSH (segment pointer)
MLOAD
DUP1 (segment pointer copied)
PUSH (offset for next segment pointer)
DIV
PUSH (mask for lower bits)
AND (yields next segment pointer).
SUB (difference between pointers)
MUL (difference between pointers if capacity less than required, zero otherwise)
ADD (combine with existing segment pointer - yields current segment pointer if enough space, next one otherwise).

// Running cost: 75 gas

// We now have one stack entry containing the stack frame we should use, zero if empty

// We will have to allocate another frame if there is none. This is too complicated to do in a
// non-branching fashion, so branch if our next segment address contains zero. No idea
// what the calling convention is - might need to adjust that.

PC
DUP2
ISZERO
PUSH (address of allocation routine)
JUMPI
POP // Discard PC for balance if we didn't jump

DUP1 (copy calculated segment pointer)
PUSH (mask for lower bits)
AND (yields the current usage)
PUSH (required usage)
ADD (yields the new top-of-frame).

// Running cost (no branch taken): 113 gas

// We have two entries in the stack here - the new frame pointer on top, and the new segment underneath

// Push the last function's global values to the stack

PUSH (frame pointer)
MLOAD
DUP2 (new top-of-frame)
MSTORE

PUSH (segment pointer)
MLOAD
DUP2 (new top of frame)
PUSH1 0x20
SUB
MSTORE

PUSH (frame pointer)
MSTORE

PUSH (segment pointer)
MSTORE

// Running cost: 155 gas

This is looking like quite a lot of code, and doesn't even implement the case where a frame has to be allocated.

Gas when we don't need to allocate a new segment looks to be around 155. Exactly how much code this takes depends on how cheap we can make the masks and offsets. Not sure how you do that now - assume some might actually be loaded from code memory to reduce instruction size?

chriseth commented 7 years ago

We can address that with the new backend. Also, with the introduction of structs to the ABI, this problem also becomes less relevant.

cameel commented 2 years ago

@ekpyron How relevant is this now that we have the memory escalator? Should we close this?

Also, if it's still relevant, perhaps we should close this anyway and move the discussion to the forum? Now that we have one it seems to be a better place for such open-ended discussions.

ekpyron commented 2 years ago

Oh wow, I wasn't aware of this issue :-). But yes, this is largely implemented in the via-IR pipeline with stack-to-memory (the main corner cases being non-memory-safe inline assembly and recursive functions, but we had few reports about that, so it doesn't seem like a huge issue in practice at least yet). FYI: the issue of allocating stack frames is avoided in the current implementation by allocating globally fixed memory locations for variables, which is safe in all cases other than recursive functions (which is easy to tell on the Yul level). And following https://github.com/ethereum/solidity/issues/2693#issuecomment-320214409 it was indeed much easier to implement this for the via-IR pipeline.

ethereum / solidity

Discussion: Stack Limits & Spills #2693